Predicting lattice constant of complex cubic perovskites using computational intelligence




Computational Materials Science 50 (2011) 1879–1888

Contents lists available at ScienceDirect

Computational Materials Science journal homepage: www.elsevier.com/locate/commatsci

Abdul Majid a,b, Asifullah Khan a,b,*, Tae-Sun Choi b

a Department of Information and Computer Sciences, Pakistan Institute of Engineering and Applied Sciences, Nilore, Islamabad, Pakistan
b Department of Mechatronics, Gwangju Institute of Science and Technology, 261 Cheomdan-Gwagiro, Buk-gu, Gwangju 500-712, South Korea

Article info

Article history: Received 5 November 2010; received in revised form 21 December 2010; accepted 21 January 2011; available online 17 February 2011

Keywords: Support vector regression; Random forest; Generalized regression neural network; Multiple linear regression; Lattice constant; Complex cubic perovskites

Abstract

Advanced computational intelligence (CI) approaches have recently gained substantial importance in materials science for modeling quantitative structure–property relationships. In this study, we used support vector regression, random forest, generalized regression neural network, and multiple linear regression approaches to predict the lattice constants (LCs) of complex cubic perovskites. We collected a reasonable number of perovskite compounds from the recent materials science literature. The CI models were developed using 100 training compounds, and their generalization performance was estimated on 97 novel compounds. Our analysis shows that the CI approaches predict LCs more accurately than the well-known SPuDS software, which is widely used in crystallography. We further observed that, for some compounds, a larger prediction error from the CI models correlates with the compound's structural deviation from ideal cubic symmetry.

© 2011 Elsevier B.V. All rights reserved.

* Corresponding author at: Department of Information and Computer Sciences, Pakistan Institute of Engineering and Applied Sciences, Nilore, Islamabad, Pakistan. Tel.: +92 51 2207381 3; fax: +92 51 2208070.
E-mail addresses: [email protected] (A. Majid), [email protected] (A. Khan), [email protected] (T.-S. Choi).
0927-0256/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.commatsci.2011.01.035

1. Introduction

For crystalline materials, lattice constants (LCs) play an important role in identifying compounds. They also help in assessing interesting material properties. In many industrial applications, accurate knowledge of the lattice constants of unknown perovskites is important for synthesizing them and analyzing their structural properties. This knowledge helps in choosing suitable materials for many industrial problems. For example, lattice mismatch between thin films and substrates is a familiar industrial concern that restricts the large-scale fabrication of thin films [1]. For epitaxial thin-film growth in the semiconductor industry, researchers search for perovskite compounds whose lattice constant (LC) matches that of the substrate. Reducing the lattice mismatch improves film quality [2]. For Sr2CrWO6 perovskite on an SrTiO3 substrate, a lattice mismatch smaller than 0.13% is obtained. However, atomic disorder can only be avoided within a very narrow range of deposition temperature and atmosphere. For this reason, metallic Sr2CrReO6 and Sr2FeMoO6 perovskites are preferred for SrTiO3 (0 0 1) substrates. Hetero-epitaxial growth of Ca2FeMoO6/Ca2FeReO6 compounds with matching lattice structures is also feasible [3]. In these scenarios, precise prediction of LCs, especially for newly synthesized perovskites, is of prime importance.

Perovskite materials have many interesting physical, chemical, and catalytic properties. Owing to their diverse physicochemical and transport properties, complex cubic compounds are used in numerous scientific and engineering applications. Examples include dielectric resonators for wireless communication and piezoelectric, ferroelectric, and optical devices [4]. For example, Sr2FeMoO6 and Sr2FeReO6 perovskites are used as half-metallic ferrimagnets, relaxor ferroelectrics, low-loss dielectrics, and photocatalysts [5]. A useful property of perovskite compounds is ferroelectricity, which retains a residual electric polarization as a permanent dipole. This property can be exploited to develop random access memories that retain stored information. The physicochemical properties of perovskite materials are also being investigated for various other applications. These growing applications motivate researchers to model the structure–property relationships of perovskites. The increasing demand for new perovskite compounds also makes it important to predict the structure of the unit cell.

The chemical-stoichiometry-based SPuDS software is used to predict the structural information of newly synthesized perovskites [6]. This program uses bond valence parameters and nominal cation oxidation states. Advances in high-performance computing have also allowed researchers to develop Density Functional Theory (DFT) based computational tools [7–10]. DFT tools are based on the first principles of quantum mechanics. They solve the Schrödinger equation by minimizing the total energy of the system as a functional of the electron density. The DFT approach is used to predict the physicochemical properties of compounds [11–14]. DFT-based software tools estimate the atomic/molecular electronic structure and the elastic properties of compounds separately [15–18]. Using DFT tools, one can estimate the atomic/molecular structure and lattice parameters of perovskites. However, a chemist or engineer using DFT tools needs significant computational resources [19]. They also need in-depth knowledge of the electronic configuration, bonding energy, charge distribution, and density of states of the material under study [20]. Moreover, DFT models need nearly the same amount of computational resources and skill for each newly synthesized compound. In industrial and engineering applications, computational intelligence (CI) based approaches therefore have an advantage over the DFT approach. Once an optimal prediction model has been developed from training data, it can be used efficiently to predict the structural properties of novel compounds.

Initially, researchers proposed simple prediction models for characterizing and identifying perovskite structures. Multiple linear regression (MLR) models were developed to correlate the LC with the atomic parameters of simple cubic perovskites. However, simple linear models may compromise accuracy for complex perovskite structures. It is therefore of practical interest to use CI approaches where human expertise or computational resources are limited. These approaches are used extensively to predict various physicochemical properties of crystalline compounds [21–28] and amorphous metallic glass alloys [29,30].
In this study, we explored the learning capabilities of CI approaches to correlate the LC with the atomic parameters of complex cubic perovskites of the A2BB′O6 type. We selected support vector regression (SVR), random forest (RF), generalized regression neural network (GRNN), and multiple linear regression (MLR) approaches. During model development, the optimal values of the model parameters are determined from the training data. The model is then used to predict the lattice parameters of novel compounds. In one experiment, prediction models were developed using only three features: the ionic radii of A, B, and B′. We then explored the prediction capability for six features by adding three more variables: the electronegativities of the B and B′ ions and the oxidation state of the B ion. Model performance is measured in terms of the percentage absolute difference (PAD). Our analysis shows that the CI models predict more accurately. For three features, the overall PAD of the SVR, RF, GRNN, and MLR models was 0.413, 0.395, 0.414, and 0.523, respectively. For six features, it was 0.261, 0.418, 0.700, and 0.517, respectively. These errors are significantly lower than the overall PAD of 1.582 given by the SPuDS program. The rest of the paper is organized as follows. Section 2 briefly describes the input data collection and explains how the prediction models are developed using the various CI approaches. Section 3 discusses the simulation results and analyzes the performance of the prediction models. Finally, Section 4 provides conclusions.

2. Materials and methods

Because chemical substitution is possible at both the A and B sites, perovskite compounds adopt various types of crystal structures. Double cubic perovskites, with the general formula A2BB′O6, result from doubling the ABO3-type unit cell. When the B and B′ cations are ordered, a perovskite superstructure is generally formed. The aristotype structure of double perovskites crystallizes in the Fm-3m space group with cubic symmetry [31]. In this work, four different CI approaches are employed to develop prediction models for A2BB′O6-type complex perovskites. The generic block diagram in Fig. 1 shows how the unknown parameters of each CI model are optimized. During training, the CI prediction models are developed by minimizing the PAD values, and the performance of the models is then estimated on the novel data. The implementation details of each model are explained in Section 2.3.

2.1. Data formation

The large differences in valence and radius between the B and B′ cations mainly determine whether double cubic perovskites adopt an ordered or a disordered structure [32]. The physicochemical properties of the constituent ions govern the structure of perovskites. The unit cell of a cubic compound is described by a single parameter, the lattice constant (LC). For this work, the effective ionic radii of A, B, and B′ are taken from Shannon's work [33], for specific oxidation states and coordination numbers. In A2BB′O6-type cubic perovskites, the A and B cations have coordination numbers of twelve and six, respectively. The electronegativities (χ) of the B and B′ cations are taken from Pauling's electronegativity tables [34]. A dataset of 197 perovskite compounds was collected from the current literature. The detailed list of the six atomic parameters, along with the experimental LCs, is given in Supplementary Tables 3 and 4. From this dataset, 100 compounds are randomly selected for model development. The remaining 97 compounds are used to evaluate the prediction models.
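The random 100/97 partition described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' procedure. The function name `split_dataset`, the fixed seed, and the placeholder compound identifiers are all hypothetical, because the paper does not say how the random selection was drawn.

```python
import random


def split_dataset(compounds, n_train=100, seed=0):
    """Randomly partition compounds into a training set and a novel (test) set.

    The seed is a hypothetical choice made for reproducibility; the paper
    does not report one.
    """
    if not 0 < n_train < len(compounds):
        raise ValueError("n_train must leave at least one compound on each side")
    rng = random.Random(seed)   # private generator, global random state untouched
    shuffled = list(compounds)  # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    # Placeholder identifiers standing in for the 197 collected compounds.
    dataset = [f"compound_{i:03d}" for i in range(197)]
    train, novel = split_dataset(dataset)
    print(len(train), len(novel))  # 100 97
```

Fixing the seed makes the partition repeatable. With a different seed, a different set of 100 compounds is drawn for training, so the errors computed on each subset would generally change.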
Most compounds in the dataset have double cubic structures in the Fm-3m space group. However, a few orthorhombic and monoclinic compounds, reported to belong to the Pnma and P21/n space groups, respectively, are also included. For these compounds, we used the pseudocubic LC.

2.2. Tolerance factor

In A2BB′O6-type perovskites, the tolerance factor (t), or bond valence parameter, roughly estimates the distortion of the BO6 and B′O6 octahedra from cubic symmetry. The crystallographic structure of a double perovskite can therefore be anticipated beforehand from the distortion between the A atom and the O atoms. In A2FeReO6-type perovskites, for example, the crystallographic structure transforms from cubic to tetragonal and then to monoclinic as the A-cation size decreases. The largest lattice distortion occurs in the monoclinic compounds [35]. In double perovskites, this distortion can be measured in two ways [3]. The tolerance factor t is computed from the effective ionic radii of the constituent ions, and the observed tolerance factor t_obs is computed from measured interatomic distances:

$$t = \frac{r_A + r_O}{\sqrt{2}\,\left(\langle r_{BB'} \rangle + r_O\right)} \tag{1}$$

$$t_{obs} = \frac{d_{AO}}{\sqrt{2}\, d_{BO}} \tag{2}$$

where $r_A$ and $r_O$ are the effective (Shannon) ionic radii of the A and O atoms, respectively, and $\langle r_{BB'} \rangle$ is the average ionic radius of the B and B′ atoms. Here, $d_{AO}$ is the average distance between the A atom and its nearest oxygen atoms. Similarly, $d_{BO}$ is the average distance between the B atom and its nearest oxygen atoms. Eq. (2) defines the observed tolerance factor ($t_{obs}$), which is close to the experimental situation. Because it is based on experimental distances between nuclei, $t_{obs}$ reflects the oxygen positions accurately. Its value depends on the chemical bond lengths in each compound. In most cases $t_{obs}$ agrees with the theoretical value of t, so for simplicity we treat t as equal to $t_{obs}$. We used the values of t reported in the literature. For the few compounds for which no value could be found, we computed t using the SPuDS software. For perovskite structures, t lies in the range 0.87–1.04 [36]. A cubic structure can be predicted when t is close to unity. If t deviates from unity because of distortion involving the A and B cations, tilting or rotation of the BO6 and B′O6 octahedra can be expected. Compounds in the A2BB′O6 family of perovskites can be categorized as follows [3,36]:

Case 1: if t > 1.05, a hexagonal structure is expected.
Case 2: if 1.05 > t > 1.00, the compound will be cubic, in the Fm-3m space group.
Case 3: if 1.00 > t > 0.97, the structure will be tetragonal, in the I4/m space group.
Case 4: if t < 0.97, the compound will be either monoclinic (P21/n) or orthorhombic.

For monoclinic and tetragonal samples, the lattice parameters a, b, and c can be approximated by pseudocubic parameters as $a_p \approx \sqrt{2}\,a \approx \sqrt{2}\,b$ and $a_p \approx c$. Here, $a_p$ is the lattice parameter of the ideal double cubic perovskite. The pseudocubic lattice parameter $a_p$ equals c before the distortion, so it can be used to calculate the tetragonal distortion as $1 - a_p/c = 1 - \sqrt{2}\,a/c$.

Fig. 1. A generic diagram for developing CI prediction models.
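Eqs. (1) and (2) and Cases 1–4 translate directly into a small screening helper. The sketch below is illustrative only and rests on several assumptions:

- The function names are hypothetical.
- r_O defaults to 1.40 Å, the Shannon radius of six-coordinate O2−. Check this against the radii actually used.
- ⟨r_BB′⟩ is taken as the arithmetic mean of r_B and r_B′.
- A t value lying exactly on a case boundary, which the strict inequalities leave undefined, is assigned to the next case down the list.

```python
import math

O_RADIUS = 1.40  # assumed Shannon radius of O2- (CN 6), in Å


def tolerance_factor(r_a, r_b, r_b_prime, r_o=O_RADIUS):
    """Eq. (1): t from ionic radii, using the mean radius of the B and B' cations."""
    r_bb = 0.5 * (r_b + r_b_prime)
    return (r_a + r_o) / (math.sqrt(2.0) * (r_bb + r_o))


def observed_tolerance_factor(d_ao, d_bo):
    """Eq. (2): t_obs from the mean A-O and B-O distances (Å)."""
    return d_ao / (math.sqrt(2.0) * d_bo)


def expected_structure(t):
    """Cases 1-4 of Section 2.2; a value exactly on a boundary falls into the next case down."""
    if t > 1.05:
        return "hexagonal"
    if t > 1.00:
        return "cubic (Fm-3m)"
    if t > 0.97:
        return "tetragonal (I4/m)"
    return "monoclinic (P21/n) or orthorhombic"


def tetragonal_distortion(a, c):
    """Pseudocubic a_p = sqrt(2) * a; distortion = 1 - a_p / c (zero when undistorted)."""
    return 1.0 - math.sqrt(2.0) * a / c


if __name__ == "__main__":
    for t in (1.059, 1.021, 0.987, 0.9445):
        print(t, expected_structure(t))
```

For example, `expected_structure(1.059)` returns "hexagonal" and `expected_structure(0.9445)` returns "monoclinic (P21/n) or orthorhombic". This matches the t values quoted for Ba2CrWO6 and Ca2CrWO6 in the next paragraph.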

For Sr2CrWO6, the value of t (= 0.998) indicates a nearly perfect cubic structure. Similarly, a cubic structure can be expected for Ba2ScNbO6 (t = 1.021), Ba2ScTaO6 (t = 1.019), and Sr2AlTaO6 (t = 1.079). Another case study, of Ba2+-based Ba2B(3+)B′(5+)O6-type perovskites, indicates cubic structures with t values near 0.98. For example, Ba2NdIrO6 (t = 0.978), Ba2PrRuO6 (t = 0.977), Ba2NdRuO6 (t = 0.979), and Ba2HoNbO6 (t = 0.981) show cubic symmetry at room temperature [37]. On the other hand, the large ionic radius of Ba2+ in Ba2CrWO6 (t = 1.059) causes a structural phase transition to a hexagonal structure [31]. The tolerance factor of Ca2CrWO6 (t = 0.9445) indicates a heavy distortion from cubic symmetry, and this compound crystallizes in the monoclinic P21/n space group [36]. With t less than unity, as in Case 3, Sr2MnRuO6 (t = 0.987), Sr2MnNbO6 (t = 0.985), and Sr2MnSbO6 (t = 0.977) deviate from cubic symmetry and form tetragonal structures with I4/m symmetry [38]. Similarly, Sr2GaNbO6 (t = 0.992), Sr2CrTaO6 (t = 0.991), and Sr2GaTaO6 (t = 0.990) adopt tetragonal I4/m symmetry [31].

2.3. Model construction

Four different types of CI approaches are employed to develop prediction models. Given a training set $S = \{(\mathbf{v}_i^m, y_i)\}_{i=1}^{100}$ of 100 compounds, we model the lattice parameter $y_o$ as a function of m independent variables, i.e., $f: \mathbf{v}^m \to y_o$, where $\mathbf{v}^m \in \mathbb{R}^m$. For three-dimensional feature vectors we model $y'_o = f(\mathbf{v}^3)$ with $\mathbf{v}^3 = (r_A, r_B, r_{B'})$. For six-dimensional feature vectors we model $y'_o = f(\mathbf{v}^6)$ with $\mathbf{v}^6 = (r_A, r_B, r_{B'}, \chi_B, \chi_{B'}, z_B)$. Here, $r_B$ and $r_{B'}$ are the ionic radii of the B and B′ atoms, $\chi_B$ and $\chi_{B'}$ are the electronegativities of the B and B′ atoms, and $z_B$ is the oxidation state of the B atom. Oxygen is present in every A2BB′O6-type perovskite, so no feature of the oxygen atom is used as an independent variable, to avoid redundancy. LCs and ionic radii are given in ångströms (Å). The percentage absolute difference (PAD) is used to evaluate performance. It measures the relative error, in percent, between the experimentally measured value y and the model prediction $y'_o$:

$$\mathrm{PAD}\,(\%) = \frac{|y - y'_o|}{y} \times 100 \tag{3}$$
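Eq. (3) and the mean PAD reported in Tables 1 and 2 can be sketched as below. The helper names `pad` and `mean_pad` are hypothetical.

```python
def pad(y_exp, y_pred):
    """Eq. (3): percentage absolute difference between experiment and prediction."""
    if y_exp == 0:
        raise ValueError("experimental lattice constant must be non-zero")
    return abs(y_exp - y_pred) / y_exp * 100.0


def mean_pad(y_exp_values, y_pred_values):
    """Mean PAD over a set of compounds (training or novel data)."""
    if len(y_exp_values) != len(y_pred_values) or not y_exp_values:
        raise ValueError("need equally long, non-empty sequences")
    return sum(pad(y, p) for y, p in zip(y_exp_values, y_pred_values)) / len(y_exp_values)


if __name__ == "__main__":
    # First row of Table 2: Sr2AlSbO6, experimental LC 7.7662 Å, SVR prediction 7.7584 Å.
    print(f"{pad(7.7662, 7.7584):.4f}")  # 0.1004
```

The example gives about 0.1004, slightly below the tabulated 0.1007. The difference is presumably because the printed predictions are rounded to four decimals.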

2.3.1. MLR model

In the MLR model, the lattice parameter is expressed as a linear function of the three- and six-dimensional feature vectors, $y'_o = f(\mathbf{v}^3)$ and $y'_o = f(\mathbf{v}^6)$. The coefficients of the independent variables are computed from the training data under the ordinary least squares criterion:

$$y'_{MLR} = 4.4152 + 1.138\,r_A + 1.1588\,r_B + 1.17758\,r_{B'} \tag{4a}$$

$$y'_{MLR} = 4.6944 + 1.1032\,r_A + 1.0903\,r_B + 1.5826\,r_{B'} - 0.714\,\chi_B + 0.206\,\chi_{B'} + 0.0384\,z_B \tag{4b}$$

The coefficients in Eqs. (4a) and (4b) give the slopes of the regression surface, and the constant term gives its intercept. Using these relationships, the LC and PAD values of the MLR models are computed for the training and novel data.

2.3.2. SVR model

SVR model development is well documented in statistical learning theory [39]. For N samples, the SVR model is approximated using the dataset $S = \{(\mathbf{v}_i, y_i)\}_{i=1}^{N}$. The nonlinear input data are mapped into a higher-dimensional feature space via a nonlinear mapping $\Phi$, in which the data may become linearly separable. This gives $\{\Phi_i(\mathbf{v})\}_{i=1}^{L}$, where $L \le N$ is the number of support vectors. The SVR function is estimated from the mapped input features as:

$$y_o = \sum_{i=1}^{l} w_i\,\Phi_i(\mathbf{v}) + b \tag{5}$$

The above equation can be solved using a kernel function as:

$$f(\mathbf{v}, \alpha, \alpha^*) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\,K(\mathbf{v}, \mathbf{v}_i) + b \tag{6}$$

where $K(\mathbf{v}, \mathbf{v}_i)$ is the kernel function, selected according to the complexity of the input data, and the coefficients $\alpha_i$ and $\alpha_i^*$ are Lagrange multipliers. In the training phase, the SVR model uses the training data to minimize the regularized risk function $R(w)$:

$$R(w) = \frac{1}{l}\sum_{i=1}^{l} |y_i - f(\mathbf{v}_i, w)|_\varepsilon + c\,\langle w, w \rangle \tag{7}$$

where

$$|y - f(\mathbf{v}, w)|_\varepsilon = \begin{cases} 0, & \text{if } |y - f(\mathbf{v}, w)| < \varepsilon \\ |y - f(\mathbf{v}, w)| - \varepsilon, & \text{otherwise} \end{cases} \tag{8}$$

Here, $\langle w, w \rangle$ is the dot product of the weight vector with itself and c is a constant. Eq. (8) is the ε-insensitive loss function with insensitivity parameter ε. It does not penalize errors smaller than ε. The Lagrange multipliers $\alpha_i$ and $\alpha_i^*$ are determined by maximizing the functional

$$W(\alpha^*, \alpha) = -\varepsilon \sum_{i=1}^{l} (\alpha_i^* + \alpha_i) + \sum_{i=1}^{l} y_i\,(\alpha_i - \alpha_i^*) - \frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,K(\mathbf{v}_i, \mathbf{v}_j) \tag{9}$$

subject to the constraints

$$\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C, \quad i = 1, \ldots, l \tag{10}$$

where C is a trade-off parameter that determines the cost of constraint violation. Its value is determined empirically during training. At the end of training, the support vectors, which correspond to non-zero coefficients $(\alpha_i - \alpha_i^*)$, are used to construct the SVR model. The SVR models are developed with the LIBSVM software [40] on the 100 training compounds, and their performance is evaluated on the 97 novel compounds. We chose the commonly used Gaussian kernel function. The cost parameter C, the ε of the loss function, and the kernel width σ are optimized using the well-known grid search technique. During this search, the error between the experimental and predicted LCs is minimized. The optimal values for six features are C = 100, σ = 1.725, and ε = 0.001. For three features, they are C = 100, σ = 1.01, and ε = 0.001.
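The fitted MLR equations and the SVR decision and loss functions, Eqs. (4a), (4b), (6), and (8), can be written as plain functions. This is a sketch under stated assumptions rather than the authors' LIBSVM setup:

- The function names are hypothetical.
- The Gaussian kernel is written as exp(−‖u − v‖²/(2σ²)). The paper does not state this width convention. LIBSVM parameterizes its RBF kernel as exp(−γ‖u − v‖²), which corresponds to γ = 1/(2σ²) under this assumption.
- The support vectors, dual coefficients (α_i − α_i*), and bias b would come from training; the paper does not list them.
- The MLR coefficients are copied as printed in Eqs. (4a) and (4b). Inputs must therefore follow the same radius, electronegativity, and oxidation-state conventions as the authors' data.

```python
import math


def mlr_3f(r_a, r_b, r_b_prime):
    """Eq. (4a): three-feature MLR lattice constant (Å), coefficients as printed."""
    return 4.4152 + 1.138 * r_a + 1.1588 * r_b + 1.17758 * r_b_prime


def mlr_6f(r_a, r_b, r_b_prime, chi_b, chi_b_prime, z_b):
    """Eq. (4b): six-feature MLR lattice constant (Å), coefficients as printed."""
    return (4.6944 + 1.1032 * r_a + 1.0903 * r_b + 1.5826 * r_b_prime
            - 0.714 * chi_b + 0.206 * chi_b_prime + 0.0384 * z_b)


def gaussian_kernel(u, v, sigma):
    """Assumed width convention: exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(v, support_vectors, dual_coefs, b, sigma):
    """Eq. (6): f(v) = sum_i (alpha_i - alpha_i*) K(v, v_i) + b.

    dual_coefs[i] holds the trained difference (alpha_i - alpha_i*) for
    support vector i; these values are not given in the paper.
    """
    return b + sum(coef * gaussian_kernel(v, sv, sigma)
                   for sv, coef in zip(support_vectors, dual_coefs))


def eps_insensitive_loss(residual, eps):
    """Eq. (8): zero inside the eps-tube, |residual| - eps outside it."""
    return max(0.0, abs(residual) - eps)
```

With ε = 0.001, as reported for both feature sets, residuals below 0.001 Å contribute nothing to the training loss.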

2.3.3. Random forest (RF) model

RF-based regression models are gaining popularity for developing high-performance CI models. The theoretical description

Table 1 Overall performance summary using three features (3F) and six features (6F).

                   Training data                           Novel data                              Overall
Model       Feat.  Mean PAD  Min PAD   Max PAD   R-value   Mean PAD  Min PAD   Max PAD   R-value   mean PAD
SVR         3F     0.3606    0.0013    2.0756    0.9876    0.466     0.0008    1.9648    0.979     0.4133
            6F     0.1768    0.0001    1.4274    0.9854    0.3453    0.0017    1.8859    0.985     0.2611
RF          3F     0.2622    0.00      1.7136    0.9937    0.5280    0.0183    3.000     0.974     0.3951
            6F     0.2466    0.0078    1.6322    0.9943    0.5898    0.0172    2.9635    0.971     0.4181
GRNN        3F     0.2556    0.000     1.0781    0.994     0.572     0.0103    4.8369    0.958     0.4138
            6F     0.3397    0.000     1.3803    0.9190    1.061     0.0151    4.876     0.919     0.7004
MLR         3F     0.5159    0.0027    2.094     0.9805    0.530     0.0016    2.4540    0.972     0.5229
            6F     0.4708    0.0192    2.457     0.9865    0.5630    0.0095    2.2992    0.979     0.5169
SPuDS       -      1.631     0.628     7.2716    0.8784    1.5336    0.0077    7.722     0.896     1.5823


Table 2 Performance comparison on the novel dataset using three features.

Sr. no

Comp. name

Expt. LC (Å)

1 2 3 4 5 6 7 8 9 10 11

Sr2AlSbO6 Ba2GdTaO6 Sr2FeMoO6 Sr2FeReO6 Sr2CrReO6 Ba2YRuO6 Ba2LuRuO6 Sr2InSbO6 Sr2GaSbO6 Ba2ErMoO6 Sr2MnReO6

12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72

Sr2MnMoO6 Ba2TbIrO6 Ba2HoTaO6 Ba2YMoO6 Ba2YbMoO6 Ba2PrIrO6 Ba2HoRuO6 Ba2SmSbO6 Ba2HoSbO6 Ba2YSbO6 Ba2SmMoO6 Ba2EuMoO6 Ba2GdMoO6 Ba2DyMoO6 Ba2DySbO6 Ba2PrRuO6 Ba2NdRuO6 Ba2MnWO6 Sr2FeReO6 Ba2FeNbO6 Ba2FeWO6 Ba2CePtO6 Ba2PrPtO6 Ba2YbRuO6 Sr2MgIrO6 Ba2CaIrO6 Sr2NiMoO6 Ba2ScBiO6 Ba2YSnO5.5 Ba2DySnO5.5 Ba2HoSnO5.5 Ca2CrWO6 Ba2CrWO6 Sr2ScReO6 Ba2ErNbO6 Ba2ErSbO6 Ba2LiReO6 Ba2NaReO6 Sr2LiOsO6 Sr2NaReO6 Ba2CoReO6 Ba2DyTaO6 Ba2ErReO6 Ba2ErUO6 Ba2EuTaO6 Ba2GdNbO6 Ba2GdSbO6 Ba2HoTaO6 Ba2InPaO6 Ba2InTaO6 Ba2LaReO6 Ba2LuTaO6 Ba2NdPaO6 Ba2PrPaO6 Ba2ScOsO6 Ba2ScSbO6 Ba2SmNbO6 Ba2TbPaO6 Ba2TmNbO6 Ba2YNbO6 Ba2YTaO6

7.7662 8.47 7.9072 7.887 7.8152 8.339 8.272 8.094 7.88 8.4368 8.1865 7.9900 8.01 8.3848 8.40748 8.39173 8.3378 8.4013 8.3419 8.50908 8.4119 8.402 8.4762 8.459 8.4481 8.4062 8.4247 8.48416 8.4706 8.1985 7.90089 8.118 8.135 8.4088 8.3892 8.2753 7.8914 8.3567 7.9019 8.366 8.521 8.513 8.498 7.66 8.06 7.98686 8.4193 8.396 8.118 8.296 7.86 8.13 8.086 8.545 8.354 8.67 8.506 8.496 8.44 8.442 8.596 8.28 8.58 8.376 8.84 8.862 8.152 8.197 8.518 8.753 8.408 8.441 8.42811

SVR

RF

GRNN

MLR

SPuDS

Pred. LC

PAD

Pred. LC

PAD

Pred. LC

PAD

Pred. LC

PAD

Pred. LC

PAD

7.7584 8.4749 7.8989 7.8723 7.8386 8.3063 8.2638 8.0548 7.8606 8.3671 8.0266

0.1007 0.0578 0.1049 0.1861 0.2992 0.3918 0.0989 0.4839 0.2465 0.8258 1.95 0.458 0.7633 0.2346 0.2971 0.157 0.0432 0.6769 0.4138 1.0033 0.5825 0.4786 0.4106 0.3436 0.3279 0.1684 0.589 1.0528 0.9698 0.9883 0.1179 0.469 0.9973 0.4739 0.5337 0.0444 0.661 0.6646 0.3635 0.9276 0.0689 0.3386 0.3545 1.4238 0.644 0.105 0.0008 0.5416 0.2356 0.8088 1.1261 0.0674 0.3258 0.1655 0.4244 0.0913 0.2101 0.2222 0.4133 0.127 0.2613 0.2805 1.4463 0.0789 0.1734 0.2864 0.1105 0.2513 0.2237 0.0532 0.0085 0.1152 0.0377

7.8277 8.4896 7.9189 7.9173 7.8432 8.3543 8.3392 8.0969 7.8467 8.3814 7.9911

0.7923 0.2309 0.1477 0.3837 0.3588 0.1832 0.8118 0.0354 0.4228 0.6564 2.3869 0.0138 1.2151 0.5423 0.3123 0.0969 0.3805 0.2998 0.1923 1.2686 0.2849 0.2216 0.8733 0.8069 0.6105 0.1877 0.4095 0.9281 0.7695 0.4813 0.345 0.1972 0.1008 0.2982 0.156 0.849 0.7959 0.316 0.6342 0.7756 0.182 0.2986 0.4754 2.3968 0.6928 0.2612 0.0188 0.1762 0.2834 0.7024 0.4166 0.3643 0.1481 0.2633 0.3642 0.4191 0.1526 0.0759 0.5151 0.1315 0.4801 0.0206 1.8218 0.0492 0.6127 0.8071 0.2785 1.0321 0.0329 0.1002 0.0264 0.1197 0.0332

7.7955 8.4766 7.8634 7.8418 7.8429 8.3573 8.2821 8.0520 7.8489 8.3988 7.9913

0.3774 0.0774 0.5533 0.5736 0.3546 0.2190 0.1218 0.5191 0.3952 0.4501 2.38 0.0163 0.6245 0.3523 0.1375 0.1407 0.5181 0.3722 0.1918 0.9558 0.2377 0.1245 0.2579 0.2101 0.2001 0.0421 0.3393 0.9858 0.8045 0.5397 0.7610 1.1872 0.7360 0.1579 0.2285 0.4148 0.9289 0.5330 0.5389 1.5516 1.0992 0.8647 0.8216 2.5463 0.4797 0.0834 0.0894 0.1014 0.2227 0.4192 0.7258 0.3679 0.2437 0.3560 0.1063 0.4517 0.0749 0.1783 0.3485 0.2806 1.2347 0.2010 2.1528 0.1981 0.5822 0.7327 0.4379 0.8304 0.0582 0.1731 0.0259 0.2688 0.1162

7.7394 8.4709 7.8846 7.8313 7.7966 8.2937 8.2485 8.0465 7.8379 8.362 7.9925

0.3453 0.0105 0.2856 0.7057 0.2383 0.5436 0.2844 0.5871 0.5344 0.8867 2.37 0.0313 0.6678 0.1839 0.2442 0.2163 0.0156 0.7026 0.5643 1.0112 0.6529 0.5496 0.4177 0.3659 0.3608 0.2226 0.6525 1.016 0.9533 0.9295 0.0524 0.1644 1.005 0.5155 0.5593 0.2262 0.1153 0.7387 0.0082 1.1276 0.0628 0.1945 0.2213 1.2247 0.0132 0.4955 0.0479 0.6166 0.6369 0.8868 0.152 0.5796 0.0891 0.2194 0.5421 0.4802 0.2493 0.2682 0.4481 0.1794 0.4236 0.3571 1.2709 0.0537 0.8124 0.9279 0.1767 0.1831 0.2537 0.6209 0.0514 0.1676 0.0148

7.7718 8.6178 7.9798 7.8858 7.8158 8.4858 8.3898 8.3358 7.9918 8.4378 8.291

0.0721 1.745 0.9182 0.0152 0.0077 1.7604 1.4241 2.9874 1.4188 0.0119 1.276 3.767 0.3521 0.6047 1.55 1.2878 0.6477 1.1046 1.8689 2.3354 0.617 1.9971 1.9065 3.1068 1.701 0.6852 1.295 2.8246 2.21 0.3476 0.191 1.6038 0.2459 2.5771 1.1813 1.2386 1.5663 2.4687 0.4189 1.1929 Predict Predict Predict 2.8172 2.2854 0.9884 0.5274 1.2362 0.4213 5.5665 1.1985 7.722 0.1113 0.646 0.1221 1.1972 2.8427 1.2218 2.628 1.1348 0.3327 0.1425 1.5361 0.6423 2.1561 2.6473 0.3214 0.4001 1.5004 1.5035 0.735 0.7914 1.1591

8.0711 8.4045 8.4325 8.3786 8.3414 8.3444 8.3074 8.4237 8.3629 8.3618 8.4414 8.4299 8.4204 8.392 8.3751 8.3948 8.3885 8.2795 7.9102 8.0799 8.2161 8.3689 8.3444 8.2716 7.9436 8.4122 7.9306 8.4436 8.5269 8.5418 8.5281 7.7691 8.0081 7.9784 8.4194 8.3505 8.1019 8.3641 7.9435 8.1355 8.1123 8.443 8.3185 8.6621 8.4881 8.4771 8.4051 8.4313 8.5735 8.3046 8.4559 8.3826 8.8247 8.8366 8.143 8.1764 8.4989 8.7483 8.4073 8.4313 8.4313

8.1073 8.4303 8.4337 8.3836 8.3695 8.3761 8.3579 8.4011 8.3879 8.3834 8.4022 8.3907 8.3965 8.3904 8.3902 8.4054 8.4054 8.238 7.9281 8.134 8.1268 8.3837 8.3761 8.3456 7.9542 8.3831 7.952 8.4309 8.5365 8.5384 8.5384 7.8436 8.0042 7.9659 8.4209 8.3812 8.098 8.3553 7.8877 8.1596 8.098 8.4347 8.3844 8.6337 8.493 8.4896 8.3965 8.4309 8.5547 8.2797 8.4237 8.3801 8.7858 8.7905 8.1293 8.1124 8.5152 8.7442 8.4102 8.4309 8.4309

8.0600 8.4143 8.4190 8.4035 8.3810 8.3700 8.3579 8.4277 8.3919 8.3915 8.4543 8.4412 8.4312 8.4097 8.3961 8.4005 8.4025 8.2427 7.8408 8.0216 8.1949 8.3955 8.3700 8.3096 7.9647 8.4012 7.9445 8.4958 8.4273 8.4394 8.4282 7.8550 8.0213 7.9801 8.4118 8.3875 8.1029 8.3318 7.9120 8.1599 8.1057 8.4269 8.3629 8.7092 8.4996 8.4809 8.4106 8.4183 8.4899 8.2648 8.3953 8.3926 8.7885 8.7971 8.1163 8.1289 8.5130 8.7379 8.4058 8.4183 8.4183

8.0635 8.4002 8.428 8.3736 8.3365 8.3423 8.2948 8.423 8.357 8.3558 8.4408 8.428 8.4176 8.3875 8.3697 8.398 8.3899 8.2747 7.8967 8.1313 8.2168 8.3655 8.3423 8.2566 7.9005 8.4184 7.9012 8.4603 8.5156 8.5296 8.5168 7.7538 8.0611 7.9472 8.4153 8.3442 8.0693 8.3706 7.8669 8.1771 8.0932 8.4384 8.3087 8.6284 8.4848 8.4732 8.4022 8.4269 8.5596 8.311 8.471 8.3805 8.7682 8.7798 8.1376 8.182 8.4964 8.6987 8.4037 8.4269 8.4269

7.9818 8.3341 8.5378 8.4998 8.3918 8.4941 8.4978 8.7078 8.4638 8.5698 8.6378 8.7218 8.5918 8.4638 8.5338 8.7238 8.6578 8.227 7.8858 7.9878 8.115 8.1921 8.2901 8.3778 8.015 8.563 7.935 8.4658 Cannot Cannot Cannot 7.8758 7.8758 8.0658 8.4638 8.4998 8.0838 8.7578 7.7658 8.7578 8.095 8.4898 8.3438 8.7738 8.7478 8.5998 8.6618 8.5378 8.6246 8.2918 8.7118 8.4298 9.0306 9.0966 8.1258 8.2298 8.6458 8.8846 8.4698 8.5078 8.5258


Table 2 (continued) Sr. no

73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 Mean PAD

Comp. name

Ba2YbPaO6 Sr2AlNbO6 Sr2CrMoO6 Sr2FeBiO6 Sr2InOsO6 Sr2RhNbO6 Sr2ScReO6 Ba2BaUO6 Ba2CaReO6 Ba2CaWO6 Ba2CdReO6 Ba2CoUO6 Ba2FeReO6 Ba2MgOsO6 Ba2MgUO6 Ba2MnWO6 Ba2NiReO6 Ba2ZnMoO6 Ba2ZnUO6 Ca2MgWO6 Pb2MgWO6 Sr2CrUO6 Sr2MgTeO6 Sr2NiUO6 Sr2MnRuO6

Expt. LC (Å)

8.678 7.786 7.84 8.063 8.06 7.914 8.02 8.89 8.371 8.388 8.322 8.372 8.0518 8.08 8.381 8.1844 8.04 8.103 8.4 7.7 8.006 8.09 7.94 8.15 7.9333

SVR

RF

GRNN

MLR

SPuDS

Pred. LC

PAD

Pred. LC

PAD

Pred. LC

PAD

Pred. LC

PAD

Pred. LC

PAD

8.6789 7.7967 7.8694 8.099 8.0264 8.0523 7.9834 8.9842 8.3792 8.4648 8.3337 8.3925 8.1881 8.0688 8.3447 8.2795 8.0353 8.1494 8.3767 7.8513 8.0271 8.2459 7.9347 8.1067 7.8606 0.466

0.0106 0.1362 0.3750 0.4459 0.4171 1.7474 0.456 1.0596 0.0976 0.9158 0.141 0.2447 1.715 0.1383 0.6232 1.1623 0.0587 0.5725 0.2419 1.9648 0.2629 1.9273 0.0672 0.5308 0.9127

8.6887 7.8327 7.8465 8.1886 8.0872 8.0104 7.9659 8.6734 8.3635 8.4103 8.3421 8.3471 8.135 8.0911 8.3435 8.238 8.0598 8.1015 8.3456 7.931 8.032 8.2497 7.9457 8.1749 7.9157 0.5280

0.1235 0.5979 1.9876 1.5582 0.3369 1.2184 0.6741 2.4361 0.0894 0.2664 0.2412 0.2977 1.0562 0.1378 0.6371 0.6544 0.2456 0.0183 0.6117 3.0004 0.3242 1.9736 0.0724 0.3059 0.2186

8.6980 7.7925 7.8524 8.1893 8.0118 8.1072 7.9804 8.46 8.3719 8.4455 8.3673 8.4255 8.1859 8.1000 8.4032 8.2427 8.0891 8.1146 8.438 7.8328 8.0386 8.2181 7.9567 8.1891 7.8380 0.572

0.2304 0.0828 0.1581 1.5669 0.5980 2.4412 0.4933 4.8369 0.0103 0.6857 0.5444 0.6390 1.6876 0.2474 0.0737 0.7129 0.6103 0.1436 0.4884 1.7245 0.4073 1.5840 0.2106 0.4800 1.1975

8.6407 7.8162 7.8556 8.1568 8.0021 8.0596 7.953 9.1082 8.3829 8.4717 8.325 8.4129 8.1812 8.0496 8.3781 8.2747 8.0237 8.1526 8.4013 7.84 8.0107 8.2773 7.8827 8.1499 7.8047 0.530

0.4297 0.3868 0.1989 1.1632 0.7186 1.8394 0.8352 2.454 0.1423 0.9979 0.0357 0.488 1.6304 0.3768 0.2252 1.1034 0.2031 0.6127 0.0508 1.8178 0.0583 2.3157 0.7212 0.0016 1.6172

8.7506 7.7098 7.9098 8.2858 8.2318 8.0558 8.0658 9.485 8.645 8.581 8.519 8.2723 7.8858 8.259 8.301 8.227 8.019 8.035 8.323 8.033 8.033 8.375 8.033 8.223 7.9678 1.5353

0.8366 0.9787 0.8903 2.7632 2.1315 1.7918 0.5711 6.6929 3.4586 2.3009 2.3672 1.1909 2.0396 2.2153 0.9545 0.5205 0.2612 0.8392 0.8813 4.3247 0.3372 3.5229 1.1713 0.8957 0.4349

of the RF network is explained in [41]. The generalization performance of this technique compares favorably with that of many other learning algorithms. Random forests improve on bootstrap aggregating (bagging) by introducing an additional layer of randomness. Here, the RF model is used to predict the LCs of perovskite compounds, and it is given the 100 training samples along with their output values. During training, each tree is grown on a bootstrap sample of the training data. At each node, the split is chosen from a randomly selected subset of mtry descriptors, where mtry is the number of variables considered for splitting at each node. We empirically set mtry to 3 and 6 for the three- and six-feature training data, respectively. The RF network gave good performance with ntree = 1000 randomly generated trees. The output of each tree depends on the values of an independently sampled random vector. In each run, the RF predictor averages the outputs of the 1000 trees, and the best results over 20 runs are reported for the training and novel data. The RF regression approach is implemented using the software package in [42].

2.3.4. GRNN model

GRNN models are supervised statistical methods used for developing CI models. The GRNN architecture combines multilayer perceptrons and radial basis functions, with weights optimized from the training data. A GRNN commonly contains four layers: an input layer, a layer of radial centers, a layer of regression units, and an output layer [43]. The units in the radial layer represent the centers of clusters in the training data. In this work, this layer is trained using the k-means clustering algorithm. The regression layer consists of linear units, one more than the number of units in the output layer. The main advantage of GRNN models is that they automatically extract an appropriate regression model from the data.
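For comparison, the classical GRNN formulation of Specht reduces to a Gaussian-weighted average of the training targets, controlled by a single spread parameter σ. The sketch below implements that textbook form, not the k-means variant described above. In this sketch every training compound acts as its own radial unit. The function name `grnn_predict` is hypothetical.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Specht-style GRNN: Gaussian-weighted average of the training targets.

    Every training point acts as a radial unit here (no k-means clustering).
    """
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("need matching, non-empty training inputs and targets")
    # Log-weights -||x - x_i||^2 / (2 sigma^2) for each training point.
    log_w = [
        -sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * sigma ** 2)
        for xi in train_x
    ]
    # Shift by the largest log-weight so a small sigma cannot underflow every weight to zero.
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


if __name__ == "__main__":
    xs = [[0.0], [1.0]]
    ys = [1.0, 3.0]
    print(grnn_predict([0.5], xs, ys, sigma=1.0))   # 2.0 (equidistant: equal weights)
    print(grnn_predict([0.0], xs, ys, sigma=0.01))  # 1.0 (tiny spread: nearest target)
```

A small σ makes the prediction snap to the nearest training compound (over-training). A large σ pulls every prediction toward the mean of the training targets (under-training).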
During development, the GRNN is tuned to balance the prediction error on the training data against that on the novel data so that the overall prediction error is minimized. For a small value of the Gaussian spread parameter σ, the network is over-trained and the prediction error is higher for the novel data; for a large value, the network remains under-trained and the prediction error increases for the training data as well. The best results of this network were obtained with optimal spread values of σ = 0.1015 and σ = 0.0284 for three and six features, respectively.

3. Results and discussion

The LC of each compound is predicted using three and six features for both the training and the novel data. The overall performance is summarized in Table 1, and detailed results are given in Tables 2 and 3 and Supplementary Tables 1 and 2. The linear correlation between the experimental and predicted values for the three-feature novel dataset is shown in Fig. 2A–E; the corresponding plots for six features are given in Supplementary Fig. 1A–E. Below, we discuss the overall performance of the prediction models and several individual compounds with respect to structural distortion.

3.1. Prediction comparison for three and six features

For the three-feature training data, Table 1 gives mean PAD values of 0.360, 0.262, 0.256, 0.516, and 1.631 for the SVR, RF, GRNN, MLR, and SPuDS models, respectively, so the RF and GRNN models improve on the other models by a significant margin. For the three-feature novel data, the corresponding mean PAD values of the five models are 0.466, 0.528, 0.572, 0.530, and 1.533 (Table 1). The overall performance on the training and novel data is given in the last column of Table 1: the overall mean PAD values of the SVR, RF, GRNN, and MLR models are 0.413, 0.395, 0.414, and 0.523, respectively, whereas the conventional SPuDS software gives 1.582. The CI models therefore reduce the overall mean PAD by a factor of roughly three (MLR) to four (RF) relative to SPuDS. Note that, for the small feature set, the overall mean PAD of the SVR, RF, and GRNN models lies in the narrow range 0.395–0.414, whereas the MLR model gives 0.523.
We therefore conclude that the nonlinear SVR, RF, and GRNN models correlate the lattice parameters with the ionic radii more effectively than the linear regression model. For the small feature set, the RF model gives the most accurate overall predictions.
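The RF model combines bootstrap samples, a random subset of mtr descriptors at each split, and averaging over ntree trees. The pure-Python sketch below illustrates that procedure under simplifying assumptions: it is not the Randomforest-Matlab package [42] that the authors used, the function names and the min_leaf stopping size are hypothetical, and the best-of-20-runs selection is omitted.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple, Union

# A tree node is either a leaf value or (feature, threshold, left, right).
Node = Union[float, Tuple[int, float, "Node", "Node"]]


def _sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def _grow(X, y, idx, mtry, min_leaf, rng) -> Node:
    """Grow one regression tree on the (bootstrap) sample indices idx."""
    targets = [y[i] for i in idx]
    if len(idx) <= min_leaf or len(set(targets)) == 1:
        return mean(targets)
    best = None  # (sse, feature, threshold)
    # Random-forest step: only mtry randomly chosen descriptors are tried at this node.
    for f in rng.sample(range(len(X[0])), mtry):
        values = sorted(set(X[i][f] for i in idx))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y[i] for i in idx if X[i][f] <= thr]
            right = [y[i] for i in idx if X[i][f] > thr]
            sse = _sse(left) + _sse(right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:  # every tried descriptor is constant in this node
        return mean(targets)
    _, f, thr = best
    left_idx = [i for i in idx if X[i][f] <= thr]
    right_idx = [i for i in idx if X[i][f] > thr]
    return (f, thr,
            _grow(X, y, left_idx, mtry, min_leaf, rng),
            _grow(X, y, right_idx, mtry, min_leaf, rng))


def _predict_tree(node: Node, x: Sequence[float]) -> float:
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, ntree=1000, mtry=3, min_leaf=5, seed=0) -> List[Node]:
    """Bagging plus random descriptor subsets, in the spirit of Breiman's RF."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        trees.append(_grow(X, y, boot, mtry, min_leaf, rng))
    return trees


def predict_forest(trees: List[Node], x: Sequence[float]) -> float:
    """The forest output is the average of all tree predictions."""
    return mean(_predict_tree(t, x) for t in trees)


if __name__ == "__main__":
    # Hypothetical toy data: the target depends only on descriptor 0.
    X = [[float(i), float((i * 7) % 5)] for i in range(20)]
    y = [2.0 * i for i in range(20)]
    forest = fit_forest(X, y, ntree=100, mtry=1, min_leaf=3, seed=1)
    print(predict_forest(forest, [10.0, 0.0]))
```

With mtry equal to the number of descriptors, as in the settings reported above (3 of 3 and 6 of 6), every split considers all descriptors, so this sketch reduces to bagged regression trees.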


A. Majid et al. / Computational Materials Science 50 (2011) 1879–1888

Table 3
Performance comparison on the training dataset using three features. All LC values are in Å.

No. | Compound | Expt. LC (Å) | SVR Pred. LC | SVR PAD | RF Pred. LC | RF PAD | GRNN Pred. LC | GRNN PAD | MLR Pred. LC | MLR PAD | SPuDS Pred. LC | SPuDS PAD
1 | Ba2AgIO6 | 8.46 | 8.4541 | 0.0702 | 8.4098 | 0.5936 | 8.46 | 0.0000 | 8.5212 | 0.7237 | a | –
2 | Ba2LiOsO6 | 8.1046 | 8.0964 | 0.1009 | 8.098 | 0.0817 | 8.1019 | 0.0335 | 8.0604 | 0.5454 | 7.7658 | 4.1803
3 | Ba2NaIO6 | 8.33 | 8.3641 | 0.4094 | 8.3553 | 0.3035 | 8.3318 | 0.0214 | 8.3706 | 0.4871 | a | –
4 | Ba2NaOsO6 | 8.287 | 8.3563 | 0.8368 | 8.3236 | 0.4417 | 8.3277 | 0.4917 | 8.3617 | 0.9014 | 8.4398 | 1.8439
5 | Ca2LiOsO6 | 7.83 | 7.8275 | 0.0315 | 7.8567 | 0.3408 | 7.83 | 0.0002 | 7.7531 | 0.9816 | 7.7658 | 0.8199
6 | Ca2LiReO6 | 7.83 | 7.8306 | 0.0073 | 7.8591 | 0.3715 | 7.83 | 0.0003 | 7.762 | 0.8682 | 8.0838 | 3.2414
7 | Sr2LiReO6 | 7.907 | 7.9474 | 0.5113 | 7.8901 | 0.2134 | 7.9147 | 0.0971 | 7.8758 | 0.3944 | 8.0838 | 2.7166
8 | Sr2NaOsO6 | 8.13 | 8.1292 | 0.0093 | 8.1487 | 0.2301 | 8.1568 | 0.3292 | 8.1682 | 0.4703 | 8.4398 | 3.8106
9 | Ba2BiTaO6 | 8.568 | 8.5694 | 0.0159 | 8.5328 | 0.411 | 8.5552 | 0.1498 | 8.5775 | 0.1109 | 8.6758 | 1.2582
10 | Ba2CePaO6 | 8.8 | 8.8598 | 0.68 | 8.8072 | 0.0816 | 8.8212 | 0.2409 | 8.8029 | 0.0335 | 9.1226 | 3.6659
11 | Ba2DyNbO6 | 8.437 | 8.4453 | 0.0988 | 8.4347 | 0.0268 | 8.4291 | 0.0941 | 8.4408 | 0.0446 | 8.4718 | 0.4125
12 | Ba2DyPaO6 | 8.74 | 8.7376 | 0.0276 | 8.7284 | 0.1325 | 8.7292 | 0.1234 | 8.6894 | 0.5792 | 8.8226 | 0.9451
13 | Ba2ErNbO6 | 8.427 | 8.4194 | 0.0906 | 8.4209 | 0.0725 | 8.4118 | 0.1807 | 8.4153 | 0.1392 | 8.4458 | 0.2231
14 | Ba2ErPaO6 | 8.716 | 8.7073 | 0.0998 | 8.7099 | 0.0698 | 8.7124 | 0.0416 | 8.6639 | 0.5979 | 8.7966 | 0.9247
15 | Ba2ErRuO6 | 8.323 | 8.2957 | 0.3285 | 8.3519 | 0.3472 | 8.3502 | 0.3272 | 8.2821 | 0.4917 | 8.4238 | 1.2111
16 | Ba2ErTaO6 | 8.423 | 8.4194 | 0.0431 | 8.4209 | 0.0251 | 8.4118 | 0.1333 | 8.4153 | 0.0918 | 8.4638 | 0.4844
17 | Ba2EuNbO6 | 8.507 | 8.4848 | 0.2604 | 8.4945 | 0.1475 | 8.4946 | 0.1463 | 8.4813 | 0.3018 | 8.7298 | 2.619
18 | Ba2EuPaO6 | 8.783 | 8.7837 | 0.0075 | 8.7842 | 0.0132 | 8.7696 | 0.1530 | 8.7299 | 0.6041 | 9.0806 | 3.3884
19 | Ba2FeMoO6 | 8.0121 | 8.0409 | 0.3594 | 8.0317 | 0.2449 | 8.0306 | 0.2307 | 8.0781 | 0.8234 | 7.9798 | 0.4031
20 | Ba2FeReO6 | 8.0518 | 8.0059 | 0.5701 | 8.0413 | 0.1309 | 8.0422 | 0.1195 | 8.0248 | 0.3353 | 7.8858 | 2.0617
21 | Ba2GdPaO6 | 8.774 | 8.7721 | 0.0221 | 8.7838 | 0.1118 | 8.7602 | 0.1571 | 8.7195 | 0.621 | 8.9506 | 2.0128
22 | Ba2GdReO6 | 8.431 | 8.3695 | 0.7299 | 8.4025 | 0.3376 | 8.3862 | 0.5316 | 8.3643 | 0.7906 | 8.4978 | 0.7923
23 | Ba2HoNbO6 | 8.434 | 8.4325 | 0.0183 | 8.4337 | 0.0031 | 8.419 | 0.1773 | 8.428 | 0.071 | 8.5198 | 1.0173
24 | Ba2HoPaO6 | 8.73 | 8.7226 | 0.0851 | 8.7222 | 0.0891 | 8.7199 | 0.1160 | 8.6766 | 0.6113 | 8.8706 | 1.6105
25 | Ba2InNbO6 | 8.279 | 8.3046 | 0.3096 | 8.2797 | 0.0083 | 8.2648 | 0.1720 | 8.311 | 0.3862 | 8.2738 | 0.0628
26 | Ba2InOsO6 | 8.224 | 8.2062 | 0.2166 | 8.2315 | 0.0909 | 8.2219 | 0.0261 | 8.1955 | 0.346 | 8.2318 | 0.0948
27 | Ba2InReO6 | 8.258 | 8.2131 | 0.5435 | 8.2385 | 0.2367 | 8.2242 | 0.4091 | 8.2044 | 0.6488 | 8.1718 | 1.0438
28 | Ba2InSbO6 | 8.269 | 8.2419 | 0.3274 | 8.2473 | 0.262 | 8.233 | 0.4352 | 8.2399 | 0.3515 | 8.3358 | 0.8078
29 | Ba2InUO6 | 8.52 | 8.5308 | 0.1265 | 8.4739 | 0.5405 | 8.4456 | 0.8736 | 8.5241 | 0.0478 | 8.6018 | 0.9601
30 | Ba2LaPaO6 | 8.885 | 8.8843 | 0.0076 | 8.8283 | 0.6383 | 8.8457 | 0.4423 | 8.8284 | 0.6366 | 9.1646 | 3.1469
31 | Ba2LuNbO6 | 8.364 | 8.3839 | 0.2374 | 8.3805 | 0.1972 | 8.3934 | 0.3513 | 8.3817 | 0.2112 | 8.4118 | 0.5715
32 | Ba2LuPaO6 | 8.666 | 8.6659 | 0.0013 | 8.6875 | 0.2479 | 8.6886 | 0.2608 | 8.6303 | 0.4122 | 8.7626 | 1.1147
33 | Ba2MnReO6 | 8.1865 | 8.208 | 0.2628 | 8.2106 | 0.2943 | 8.2039 | 0.2120 | 8.1859 | 0.0072 | 7.8878 | 3.6487
34 | Ba2NdNbO6 | 8.54 | 8.5231 | 0.1977 | 8.5389 | 0.0129 | 8.5303 | 0.1141 | 8.523 | 0.1986 | 8.6798 | 1.637
35 | Ba2NdReO6 | 8.51 | 8.4136 | 1.1326 | 8.4465 | 0.7456 | 8.4186 | 1.0735 | 8.4165 | 1.0988 | 8.5778 | 0.7967
36 | Ba2NdTaO6 | 8.556 | 8.5231 | 0.3843 | 8.5389 | 0.1999 | 8.5303 | 0.3009 | 8.523 | 0.3852 | 8.6978 | 1.6573
37 | Ba2RhNbO6 | 8.17 | 8.2284 | 0.7145 | 8.2057 | 0.4365 | 8.1892 | 0.2353 | 8.2472 | 0.9453 | 8.0558 | 1.3978
38 | Ba2ScNbO6 | 8.234 | 8.2284 | 0.0686 | 8.2057 | 0.3444 | 8.1892 | 0.5440 | 8.2472 | 0.1605 | 8.1678 | 0.8042
39 | Ba2ScPaO6 | 8.549 | 8.4847 | 0.7519 | 8.4619 | 1.0185 | 8.5188 | 0.3534 | 8.4959 | 0.6217 | 8.5186 | 0.3556
40 | Ba2ScReO6 | 8.163 | 8.1429 | 0.2462 | 8.1262 | 0.451 | 8.1143 | 0.5962 | 8.1407 | 0.2734 | 8.0658 | 1.1907
41 | Ba2ScTaO6 | 8.2315 | 8.2284 | 0.0376 | 8.2057 | 0.3136 | 8.1892 | 0.5132 | 8.2472 | 0.1915 | 8.1858 | 0.5548
42 | Ba2ScUO6 | 8.49 | 8.4436 | 0.5464 | 8.4309 | 0.6963 | 8.4958 | 0.0684 | 8.4603 | 0.3494 | 8.4958 | 0.0683
43 | Ba2SmPaO6 | 8.792 | 8.7976 | 0.0637 | 8.7873 | 0.0536 | 8.7775 | 0.1649 | 8.7427 | 0.5609 | 8.9966 | 2.3271
44 | Ba2SmTaO6 | 8.519 | 8.4968 | 0.2606 | 8.5152 | 0.0447 | 8.5107 | 0.0968 | 8.4941 | 0.2927 | 8.6638 | 1.6997
45 | Ba2TlSbO6 | 8.3809 | 8.3448 | 0.4304 | 8.3776 | 0.0389 | 8.3848 | 0.0468 | 8.3384 | 0.5066 | 8.1138 | 3.187
46 | Ba2TlTaO6 | 8.42 | 8.4133 | 0.079 | 8.414 | 0.0713 | 8.4088 | 0.1333 | 8.4095 | 0.125 | 8.0698 | 4.1591
47 | Ba2TmPaO6 | 8.692 | 8.6932 | 0.0139 | 8.6993 | 0.0837 | 8.7058 | 0.1582 | 8.6523 | 0.4568 | 8.8206 | 1.4795
48 | Ba2TmTaO6 | 8.4064 | 8.4073 | 0.0105 | 8.4102 | 0.0455 | 8.4058 | 0.0069 | 8.4037 | 0.0324 | 8.4878 | 0.9683
49 | Ba2YPaO6 | 8.718 | 8.7212 | 0.0366 | 8.7142 | 0.0435 | 8.7191 | 0.0130 | 8.6755 | 0.4878 | 8.8586 | 1.6128
50 | Ba2YReO6 | 8.372 | 8.3295 | 0.508 | 8.3883 | 0.1951 | 8.3684 | 0.0433 | 8.3203 | 0.6175 | 8.4058 | 0.4037
51 | Ba2YUO6 | 8.69 | 8.6757 | 0.1643 | 8.6543 | 0.4109 | 8.7158 | 0.2966 | 8.64 | 0.5759 | 8.8358 | 1.6778
52 | Ba2YbNbO6 | 8.374 | 8.3926 | 0.2216 | 8.3838 | 0.1175 | 8.3984 | 0.2912 | 8.3898 | 0.1884 | 8.3998 | 0.3081
53 | Ba2YbTaO6 | 8.39 | 8.3926 | 0.0305 | 8.3838 | 0.0734 | 8.3984 | 0.1000 | 8.3898 | 0.0027 | 8.4178 | 0.3313
54 | Pb2ScTaO6 | 8.1401 | 8.1067 | 0.4102 | 8.1371 | 0.0368 | 8.132 | 0.0991 | 8.1107 | 0.3615 | 8.1858 | 0.5614
55 | Sr2AlTaO6 | 7.7913 | 7.7902 | 0.0146 | 7.8327 | 0.5304 | 7.792 | 0.0080 | 7.8104 | 0.245 | 7.7278 | 0.8154
56 | Sr2CoSbO6 | 7.88 | 7.8488 | 0.3957 | 7.8583 | 0.2749 | 7.8494 | 0.3887 | 7.8263 | 0.6815 | 7.8058 | 0.9416
57 | Sr2CrOsO6 | 7.84 | 7.8348 | 0.0658 | 7.842 | 0.0251 | 7.8416 | 0.0202 | 7.7877 | 0.6671 | 7.8758 | 0.4566
58 | Sr2CrWO6 | 7.832 | 7.8727 | 0.5194 | 7.8464 | 0.1836 | 7.855 | 0.2942 | 7.8676 | 0.4547 | 7.8758 | 0.5592
59 | Sr2GaOsO6 | 7.82 | 7.8405 | 0.2617 | 7.8407 | 0.2646 | 7.8412 | 0.2705 | 7.7935 | 0.339 | 7.8878 | 0.867
60 | Sr2GaReO6 | 7.843 | 7.8443 | 0.016 | 7.842 | 0.0126 | 7.8425 | 0.0068 | 7.8024 | 0.518 | 7.8278 | 0.1938
61 | Sr2InReO6 | 8.071 | 8.0318 | 0.4851 | 8.0897 | 0.232 | 8.0234 | 0.5902 | 8.011 | 0.7439 | 8.1718 | 1.2489
62 | Sr2InUO6 | 8.33 | 8.2997 | 0.3643 | 8.3286 | 0.0163 | 8.2702 | 0.7180 | 8.3306 | 0.0074 | 8.6018 | 3.2629
63 | Sr2ScBiO6 | 8.1816 | 8.2295 | 0.5857 | 8.2249 | 0.5287 | 8.1845 | 0.0356 | 8.2669 | 1.0423 | 8.4658 | 3.4736
64 | Sr2ScOsO6 | 8.02 | 7.9735 | 0.5804 | 7.9636 | 0.7031 | 7.9736 | 0.5783 | 7.9383 | 1.0181 | 8.1258 | 1.3192
65 | Sr2RhTaO6 | 7.939 | 7.9539 | 0.1873 | 7.9526 | 0.171 | 7.9232 | 0.1990 | 7.9611 | 0.278 | 8.0738 | 1.6979
66 | Sr2CrNbO6 | 7.87 | 7.8925 | 0.2856 | 7.8643 | 0.0723 | 7.8608 | 0.1171 | 7.9031 | 0.4209 | 7.9178 | 0.6074
67 | Ba2CaMoO6 | 8.3803 | 8.4469 | 0.7947 | 8.4085 | 0.336 | 8.4266 | 0.5519 | 8.4539 | 0.8788 | 8.561 | 2.1562
68 | Ba2CaOsO6 | 8.362 | 8.3712 | 0.1096 | 8.3639 | 0.0223 | 8.3645 | 0.0294 | 8.374 | 0.1439 | 8.807 | 5.3217
69 | Ba2CaTeO6 | 8.393 | 8.3955 | 0.0298 | 8.3776 | 0.1833 | 8.3871 | 0.0706 | 8.4007 | 0.0914 | 8.581 | 2.24
70 | Ba2CaUO6 | 8.67 | 8.7312 | 0.7054 | 8.6438 | 0.3017 | 8.694 | 0.2774 | 8.7026 | 0.3756 | 8.849 | 2.0646
71 | Ba2CdMoO6 | 8.3242 | 8.3983 | 0.8908 | 8.3778 | 0.6436 | 8.4026 | 0.9416 | 8.396 | 0.8626 | 8.435 | 1.3311
72 | Ba2CdOsO6 | 8.325 | 8.3261 | 0.0135 | 8.3428 | 0.2142 | 8.3617 | 0.4412 | 8.3161 | 0.107 | 8.681 | 4.2763
73 | Ba2CoMoO6 | 8.0862 | 8.1561 | 0.8636 | 8.1061 | 0.246 | 8.1181 | 0.3935 | 8.1584 | 0.893 | 8.011 | 0.9303
74 | Ba2CoReO6 | 8.078 | 8.1061 | 0.3476 | 8.098 | 0.2473 | 8.1041 | 0.3236 | 8.0874 | 0.1165 | 8.095 | 0.2104
75 | Ba2CoWO6 | 8.1137 | 8.1697 | 0.7606 | 8.1124 | 0.0544 | 8.1245 | 0.2037 | 8.1762 | 0.8413 | 8.031 | 0.9496
76 | Ba2CrUO6 | 8.30 | 8.2957 | 0.0518 | 8.3488 | 0.624 | 8.297 | 0.170 | 8.3154 | 0.2218 | 8.2458 | 0.650
77 | Ba2FeUO6 | 8.363 | 8.439 | 0.9087 | 8.3531 | 0.1189 | 8.382 | 0.2275 | 8.4476 | 1.0118 | 8.383 | 0.2391
78 | Ba2MgMoO6 | 8.0838 | 8.1224 | 0.4776 | 8.0923 | 0.1055 | 8.1019 | 0.2246 | 8.1295 | 0.5653 | 8.013 | 0.8755
79 | Ba2MgReO6 | 8.082 | 8.0743 | 0.0949 | 8.0898 | 0.0967 | 8.1005 | 0.2285 | 8.0584 | 0.2915 | 8.097 | 0.1856
80 | Ba2MgTeO6 | 8.13 | 8.0857 | 0.5453 | 8.1097 | 0.2494 | 8.1011 | 0.3556 | 8.0762 | 0.6618 | 8.033 | 1.1931
81 | Ba2MgWO6 | 8.0985 | 8.1355 | 0.457 | 8.0975 | 0.0121 | 8.1055 | 0.0872 | 8.1472 | 0.6018 | 8.033 | 0.8087
82 | Ba2MnMoO6 | 8.168 | 8.2644 | 1.1799 | 8.226 | 0.7101 | 8.2275 | 0.7288 | 8.2569 | 1.0889 | 8.207 | 0.4775
83 | Ba2MnUO6 | 8.469 | 8.5133 | 0.5231 | 8.409 | 0.7083 | 8.4348 | 0.4043 | 8.5056 | 0.4317 | 8.495 | 0.307
84 | Ba2NiMoO6 | 8.035 | 8.081 | 0.5723 | 8.0567 | 0.2707 | 8.0732 | 0.4758 | 8.0947 | 0.7431 | 7.935 | 1.2446
85 | Ba2NiUO6 | 8.336 | 8.2957 | 0.484 | 8.3408 | 0.0573 | 8.3428 | 0.0819 | 8.3433 | 0.0879 | 8.223 | 1.3556
86 | Ba2NiWO6 | 8.0748 | 8.0935 | 0.2319 | 8.0608 | 0.1737 | 8.0729 | 0.0232 | 8.1125 | 0.4664 | 7.955 | 1.4836
87 | Ba2ZnOsO6 | 8.095 | 8.0941 | 0.0112 | 8.0982 | 0.0397 | 8.1018 | 0.0844 | 8.0727 | 0.275 | 8.281 | 2.2977
88 | Ba2ZnReO6 | 8.106 | 8.0998 | 0.0766 | 8.0973 | 0.1078 | 8.1031 | 0.0358 | 8.0816 | 0.3008 | 8.119 | 0.1604
89 | Ba2ZnWO6 | 8.116 | 8.1629 | 0.5763 | 8.107 | 0.1127 | 8.1207 | 0.0569 | 8.1704 | 0.6689 | 8.055 | 0.7531
90 | Ca2CaWO6 | 8.00 | 8.047 | 0.587 | 8.1371 | 1.7136 | 8.00 | 0.0000 | 8.1645 | 2.0556 | 8.581 | 7.2625
91 | Pb2FeWO6 | 8.05 | 8.0949 | 0.5582 | 8.05 | 0.000 | 8.0545 | 0.0553 | 8.0802 | 0.3752 | 8.115 | 0.8075
92 | Pb2MgTeO6 | 7.99 | 7.9852 | 0.0599 | 8.0263 | 0.4541 | 7.984 | 0.0751 | 7.9396 | 0.6303 | 8.033 | 0.5382
93 | Sr2CaOsO6 | 8.22 | 8.1432 | 0.8135 | 8.1614 | 0.5914 | 8.1832 | 0.3260 | 8.1806 | 0.3584 | 8.807 | 7.2716
94 | Sr2CoUO6 | 8.19 | 8.1783 | 0.1423 | 8.1804 | 0.1171 | 8.179 | 0.1347 | 8.2136 | 0.2882 | 8.299 | 1.3309
95 | Sr2FeOsO6 | 7.85 | 7.9782 | 1.6333 | 7.907 | 0.726 | 7.9152 | 0.8302 | 7.9256 | 0.9635 | 8.341 | 6.2548
96 | Sr2FeUO6 | 8.11 | 8.2218 | 1.3791 | 8.1737 | 0.7854 | 8.1775 | 0.8320 | 8.2542 | 1.7776 | 8.383 | 3.3662
97 | Sr2MgUO6 | 8.19 | 8.1463 | 0.534 | 8.1781 | 0.1452 | 8.187 | 0.0363 | 8.1846 | 0.0655 | 8.301 | 1.3553
98 | Sr2MnUO6 | 8.28 | 8.2809 | 0.0108 | 8.2708 | 0.1115 | 8.2729 | 0.0858 | 8.3121 | 0.3877 | 8.495 | 2.5966
99 | Sr2CoReO6 | 7.951 | 7.9503 | 0.0085 | 7.9447 | 0.0797 | 7.9436 | 0.0925 | 7.894 | 0.7175 | 8.095 | 1.8133
100 | Sr2MgReO6 | 7.933 | 7.9262 | 0.0829 | 7.9388 | 0.0754 | 7.9499 | 0.2151 | 7.865 | 0.8549 | 8.097 | 2.0695
Mean PAD | | | | 0.3606 | | 0.2622 | | 0.2556 | | 0.5159 | | 1.631 a

a Note: the SPuDS program was unable to predict the LC of (1) Ba2AgIO6 and (2) Ba2NaIO6.
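The PAD entries in Table 3 are consistent with $\mathrm{PAD} = 100\,|LC_{\mathrm{expt}} - LC_{\mathrm{pred}}| / LC_{\mathrm{expt}}$, a percentage of the experimental LC; for example, for Ca2CaWO6 with SPuDS, $100 \times |8.00 - 8.581| / 8.00 = 7.2625$. The short Python sketch below recomputes a few SPuDS entries from the tabulated experimental and predicted values. The helper name pad is ours, and the formula is inferred from the tabulated numbers rather than quoted from the text; because the tabulated LCs are rounded, agreement is only to about the fourth decimal.

```python
def pad(expt_lc: float, pred_lc: float) -> float:
    """PAD of a predicted lattice constant, as a percentage of the experimental value."""
    return abs(expt_lc - pred_lc) / expt_lc * 100.0


# (compound, experimental LC, SPuDS predicted LC, tabulated SPuDS PAD), all from Table 3
TABLE3_SPUDS_ROWS = [
    ("Ba2LiOsO6", 8.1046, 7.7658, 4.1803),
    ("Ba2LaPaO6", 8.885, 9.1646, 3.1469),
    ("Ca2CaWO6", 8.00, 8.581, 7.2625),
    ("Sr2FeOsO6", 7.85, 8.341, 6.2548),
]

if __name__ == "__main__":
    for name, expt, pred, tabulated in TABLE3_SPUDS_ROWS:
        print(f"{name}: recomputed PAD = {pad(expt, pred):.4f}, Table 3 = {tabulated}")
```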

For the six-feature training data, Table 1 gives mean PAD values of 0.177, 0.246, 0.339, and 0.471 for the SVR, RF, GRNN, and MLR models, respectively. For the six-feature novel data, the same four models give mean PAD values of 0.345, 0.589, 1.061, and 0.563, so the GRNN model yields a markedly higher error (1.061) on the larger feature set. The overall mean PAD values of the SVR, RF, GRNN, and MLR models are 0.261, 0.418, 0.700, and 0.517, respectively (last column of Table 1). With more features, the SVR model therefore improves its overall performance significantly, from 0.413 to 0.261. In contrast, the RF and MLR models show no appreciable change (0.395 to 0.418 and 0.523 to 0.517, respectively), which indicates that their overall performance is largely independent of the size of the feature set. The overall performance of GRNN, on the other hand, degrades from 0.414 to 0.700 with the larger feature set; we attribute this to the increased data complexity, which prevents the GRNN model from correlating all novel compounds as effectively as the SVR and RF approaches. For the small feature set, however, GRNN gives results comparable to those of SVR and RF. GRNN models could therefore be used effectively in expert systems where little information is available or where acquiring more information is prohibitive. Another advantage of GRNN is its simplicity: only a single parameter, the spread σ, needs to be optimized. Table 1 also reports the minimum and maximum PAD and the R-value of each model for the training data and the novel data. In summary, among the CI models, SVR and RF are more accurate than GRNN and MLR. The overall performance order is:

$\mathrm{avgPAD}_{\mathrm{SVR}} < \mathrm{avgPAD}_{\mathrm{RF}} < \mathrm{avgPAD}_{\mathrm{GRNN}} < \mathrm{avgPAD}_{\mathrm{MLR}} < \mathrm{avgPAD}_{\mathrm{SPuDS}}$

The prediction models were also evaluated on the mixed-oxide perovskite compounds Ba2YSnO5.5, Ba2DySnO5.5, and Ba2HoSnO5.5 (Table 2, rows 40–42). The SVR, RF, GRNN, and MLR models gave sufficiently low average PAD values of 0.25, 0.32, 0.93, and 0.16, respectively.
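The overall values compared above can be reproduced from the training and novel mean PAD values. For every model and feature set quoted in the text, the overall figure equals the unweighted average of the two means, for example (0.360 + 0.466)/2 = 0.413 for SVR with three features and (1.631 + 1.533)/2 = 1.582 for SPuDS, rather than a mean weighted by the 100 training and 97 novel compounds. The Python sketch below transcribes the Table 1 means quoted in the text and computes both averages; the function names are ours.

```python
# Mean PAD values quoted in the text (Table 1): model -> (training, novel)
MEAN_PAD = {
    "three features": {
        "SVR": (0.360, 0.466), "RF": (0.262, 0.528), "GRNN": (0.256, 0.572),
        "MLR": (0.516, 0.530), "SPuDS": (1.631, 1.533),
    },
    "six features": {
        "SVR": (0.177, 0.345), "RF": (0.246, 0.589), "GRNN": (0.339, 1.061),
        "MLR": (0.471, 0.563),
    },
}


def overall_mean_pad(train_mean: float, novel_mean: float) -> float:
    """Unweighted average of the training and novel mean PAD values."""
    return (train_mean + novel_mean) / 2.0


def weighted_mean_pad(train_mean: float, novel_mean: float,
                      n_train: int = 100, n_novel: int = 97) -> float:
    """Alternative: weight each dataset by its number of compounds."""
    return (n_train * train_mean + n_novel * novel_mean) / (n_train + n_novel)


if __name__ == "__main__":
    for feature_set, models in MEAN_PAD.items():
        for model, (tr, nv) in models.items():
            print(f"{feature_set:14s} {model:5s} "
                  f"unweighted={overall_mean_pad(tr, nv):.4f} "
                  f"weighted={weighted_mean_pad(tr, nv):.4f}")
```

The two definitions differ most for the six-feature GRNN model, whose training and novel means are far apart.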

3.2. Comparison in terms of linear correlation

The prediction models are also compared in terms of linear correlation. Fig. 2A–E plots the predicted against the experimental values for the novel dataset, where y denotes the experimental LC and y' the predicted LC. For the SVR model, the linear fit is $y'_{\mathrm{SVR}} = 0.97y + 0.27$ with regression R-value 0.979. For the RF model (Fig. 2B), the linear fit is $y'_{\mathrm{RF}} = 0.87y + 1.1$ with R = 0.974. For the GRNN and MLR models (Fig. 2C and D), the fits are $y'_{\mathrm{GRNN}} = 0.88y + 1$ with R = 0.958 and $y'_{\mathrm{MLR}} = 0.93y + 0.6$ with R = 0.972, respectively. The linear fits and R-values of the SVR, RF, GRNN, and MLR models are therefore closer to the ideal linear fit than those of SPuDS. Table 1 also lists the regression coefficient R of each prediction model: for the three-feature data, R exceeds 0.95 for every model except the SPuDS program. The SVR, RF, GRNN, and MLR models can therefore be used successfully for complex cubic perovskites, whereas SPuDS may not be accurate enough when precise LC predictions are required.

3.3. Analyzing prediction performance and structural distortion

From the results in Tables 2 and 3 and Supplementary Tables 1 and 2, we observed that the prediction models give relatively high errors for some compounds, regardless of whether three or six features are used for model development. These results motivated us to examine these compounds with respect to the distortion parameter t. For example, in Table 3, the CI models accurately estimate the lattice parameters of Ba2ScNbO6 (t = 1.021, row 38), Ba2ScTaO6 (t = 1.019, row 41), Sr2AlTaO6 (t = 1.018, row 55), and Sr2CrWO6 (t = 0.99, row 58).
This is because these compounds have well-defined cubic perovskite structures.

Fig. 2. (A–E) Linear correlation between the experimental and predicted LC values for the novel dataset using three features.

Similarly, in Table 2, rows 27 and 28, Ba2PrRuO6 (t = 0.977) and Ba2NdRuO6 (t = 0.979) show relatively little distortion from the cubic structure [37], and the CI models predict reasonably low errors, with PAD values of the order of 1. For Sr2MnReO6 (row 11), however, the PAD errors are higher, of the order of 2. This compound was reported to be cubic with lattice parameter a = 8.1865 Å and t = 1.0183 [44]. In later work, however, the same group reported it as monoclinic (space group P21/n), distorted from the parent cubic structure, with a = 5.668 Å, b = 5.645 Å, and c = 7.990 Å [45]; the value t = 1.0183 indicates a small distortion from ideal cubic symmetry. We therefore recomputed the PAD values taking a = 7.990 Å and obtained significantly lower errors of 0.458, 0.0138, 0.016, and 0.0313. For Ba2FeReO6 (t = 1.044, row 85), all CI models give relatively high PAD values of 1.715, 1.056, 1.6876, and 1.6304. The value t = 1.044 indicates that the structure of this compound deviates strongly from cubic symmetry. This is attributed to the mixed-valence behavior of the Fe2.5+ and Re5.5+ ions [3], which means that exact ionic radii cannot be taken from the Shannon table.
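The recomputation above takes the monoclinic c = 7.990 Å as the cubic-equivalent LC; in the usual √2ap × √2ap × 2ap setting of a P21/n double perovskite, c corresponds to the doubled perovskite edge. A different, common way to obtain one pseudocubic value, which is not the method used in this paper, is to match the volume per formula unit: the P21/n cell holds two A2BB'O6 units (Z = 2) and the cubic Fm-3m double-perovskite cell holds four, so $a_{\mathrm{pc}} = (2V)^{1/3}$ with $V = abc\sin\beta$. The sketch below assumes β = 90° by default because the monoclinic angles are not quoted here; the function name is hypothetical.

```python
import math


def pseudocubic_double_cell(a: float, b: float, c: float, beta_deg: float = 90.0) -> float:
    """Cubic double-perovskite edge with the same volume per A2BB'O6 unit.

    Assumes a Z = 2 cell (e.g. P21/n, about sqrt(2)ap x sqrt(2)ap x 2ap) and a
    Z = 4 cubic Fm-3m reference cell, so a_pc**3 / 4 = V / 2.
    """
    volume = a * b * c * math.sin(math.radians(beta_deg))
    return (2.0 * volume) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Cell parameters quoted in this section; beta is assumed to be 90 degrees.
    cells = {
        "Sr2MnReO6 [45]": (5.668, 5.645, 7.990),
        "Sr2FeUO6 [46]": (5.799, 5.7819, 8.167),
        "Ca2CrWO6 [36]": (5.39, 5.45, 7.66),
    }
    for name, (a, b, c) in cells.items():
        print(f"{name}: a_pc = {pseudocubic_double_cell(a, b, c):.3f} Å")
```

For Sr2MnReO6 this volume-based estimate comes out close to the 7.990 Å used above, so the choice between the two conventions matters little for weakly distorted cells.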


It is observed that, for monoclinic compounds, the CI models give relatively higher PAD errors depending on the level of distortion (t). For example, in Table 3, the prediction errors are higher for Sr2FeUO6 (t = 0.9783, row 96) and Ba2NdReO6 (t = 0.9720, row 35). The PAD errors for Ba2NdReO6 are comparable to those for Sr2FeUO6 because of the similar level of deviation (t ≈ 0.97) in these monoclinic compounds. Sr2FeUO6 is reported to belong to the monoclinic space group P21/n with lattice parameters a = 5.799 Å, b = 5.7819 Å, and c = 8.167 Å [46]. Similarly, for Ba2BaUO6 (row 80 in Table 2), the CI models give higher PAD values of 1.0596, 2.4361, 4.8369, and 2.454. This compound is reported to belong to the monoclinic structural family with unit cell parameters a = 6.301 Å, b = 6.457 Å, and c = 8.949 Å [47], and the large prediction error is attributed to its heavy distortion (t = 0.8791). Further, for Ca2CrWO6 (row 43 in Table 2), t = 0.9445 indicates a large deviation from ideal cubic symmetry; its structure is reported to be monoclinic with lattice parameters a = 5.39 Å, b = 5.45 Å, and c = 7.66 Å [36]. For Ca2CrWO6, the CI models give relatively high PAD values of 1.4238, 2.3968, 2.5463, and 1.2247; according to the CI models, the pseudocubic lattice parameter of this compound should be close to 7.8 Å. For Ca2MgWO6 (t = 0.9260, row 92 in Table 2), the PAD errors are also high: 1.964, 3.000, 1.724, and 1.817. The crystal structure reported in the recent literature confirms a large deviation from cubic symmetry: the compound is orthorhombic, space group Pmm2, with a = 7.715 Å, b = 5.413 Å, and c = 5.549 Å [48]. Similarly, for the highly distorted U-containing compound Sr2CrUO6 (t = 0.9392, row 94 in Table 2), the CI models give high prediction errors of 1.927, 1.973, 1.584, and 2.315.
In contrast, for the less distorted U-containing compound Sr2NiUO6 (t = 0.9566, row 96 in Table 2), which belongs to the monoclinic family with lattice parameters a = 5.7809 Å, b = 5.775 Å, and c = 8.156 Å [46], the CI models successfully predict a pseudocubic lattice value close to 8.15 Å. The lattice parameters of tetragonal Sr2MnRuO6 were recently measured as a = 5.45 Å, b = 5.45 Å, and c = 7.933 Å [38]. For this weakly distorted compound (t = 0.987, row 97 in Table 2), we adopt the pseudocubic lattice parameter ap = 7.933 Å, and the CI models give reasonably low PAD errors of 0.9127, 0.2168, 1.197, and 1.617. This analysis shows the improved performance of the CI models for LC prediction, and it also shows that the CI models give higher prediction errors for heavily distorted compounds. In this way, we can also roughly estimate the level of distortion in structurally known compounds.

4. Conclusion

In this study, we investigated LC prediction with four different types of CI models and with the SPuDS program. The CI models effectively approximate the functional dependence of the LC on ionic radii, electronegativity, and oxidation state. For the small feature set, the SVR, RF, and GRNN approaches should be preferred over simple linear regression and the SPuDS program. Overall, SVR and RF provide the highest performance; for the small feature set, however, the simple GRNN model yields results comparable to those of SVR and RF. The main advantage of CI-based models is that, once an efficient predictive model is built, it can predict the LC of novel compounds of the same type. The performance analysis also provides useful information about the structural distortion of complex compounds from the ideal cubic shape.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.commatsci.2011.01.035.

References
[1] J.Y. Guo, Y.W. Zhang, C. Lu, Comput. Mater. Sci. 44 (2008) 174–179.
[2] M. Bouville, R. Ahluwalia, Phys. Rev. B: Condens. Matter Mater. Phys. 75 (2007) 054110–054118.
[3] D. Serrate, J.M. De Teresa, M.R. Ibarra, J. Phys.: Condens. Matter 19 (2007) 023201–023287.
[4] M.T. Sebastian, Dielectric Materials for Wireless Communication, Elsevier, Amsterdam, The Netherlands, 2008.
[5] J. Gopalakrishnan, A. Chattopadhyay, S.B. Ogale, T. Venkatesan, R.L. Greene, A.J. Millis, K. Ramesha, B. Hannoyer, G. Marest, Phys. Rev. B: Condens. Matter Mater. Phys. 62 (2000) 9538–9542.
[6] M.W. Lufaso, P.M. Woodward, Acta Crystallogr. B 57 (2001) 725–738.
[7] H.Y. Xiao, X.D. Jiang, G. Duan, F. Gao, X.T. Zu, W.J. Weber, Comput. Mater. Sci. 48 (2010) 768–772.
[8] J. Tan, G. Ji, X. Chen, L. Zhang, Y. Wen, Comput. Mater. Sci. 48 (2010) 796–801.
[9] S.L. Shang, A. Saengdeejing, Z.G. Mei, D.E. Kim, H. Zhang, S. Ganeshan, Y. Wang, Z.K. Liu, Comput. Mater. Sci. 48 (2010) 813–826.
[10] S. Ugur, N. Arıkan, F. Soyalp, G. Ugur, Comput. Mater. Sci. 48 (2010) 866–870.
[11] C. Motta, M. Giantomassi, M. Cazzaniga, K. Gaál-Nagy, X. Gonze, Comput. Mater. Sci. (2010), doi:10.1016/j.commatsci.2010.09.036.
[12] Y. Ouyang, X. Tao, H. Chen, Y. Feng, Y. Du, Y. Liu, Comput. Mater. Sci. 47 (2009) 297–301.
[13] K. Bouamama, P. Djemia, K. Daoud, S.M. Chérif, Comput. Mater. Sci. 47 (2009) 308–313.
[14] A. Bouhemadou, R. Khenata, M. Kharoubi, T. Seddik, A.H. Reshak, Y. Al-Douri, Comput. Mater. Sci. 45 (2009) 474–479.
[15] N. Xing, H. Li, J. Dong, R. Long, C. Zhang, Comput. Mater. Sci. 42 (2008) 600–605.
[16] M. Rafiee, S. Jalali Asadabadi, Comput. Mater. Sci. 47 (2009) 584–592.
[17] W. Xue, Y. Yu, Y. Zhao, H. Han, T. Gao, Comput. Mater. Sci. 45 (2009) 1025–1030.
[18] N. Xing, Y. Gong, W. Zhang, J. Dong, H. Li, Comput. Mater. Sci. 45 (2009) 489–493.
[19] P. Wu, Y.Z. Zeng, C.M. Wang, Biomaterials 25 (2004) 1123–1130.
[20] F.C. Vallejo, SERC Short Report, Technical University of Denmark, 2008.
[21] A. Majid, A. Khan, G. Javed, A.M. Mirza, Comput. Mater. Sci. (2010), doi:10.1016/j.commatsci.2010.08.028.
[22] S.G. Javed, A. Khan, A. Majid, A.M. Mirza, J. Bashir, Comput. Mater. Sci. 39 (2007) 627–634.
[23] B.-T. Chen, T.-P. Chang, J.-Y. Shih, J.-J. Wang, Comput. Mater. Sci. 44 (2009) 913–920.
[24] A. Khan, M.H. Shamsi, T.-S. Choi, Comput. Mater. Sci. 45 (2009) 257–265.
[25] J.J. Lee, D. Kim, S.K. Chang, C.F.M. Nocete, Comput. Mater. Sci. 44 (2009) 988–998.
[26] I.B. Topçu, M. Sarıdemir, Comput. Mater. Sci. 41 (2007) 117–125.
[27] I.B. Topçu, M. Sarıdemir, Comput. Mater. Sci. 42 (2008) 74–82.
[28] H. Farsi, F. Gobal, Comput. Mater. Sci. 39 (2007) 678–683.
[29] S. Malinov, W. Sha, J.J. McKeown, Comput. Mater. Sci. 21 (2001) 375–394.
[30] A.-h. Cai, X. Xiong, Y. Liu, W.-k. An, J.-y. Tan, Y. Luo, Comput. Mater. Sci. 48 (2010) 109–114.
[31] P.W. Barnes, The Ohio State University, 2003.
[32] A.K. Azad, Göteborg University, Sweden, 2004.
[33] R.D. Shannon, Acta Crystallogr. A 32 (1976) 751–767.
[34] URL, (accessed May 2010).
[35] J.M. De Teresa, D. Serrate, J. Blasco, M.R. Ibarra, L. Morellon, Phys. Rev. B 69 (2004) 144401–144410.
[36] J.B. Philipp, P. Majewski, L. Alff, A. Erb, R. Gross, T. Graf, M.S. Brandt, J. Simon, T. Walther, W. Mader, D. Topwal, D.D. Sarma, Phys. Rev. B: Condens. Matter Mater. Phys. 68 (2003) 144431–144445.
[37] W.T. Fu, D.J.W. Ijdo, J. Solid State Chem. 178 (2005) 2363–2367.
[38] P.M. Woodward, J. Goldberger, M.W. Stoltzfus, H.W. Eng, R.A. Ricciardo, P.N. Santhosh, P. Karen, A.R. Moodenbaugh, J. Am. Ceram. Soc. 91 (2008) 1796–1806.
[39] A. Smola, B. Schoelkopf, Springer, Netherlands, 14, 2004, pp. 199–222.
[40] C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines (accessed June 2009).
[41] L. Breiman, Mach. Learn. 45 (2001) 5–32.
[42] A. Jaiantilal, Randomforest-Matlab (accessed June 2010).
[43] S. Ibric, M. Jovanovic, Z. Djuric, J. Parojcic, L. Solomun, J. Control. Release 82 (2002) 213–222.
[44] G. Popov, M. Greenblatt, M. Croft, Phys. Rev. B: Condens. Matter Mater. Phys. 67 (2003) 244061–244069.
[45] G. Popov, M.V. Lobanov, E.V. Tsiper, M. Greenblatt, E.N. Caspi, A. Borissov, V. Kiryukhin, J.W. Lynn, J. Phys.: Condens. Matter 16 (2004) 135–145.
[46] R.M. Pinacca, M.C. Viola, J.C. Pedregosa, M.J. Martinez-Lope, R.E. Carbonio, J.A. Alonso, J. Solid State Chem. 180 (2007) 1582–1589.
[47] A. Knyazev, N. Chernorukov, M. Zhizhin, Y. Sazhina, A. Ershova, Radiochemistry 48 (2006) 568–571.
[48] F. Lie, B. Yan, J. Optoelectron. Adv. Mater. 10 (2008) 158–163.