Cross-entropy minimization for refinement of Gaussian basis sets

Cross-entropy minimization for refinement of Gaussian basis sets

Volume 166, number 4 CROSS-ENTROPY Shridhar R. GADRE, CHEMICAL PHYSICS LETTERS MINIMIZATION FOR REFINEMENT Sudhir A. KULKARNI 2 March 1990 OF G...

540KB Sizes 0 Downloads 92 Views

Volume 166, number 4

CROSS-ENTROPY Shridhar R. GADRE,

CHEMICAL PHYSICS LETTERS

MINIMIZATION

FOR REFINEMENT

Sudhir A. KULKARNI

2 March 1990

OF GAUSSIAN

BASIS SETS *

and Indira H. SHRIVASTAVA

Department of Chemistry, University of Poona, Poona 41 I 00 7, India

Received 26 July 1989; in final form 4 January 1990

Information theoretic techniques have been applied for the refinement of Gaussian basis sets. A refined distribution has been obtained by cross-entropy minimization starting from a near Hartree-Fock quality density distribution. For this purpose, the Kullback-Leibler cross-entropy, S[&,] =jp(r) In [p(r)/p,(r)] dr, has been minimized subject to exact, theoretical or experimental, second moment constraints in position and momentum spaces. Here, p. is the starting density distribution and p is the corresponding relined one. The procedure has been applied to hydrogen, helium, lithium and beryllium atoms as test eases. Nearly all moments ( - 2 through 4), in coordinate as well as momentum spaces have improved over the original ones at an average worsening of the total energy by a mere 0.04%.

1. Introduction The Hat-tree-Fock (HF) approximation has been used successfully over the last few decades to solve a variety of quantum chemical problems, especially for large atoms and molecules. Moreover, with the advent of powerful computers, the implementation of HF method has been facilitated. However, in order to obtain a highly accurate wavefunction, one has to resort to complicated many-body methods. The drawback of this method is that the simplicity associated with a single Slater determinantal wavefunction is lost. Massa et al. [ l-4 ] have developed a novel method to estimate the “wavefunction” in the Slater determinantal form from the measured X-ray data. The sum 1k[F,,,,(k)-F,ti(k)]2 is minimized where F,,,(k) is the experimental X-ray structure factor and F,,,(k) is the calculated one. The minimization was effected by varying the density matrix elements Plj: (1) subject to the matrix being normalized, Hermitian and idempotent which ensures the wavefunction to * Dedicated to Professor P.T. Narasimhan on his sixty-first birthday.

be a single Slater determinant. This method was tested on the beryllium atom, beryllium metal and the H2 molecule. The electron density arising from these wavefunctions is obviously N-representable, i.e. there does exist at least one antisymmetric wavefunction which gives rise to the given electron density. Massa and Frishberg [ 1,3,4] have shown that this X-ray fitted single determinantal “wavefunction” yields expectation values which are in better agreement with their experimental counterparts. The drawback of this method is that is requires the entire experimental F(k) data as input. Gadre and Gejji [ 51 have proposed a simpler method for the refinement of any three-dimensional probability distribution. Their method may be described as follows: Starting from the original distribution F(x), a refined distribution H(x) is obtained which is very “close” to the original distribution. Let H(x)=F(x) G(x), G(x) being a multiplicative function obtained by minimizing the integral J[G(x) - l]*dx, subject to the constraints of normalization and the second moments, (x2>. The solution G(w) obtained by variational treatment turns out to be [5], G(x)=l+F(x)(I+p*), 1 and p being the Lagrange multipliers. The results obtained by Gadre and Gejji [ 51 for beryllium through neon atoms by application of this procedure were found to be in fair agreement with the corresponding ex-

0009-2614/90/$ 03.50 0 Elsevier Science Publishers B.V. (North-Holland )

445

Volume 166, number 4

CHEMICAL

PHYSICS LETTERS

perimental or best theoretical ones. This simple method can be used to reline any given three-dimensional probability distribution. However, the refined distribution may bc negative [6] in some regions. Gadre and Bendale [ 71 have recently proposed a novel idea of using information entropies to study the quality of the wavefunction for atoms and molecules. In this work, we employ an information theoretical technique for the generation of property oriented Gaussian basis sets. The method employed here is the minimization of cross-entropy to “refine” a given position space density distribution p(r) arising from a given basis set. In section 2, an introduction to information theoretical concepts is presented.

2. Information

entropy

The concept of entropy in probability theory has developed only during the last three decades. Initially, this concept was intimately associated with transmission apparatus of one kind or another [ 8 1. But lately its theoretical significance has been realized and the study of entropy has evolved into an important aspect of the general theory of probability. A scheme is defined as a set of events, and a finite scheme is one in which each and every event has a finite probability of occurrence. Thus, every finite scheme has an amount of uncertainty inherently associated with it [ 81. It was first shown by Shannon [9] that the quantity which is a suitable measure of this uncertainty is of the form S= - &pk lnp, provided that the uncertainty obeys a certain set of axioms. Here, pk is the probability of the kth event in the given finite scheme. This quantity S is known as Shannon’s entropy. Shannon also proved the universality of this entropy functional, which inspired Jaynes to postulate his maximum entropy principle (MEP) [IO]. Jaynes’ [ 11 ] MEP states that if P= (p,, p2, . ... p,) is the probability distribution of an experiment with data D= (d,, d,, .. .. d,,), then the P which best describes D is the one which maximizes - I&$ In pk with respect to all P satisfying D, viz. obeying the given contraints. The MEP has been successfully applied to a wide variety of problems in science [ 12 1. 446

2 March 1990

One problem associated with Shannon’s entropy functional is that it is not invariant to scaling, i.e. the transition from discrete to continuous probabilities -

ck

pk

10) lnp(x) d-~

lnPk+-

(2)

is not smooth. This implies that the numerical value of the entropy depends on the coordinate system used. Bialynicki-Birula and Mycielski [ 13 ] proposed an uncertainty relation in quantum mechanics in terms of information entropies

(where 1‘V(r) I 2 and I O(p) / 2 are probability densities in n-dimensional coordinate and momentum spaces, respectively ), which was reformulated by Gadre et al. [ 14,151 in terms of single-particle densities as S[p]+S[y]>3N(l+lnrr)-2NlnN, S[p] and S[y] being Shannon’s entropies in coordinate and momentum space respectively. Cadre and Bendale [ I6 ] derived an upper bound to the information entropies in terms of the second moments in the conjugate spaces. They also formulated a direct relationship between the quantum mechanical kinetic energy and information entropy in position space. The above discussion brings out the use of Shannon’s entropy to quantum mechanical density distribution. There exists yet another measure of information, viz. the Kullback-Leibler information [ 17 1, or discrimination information or cross-entropy, defined as S[P, hoI=

C nr

(3)

lnhlpO.j)

for discrete cases, and

S[P, l&l = j P,(x) ln[P,(x)lMx)l

d+~

(4)

for continuous ones (PI(x) and P,(x) are normalized probability density functions). This cross-entropy is invariant to coordinate transformation. For a given experiment, the best way to get maximum information in a totally unbiased way is by the employing Jaynes’ principle of entropy maximiza-

Volume 166,number 4

CHEMICALPHYSICSLETTERS

tion or entropy deficiency 6

(1

minimization,

p(x) In[P(x)lPdx)l

h+

only when the refined distribution is identical to the starting distribution, that is when

i$o~,C~r(x))

=o.

> (5)

The li are Lagrange multipliers and (F,(x) > are the constraints to be satisfied by the density distribution P(x). The solution to the above equation is p(x)=p&)

exp

i

5 A, (F,(x)) i=l

.

(6)

>

Eq. (6) can be employed to rejne a given density distribution P,-,(x). However there are some limitations of this approach, for example if P(x) is the positional electron density, then the use of kinetic energy as a constraint is impossible. Hence it is necessary to devise a method wherein both the position as well as momentum space contraints may be employed to refine a given wavefunction. Minimization of the cross-entropy S[ P 1PO] naturally incorporates the grassroot ideas of the least-squares procedure as well. In this procedure, we try to find a distribution P(x) which is as close as possible to the original distribution PO(x), and which also obeys the constraints. In the present work, the constraints employed are those of the second moments (r2) and (p’ ) . We choose the wavefunction to be a Slater determinant, thus ensuring antisymmetry. The starting wavefunction is in terms of an orbital basis. The basis sets employed as test cases were those for hydrogen, helium, lithium and beryllium atoms. The procedure, along with the results obtained, is explained in detail in section 3.

3. Results and discussion As noted in section 2, the minimization entropy

Ip(r) lnMr)lpdr)l

S[PIPOI=

dr

2 March 1990

of cross-

(7)

is used in the present work for the refinement of a near Hartree-Fock wavefunction. Here, PO(r) is the starting electron density and p(r) is the refined one. It may be noted that the cross-entropy is always nonnegative, viz. S[p(po] 30, where the equality holds

p(r)

=p0(r)

.

The refined density was obtained by constraining the wavefunction to the exact/Hartree-Fock or experimentally observed ( r2) and (p2> expectation values. It is anticipated that the (r*> expectation value refines the tail of the Gaussian wavefunction whereas the (a’> tailors the wavefunction near the nucleus The parameters, viz. the linear expansion coefficient and the exponents, were optimized subject to orthonormality of the atomic orbitals, ensuring that the resulting wavefunction is still a single Slater determinant. The starting wavefunctions were van Duijneveldt’s 3G [ 181 and Stewart’s 3G and 4G [ 191 basis sets for the hydrogen atom. For He, Huzinaga’s 3G and 4G [20] #* basis set were used as test cases. For Li and Be, Huzinaga’s [ 201 9G and 10G basis set, respectively, were employed. The parameters were varied to within + 15% of their original values and the optimization carried out by a general optimization routine (subroutine STEPIT [ 2 1 ] ). For the hydrogen atom the exact ( r2> and (p’ ) constraints were used. For He, Li and Be the constraints used were those due to Robertson et al. [ 22 1, Banyard et al. [23] and Esquivel and Bunge [ 241, respectively, which are either theoretically or experimentally obtained expectation values. From the optimized set of parameters obtained by minimization of the cross-entropy, (r”} and (p”} expectation values were calculated (where n= - 2, - 1, 1, 2, 3, 4). Table 1 compares the (r”) and (p”) expectation values computed from the refined distribution and starting distributions for the hydrogen atom. It is evident from table 1 that all the reported (r”) and (p”) expectation values are in better agreement with the exact ones. The percentage improvement in the expectation values of ( rm2), ( r3), (p’) and (p”) is particularly remarkable for all the three basis sets considered here. The betterment in all other properties is typically 1%. The results for van Duijneveldt’s 3G basis are even more impressive, with “’ It was observed that the IOG basis orbitals for lithium as given by the author of ref. [ 201 are not orthonormalized. 447

Volume 166, number 4 Table I Expectation Expectation values

CHEMICAL

PHYSICS LETTERS

2 March 1990

values for H atom 3G and 4G basis sets (all values in au) Stewart a) 3G

Refined b, Stewart 3G

Van Duijneveldt 3G

‘)

Refined b, van Duijneveldt

3G

Stewart ‘) 4G

Refined b, Stewart 4G

Exact values


1.838 0.989 0.207 1.500 2.996 22.06 7.453

1.876 0.995 0.216 1.500 3.000 22.14 7.473

I.903 0.994 0.235 1.491 2.922 19.44 6.992

1.954 0.997 0.254 1.506 3.000 20.73 7.314

1.927 0.997 0.243 1.500 2.999 22.41 7.491

1.940 0.998 0.246 1.500 3.000 22.44 7.498

2.0 1.0 0.318 1.5 3.0 22.5 7.5

(P-9

4.886 1.688 0.764 0.848 0.989 1.577 3.416 -0.494907

4.892 1.689 0.765 0.850 1.000 1.629 3.645 -0.494754

4.554 1.649 0.665 0.853 0.994 1.621 3.837 -0.496979

4.685 1.670 0.695 0.849 1.000 1.697 4.325 -0.496547

4.961 1.695 0.792 0.849 0.997 1.645 3.953 -0.498482

4.963 1.695 0.793 0.849 1.000 1.662 4.043 -0.498487

5.0 1.697 0.811 0.849 I.0 1.698 5.0 -0.5

w’>

Y(0) (P>

energy

a) Values computed cl Values computed

from Stewart’s [ 19 ] 3G and 4G basis sets. from van Duijneveldt’s [ 18 ] 3G basis set.

b, Values computed

nearly 5% improvement in almost all the properties. In the case of (p4), an enhancement of 12% results in a good estimate of this expectation value. Also p(O) and y(O), that is, the electron density in coordinate space at the nucleus and in momentum space at p=O respectively, are better by almost 5% in the case of Stewart’s 3G basis and about 8% in case of van Duijneveldt’s 3G basis. This implies that the entire distribution has improved and hence all the one-electron expectation values are in better agreement with the exact values compared to starting ones. The loss in energy for refined Stewart’s and van Duijneveldt’s 3G basis is about 0.03% and O.O7O/o, respectively. In the case of Stewart’s 4G basis the situation is cvcn more interesting since there is no loss in energy as per the above noted trends. In fact the energy is lowered here by 0.001%. Though the lowering in energy is very small, it is remarkable because the purpose of present work is to get a property oriented wavefunction rather than an energy-minimized one, which is obtained in this case as a bonus. In table 2, the comparison of the refined distribution with Huzinaga’s [ 201 3G and 4G basis set for He is displayed. The (r”} and (p”) expectation values show a similar trend as in the case of the H atom. For the 3G basis for He, some of the prop448

in the present work.

erties such as (r”}, (p-l), (p3) and (p”} of the refined wavefunction are even comparable to the 4G basis of Huzinaga. In fact some are even better than the Huzinaga 4G basis! All other properties show better agreement of the order of l-4% in both the bases in the case of the He atom. The improvement inp(0)andy(0)is12%,10%and8%,3%forthc3G and 4G basis respectively, which clearly indicates an improvement in the whole distribution. The loss in energy is typically 0.2% and 0.07% for fhe 3G and 4G basis, respectively. Table 3 shows a similar comparison of expectation values computed from Huzinaga’s [ 201 9G basis set with the refined one for the lithium atom. The comparison in this case becomes important due to the inclusion of the 2s orbital leading to an additional orthonormality constraint. As shown in table 3, all (r”) and (p”} properties except (r-‘) are in better agreement to correlated ones as compared to Huzinaga’s wavefunction. The improvement is remarkable in the case of ( r3) and ( r4> expectation values. Here though the (r-‘> value seems to be slightly overestimated, it is still better than what Huzinaga’s wavefunction yields. The p( 0) shows a similar trend as in the case of H and He atoms. The loss in energy for this case is 0.01%. In table 4 a comparison of refined distribution with

Volume 166, number 4 Table 2 Expectation

CHEMICAL

2 March 1990

values for He atom 3G and 4G basis Sets (all values in au)

Expectation values

Huzinaga 3G

(r-‘> (r3) (r4> Y(0) (P> (P2> (P3> (Pa> energy

11.27 3.350 2.526 1.833 2.256 3.424 6.05 1 3.551 2.047 0.327 2.819 5.671 16.85 75.44 -2.836

a) Values computed

Table 3 Expectation

PHYSICS LETTERS

from Huzinaga’s

‘)

Refined b, 3G

Huzinaga 4G

11.71 3.351 2.843 1.871 2.370 3.711 6.764 3.736 2.095 0.354 2.789 5.723 lg.29 92.90 -2.828

11.72 3.369 2.946 I.847 2.325 3.682 6.935 3.808 2.097 0.374 2.809 5.710 17.58 87.95 -2.855

[ 201 3G and 4G basis sets.

Huzinaga 9G

(r-*> (r-I> P(O) (0 (r’> :r4: r3 W> Y(0) (P> (PZ> (P3>

a)

Refined b1 9G

Correlated

30.146 5.715 13.039 5.018 18.601 559.9 94.27

30.275 5.734 13.213 4.985 18.372 548.78 92.71

30.246 5.718 13.867 4.994 18.372 545.8 92.44

26.12 5.166 8.103 4.905 14.862 70.84 592.4 2.281 -7.4323

25.89 5.139 8.137 4.929 14.956 71.27 606.6 1 2.293 -7.4315

I

b, Values computed

Refined b, 4G 11.97 3:30 3.225 1.861 2.370 3.798 7.233 3.882 2.116 0.385 2.793 5.723 18.37 103.2 -2.8529

in the present work.

compared to Huzinaga’s energy is only 0.027%.

values for Li atom 9G basis set (all values in au)

Expectation values

‘)

NHF ‘1

11.99 3.374 3.595 1.854 2.370 3.882 7.776 4.092 2.140 0.440 2.798 5.723 17.99 105.6 -2.867

I

Cl See ref.

[ 221.

estimate. The loss in total

c,

4. Concluding remarks

(llrtj) energy

” Values computed b, Values computed ‘) See ref. [ 1 1.

5.149

14.956

I

2.199 - 7.4780 d,

from Huzinaga’s 1201 9G basis set. in the present work. ‘) See ref. [ 23 1.

Huzinaga’s [ 201 1OC basis set for the beryllium atom is made. The Be atom gives a better estimate for all the properties. The improvement reflected in ( r3), (P) and (r’) expectation values is 9%, 13% and 17%, respectively. In this case, though the (r-l} property seems to be overestimated, it is still better

The HF method is the best known method for obtaining an energy-minimized wavefunction in the form of a single Slater determinant. However, as noted by Frishberg [ 41, too much emphasis has been laid on energy. The properties of energy minimized NHF-quality wavefunctions may be found to be quite different from the exact or correlated properties. Thus, it is worthwhile to obtain property-oriented wavefunctions, which has been attempted in the present work. For obtaining such a property-oriented wavefunction, the concept of minimization of cross-entropy has been used which incorporates the idea of a leastsquares method in the sense that refined distribution, subject to the prescribed constraints, viz. ( r2) and (JJ*) values, is not much off from the original distribution. These two constraints are used because they are expected to tailor the wavefunction in the tail and nuclear regions, respectively. That the imposition of these two constraints can lead to a betterment in ( r”) and (p”} expectation values is not to be expected at the outset. However, in the present 449

Vohune 166,number 4

CHEMICAL PHYSICS LETTERS

2 March 1990

Table 4 Expectation values for Be atom 10G basis set (all values in au ) Expectation values

Huzinaga ‘) 10G

Frishberg b,

Refined ‘) 10G

Correlated d,

<+> (r-l)

57.56 8.408 34.082 6.126 17.280 62.76 264.6 1283.7 24.86 6.291 5.7067 7.435 29.144 185.4 2091.5 4.489 - 14.572

57.63 8.425

57.55 8.447 33.984 5.973 16.28 56.85 231.1 1061.2 23.15 6.096 5.099 7.510 29.33 185.3 2114.4 4.567 - 14.568

57.59 8.425

P(O)

(r) (r? (P-l>

Y(0) 0) (P3>

< 1 lrij) energy

5.988 16.280 231.6 1065.5

29.19 2165.5 4.566 - 14.569

5.978 16.280 56.95 233.1 1085.96

7.529” 29.33 186.24 =’ 2029.10 cl 4.380 - 14.667

a1 Values computed from the Huzinaga [ 201 IOG basis set. b1See ref. [4]. c1Values computed in the present work. d1See ref. [ 241. e1Values computed by integrating appropriately weighted difference of correlated and NHF Compton profiles [25]. The value of this integral was added to the value of (P”)NHF. However, (p“} estimated this way is rather unreliable due to the inadequacy of the available J(q) data in the high-q region.

study it was observed that nearly all the one-electron properties have improved. In fact, in some cases, there is a fortuitous bonus of a lower electronic energy as well. It may be pointed out that the naive expectation that “the lower the energy, the better the expectation values” is not borne out. For example, van Duijneveldt’s 3G basis set for the H atom is energetically an excellent one. However, many of the one-electron expectation values are rather poorly estimated (see table 1). Our basis sets are more “balanced” ones for the energy and other one-electron properties. The wavefunction employed in the present work is still a single Slater determinant. In this sense, the above method is similar in spirit to the work done by Firshberg [4] and Massa [ 1,3]. The refinement procedure suggested by Gadre and Gejji [ 5 ] had one drawback, viz. the relined distribution p=po+ $pi (L+p’) could go negative. The present information-theoretic procedure does not suffer from this drawback, and is expected to prove successful in refining basis sets for p-block atoms as well. It is hoped that molecular calculations performed using these refined atomic basis sets should yield better es-

450

timates of one-electron properties presently used basis sets.

as compared

to

Acknowledgement Financial assistance from the Department of Science and Technology (under Grant No. SP/S 1/JOO/ 85) and University Grants Commission (Under GrantNo. F. 12-43/86 (SRIII)),intheformofResearch grants to SRG is gratefully acknowledged. Both IHS and SAK thank the University Grants Commission for partial assistance.

References [ 1] C. Frishberg and L. Massa, Phys. Rev. B 24 ( 1981) 7018. [ 21 R.F. Boehme and S.J. La Placa, Phys. Rev. Letters 59 ( 1987) 985. 131 L.J. Mama, M.J. Goldberg, C. Frishberg, R.F. Boehme and S.J. La Placa, Phys. Rev. Letters 55 (1985) 622. [ 41 C. Frishberg, Intern, J. Quantum Chem. 30 (1986) I.

Volume 166, number 4

CHEMICAL PHYSICS LETTERS

[5]S.R.GadreandS.P. Gejji, J. Chem. Phys. 80 (1984) 1175; Chem. Phys. Letters 109 (1984) 584. [6] M.L. Cohen, private communication to SRG (November 1987). [7] S.R. Gadre and R.D. Bendale, Current Sci. 54 (1985) 970. [ 81A.I. Khinchin, Mathematical foundation of information theory (Dover, New York, 1957). [9] C.E. Shannon, Bell Syst. Tech. J. 27 (1948) 379,623. [lo] E.T. Jaynes, in: Maximum entropy and Bayesian methods in inverse problem, eds. C. Ray and W.T. Grandy (Reidel, Dordrecht, 1985). [ 111 A. Hobson, Concepts in statistical mechanics (Gordon and Breach, London, 1971). [ 12 ] E.T. Jaynes; in: The maximum entropy formalism, eds. R.D. Levine and M. Tribus (MIT Press, Cambridge, 1980). [ 13 ] I. Bialynicki-Birnla and J. Mycielski, Common. Math. Phys. 44 (1975) 129.

2 March I990

[ 141 S.R. Gadre, Phys. Rev. A 30 (1984) 620. [ 15] S.R. Gadre, S.B. Sears, S.J. Chakravorty and R.D. Bendale, Phys. Rev. A 32 (1985) 2602. [ 161 S.R. Gadre and R.D. Bendale, Phys. Rev. A 36 ( 1987) 1932. [ I7 ] S. Kullback and R.A. Leibler, Ann. Math. Stat. 22 ( 1951) 79. [ 181 F.B. van Duijneveldt, IBM Technical report, RJ 945 ( 197I ). [ 191 R.F. Stewart, J. Chem. Phys. 52 (1970) 431. [ 201 S. Huzinaga, J. Chem. Phys. 42 ( 1965) 1293. [ 21 ] J.P. Chandler, Subroutine STEPIT, QCPE, program No. 66. [ 22 ] D. Robertson, V.H. Smith Jr., A.M. Simas and A.J. Thakkar, Intern. J. Quantum Chem. 30 ( 1986) 7 17. [ 23 ] K.E. Banyard, M. Dixon and A.D. Tait, Phys. Rev. A 4 (1971) 2199. [24] R.O. Esquivel and A.V. Bunge, Intern. J. Quantum Chem. 32 (1987) 295. [25] R. Benesch and V.H. Smith Jr., Chem. Phys. Letters 5 (1970) 601.

451