High Energy Density Physics 7 (2011) 391–399
Algorithm comparisons for Stark-profile calculations

Carlos A. Iglesias*, Vijay Sonnad

Lawrence Livermore National Laboratories, P.O. Box 808, Livermore, CA 94550, USA
Article history: Received 25 July 2011; Received in revised form 1 September 2011; Accepted 1 September 2011; Available online 10 September 2011

Abstract
The efficiency of several algorithms to calculate Stark broadened line shapes in the quasi-static ion approximation is compared. The algorithms can be grouped into three general approaches: simultaneous equation solvers, matrix decompositions, and model-reduction. It is emphasized that the tested algorithms do not rely on approximations beyond the quasi-static ion assumption. The comparisons show that model-reduction schemes are the most efficient and are more than 2 orders of magnitude faster than the conventional method for large-scale calculations. Consequently, complex line shape calculations become practical without the compromises often required in the past. © 2011 Elsevier B.V. All rights reserved.
Keywords: Stark broadening; Line shapes
1. Introduction

In standard Stark broadening theory of spectral lines, the perturbing ions are assumed to have negligible motion during the average lifetime of the radiating states [1–3]. Thus, the line shape calculation averages the Stark-field-induced state mixing and energy shifts over the probability distribution for the quasi-static field [2–4]. In the absence of external fields, the line shape can be written in the form [1–3],
$$ I(\omega) = \int_0^{\infty} d\varepsilon\, P(\varepsilon)\, J(\omega;\varepsilon) \tag{1} $$
where ℏω is the photon energy, P(ε) is the probability of finding a Stark field of magnitude ε at the radiator, and J(ω;ε) represents the line shape due to radiator–electron interactions in the presence of this field. The numerical evaluation of the line shape replaces the integral in Eq. (1) with the sum

$$ I(\omega) = \sum_{i=1}^{N_\varepsilon} w_i\, P(\varepsilon_i)\, J(\omega;\varepsilon_i) \tag{2} $$
where N_ε is the number of field points and the w_i are weights appropriate to the integration scheme. The calculation of Eq. (2) poses a challenge. To obtain accurate results the field mesh must resolve features in J(ω;ε). Since the structure of J(ω;ε) is usually not known a priori, a large number of field points or an adaptive quadrature scheme is often necessary. Conventional methods, however, use an explicit matrix inversion at each frequency and field value [5–7], which is computationally expensive for large matrices, scaling as N³ operations for a matrix of size N [8]. Consequently, there is a premium for using many field points. Often the practical resolution of these contending needs is the introduction of approximations to accelerate the calculations [9]. If such compromises are made in large-scale line shape calculations, the complexity of the spectrum may make it difficult to identify possible errors [10,11] except by comparisons to more complete calculations.

Recently, reliable and efficient methods were developed to compute the line profiles [12,13]. Although the operation count for these methods indicates a significant reduction in computational effort compared to the conventional approach, this is not sufficient to determine execution time. That is, machine-specific optimized routines from the basic linear algebra subprogram libraries can be considerably faster than unoptimized operations, and it is necessary to compare actual implementations to establish execution times and speed up.

The purpose here is to determine the efficiency of various algorithms, in addition to those previously proposed [12,13], in large-scale quasi-static ion line shape calculations relative to the conventional approach. The intent is to identify fast, accurate, and numerically robust methods to compute Stark-profiles. It is emphasized that these methods do not rely on approximations beyond those present in the standard line broadening theory. The discussion begins in Section 2 with a brief review of Stark line broadening theory. The various algorithms are discussed in Sections 3 through 6, followed by comparisons in Section 7 and conclusions in Section 8.

* Corresponding author. E-mail address: [email protected] (C.A. Iglesias).
doi:10.1016/j.hedp.2011.09.001
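The field-point sum in Eq. (2) can be sketched numerically. The fragment below uses hypothetical stand-ins for P(ε) and J(ω;ε) (a decaying exponential and a field-shifted Lorentzian; neither comes from the paper) and Gauss–Legendre weights on a truncated field interval in place of the paper's adaptive mesh:

```python
import numpy as np

def line_shape(omega, j_of_eps, p_of_eps, eps_max=20.0, n_eps=64):
    """Approximate Eq. (2): I(omega) = sum_i w_i P(eps_i) J(omega; eps_i).

    Gauss-Legendre nodes/weights on [0, eps_max] stand in for the
    adaptive field mesh (an assumption of this sketch)."""
    x, w = np.polynomial.legendre.leggauss(n_eps)  # nodes on [-1, 1]
    eps = 0.5 * eps_max * (x + 1.0)                # map to [0, eps_max]
    wts = 0.5 * eps_max * w
    return np.sum(wts * p_of_eps(eps) * j_of_eps(omega, eps))

# Hypothetical models: field-shifted Lorentzian and a normalized P(eps).
J = lambda om, e: (1.0 / np.pi) / (1.0 + (om - 0.5 * e) ** 2)
P = lambda e: np.exp(-e)

I0 = line_shape(0.0, J, P)
```

Doubling `n_eps` and checking that the result is unchanged to many digits is the practical convergence test for a fixed mesh, which is the role the adaptive quadrature plays in the paper.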
2. Stark broadening theory

The field-dependent line shape function in Eq. (1) can be written in the form [1,2]

$$ J(\omega;\varepsilon) = \pi^{-1}\,\mathrm{Im}\,\mathrm{Tr}\{\vec d\cdot R(\omega;\varepsilon)\,\rho\,\vec d\} \tag{3} $$

where Tr denotes a trace over the radiator internal states with ρ and d⃗ the radiator density matrix and dipole operators, respectively. The resolvent operator R is from standard theory where the electrons are treated in the binary collision approximation and is defined by its inverse

$$ R^{-1}(\omega;\varepsilon) = A(\omega) - \varepsilon B \tag{4} $$

where the field independent contribution is given by

$$ A(\omega) = \Delta\omega - \Phi(\omega), \tag{5} $$

$$ \Delta\omega\, Y = \omega Y - \hbar^{-1}[H, Y], \tag{6} $$

and the coefficient to the quasi-static field is

$$ B\, Y = \hbar^{-1}[d_z, Y]. \tag{7} $$

Here, Y is an arbitrary operator in the radiator subspace, [.,.] denotes a commutator, H is the Hamiltonian for the internal states of the radiator, and the Stark field defines the z-axis so d_z is the z-component of the dipole operator. The "width and shift" operator, Φ(ω), contains the plasma electron contribution to the relaxation processes of the radiator states [1,2,5–7,9]. Performing the trace in Eq. (3) yields,

$$ J(\omega;\varepsilon) = \pi^{-1}\,\mathrm{Im}\sum_q \sum_{ab} \sum_{a'b'} d^{(q)}_{ab}\, R_{ab,a'b'}(\omega;\varepsilon)\, d^{(q)}_{a'b'}\, \rho_{a'} \tag{8} $$

with the internal radiator states defined as |a⟩ = |Γ_a j_a m_a⟩, |b⟩ = |Γ_b j_b m_b⟩, and so on, where |Γ j m⟩ has total angular momentum j, magnetic quantum number m, and Γ represents any additional quantum numbers necessary to complete the description. The elements of the qth spherical component of the radiator dipole operator are given by [14]

$$ d^{(q)}_{ab} = \langle a|d^{(q)}|b\rangle = (-1)^{j_a - m_a} \begin{pmatrix} j_a & 1 & j_b \\ -m_a & q & m_b \end{pmatrix} S^{1/2}_{ab} \tag{9} $$

in terms of the Wigner 3-j symbol and the reduced dipole matrix element of the Wigner–Eckart theorem. The density matrix is assumed diagonal with elements ρ_a the occupation of state |a⟩. The tetradic notation was introduced [5,6]

$$ \langle a|R\, d^{(q)}|b\rangle = \sum_{a'b'} R_{ab,a'b'}\, d^{(q)}_{a'b'} \tag{10} $$

Thus, in 'line space' R_{ab,a'b'} and d^{(q)}_{ab} are elements of a matrix and a vector, respectively.

3. Conventional calculation method

It follows from Eqs. (2) and (8) that the calculation of the line shape function in Eq. (1) at a given frequency can be viewed as the following problem: Perform the numerical sum for each relevant diagonal block of the resolvent R,

$$ I(\omega) = \pi^{-1} \sum_{i=1}^{N_\varepsilon} w_i\, P(\varepsilon_i)\, \mathrm{Im}[g(\varepsilon_i)] \tag{11} $$

where the complex function of the Stark field has the form

$$ g(\varepsilon) = \ell^T\cdot\{A - \varepsilon B\}^{-1} r \tag{12} $$

with superscript T the transpose and ℓ·r the vector dot product. After calculating the elements of the matrices A and B as well as the vectors ℓ = d^{(q)} and r = ρ d^{(q)}, the goal is to evaluate g(ε) using robust algorithms without uncontrolled approximations at the Stark field values necessary to perform an accurate integration over the Stark field. In the conventional approach the calculation of g(ε) in Eq. (12) proceeds with an explicit matrix inversion of {A − εB} at each field value [5–7]. As it turns out, the conventional method has the highest operation count of the algorithms considered in the present work and is the least efficient method for quasi-static ion line shape calculations. The matrix {A − εB} is complex and symmetric so the present calculations use the LAPACK [15] routine ZSYTRF for the LDL^T factorization and ZSYTRI for the inversion.

4. Simultaneous linear equations

It is not necessary to perform the explicit matrix inversion. In fact, explicit inversion requires ~3 times more computational effort and is less stable than solving a system of equations [8]. That is, g(ε) can be expressed in the form

$$ g(\varepsilon) = \ell^T\cdot z(\varepsilon) \tag{13} $$

where the vector z(ε) satisfies the simultaneous equations

$$ \{A - \varepsilon B\}\, z = r \tag{14} $$

Several algorithms for solving the matrix system in Eq. (14) are considered.

4.1. Direct complex symmetric solver

This method solves for z(ε) using complex arithmetic. The approach does a factorization of {A − εB} followed by a forward and backward substitution solver [8]. The calculations use the LAPACK routine ZSYTRF for the factorization and ZSYTRS for the solver.

4.2. Direct real symmetric solver
The problem can be solved using real arithmetic by writing Eq. (14) in the form

$$ \begin{pmatrix} A_1 - \varepsilon B & -A_2 \\ -A_2 & -A_1 + \varepsilon B \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix} \tag{15} $$

where the real and imaginary parts are defined by

$$ A = A_1 + iA_2, \qquad z = x + iy \tag{16} $$

Even though the problem in Eq. (15) involves a real symmetric matrix, this scheme is inefficient in both storage and operation count relative to the complex symmetric solver and is not considered any further. Nevertheless, the method may be valuable for computers with poor complex arithmetic performance.
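The gain from replacing inversion by a solve (Eqs. (13)-(14)) is easy to verify in a few lines. The matrices below are random stand-ins, not the physical A and B of Section 2, and NumPy's LU-based routines play the role of the LDL^T factorizations named above:

```python
import numpy as np

# Random stand-ins (assumptions of this sketch): complex symmetric A
# with a dominant imaginary diagonal, real symmetric B.
rng = np.random.default_rng(0)
N = 200
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = A + A.T + 5j * N * np.eye(N)   # complex symmetric, well conditioned
B = rng.standard_normal((N, N)); B = B + B.T
r = rng.standard_normal(N) + 0j
ell = rng.standard_normal(N) + 0j

eps = 1.3
M = A - eps * B
# Conventional route (Section 3): form the inverse explicitly.
g_inv = ell @ np.linalg.inv(M) @ r
# Equation-solver route (Eqs. 13-14): factor and solve, no inverse formed.
g_solve = ell @ np.linalg.solve(M, r)
assert np.allclose(g_inv, g_solve)
```

Both routes give the same g(ε); the solver route simply skips the ~3x more expensive and less stable construction of the full inverse.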
4.3. Direct sparse complex symmetric solver

Typically the matrix {A − εB} is not dense and there exist computationally efficient factorization schemes designed for sparse matrices. The present calculations use direct sparse solver routines available for symmetric, complex matrices in the Intel® Math Kernel Library.

4.4. Iterative solutions

Iterative methods for the solution of linear equations are an important alternative to direct matrix factorization approaches. Under certain conditions (high sparsity and efficient pre-conditioning), iterative techniques can be significantly faster than direct approaches. The quasi-minimal residual (QMR) method has been shown to be an efficient and relatively robust approach for complex symmetric matrices [16,17]. A consideration is the quality of the pre-conditioner. In the absence of physically motivated choices, incomplete factorizations are applicable across a wide range of problems. The present calculations use the QMR routines with LL^T pre-conditioners [18].

5. Matrix decomposition

The considered matrix decomposition approaches in effect separate the computationally intensive matrix algebra from the field integration. The first step is to rewrite Eq. (12) as [12]

$$ g(\varepsilon) = \ell^T\cdot\{I - \varepsilon C\}^{-1} A^{-1} r \tag{17} $$

where I is the identity matrix and

$$ A\, C = B \tag{18} $$

The manipulations above require that A^{-1} exist, which is always the case since it describes the line shape in the absence of a quasi-static field. To proceed, formally perform the decomposition of the C matrix,

$$ C\, Q_M = Q_M\, M \tag{19} $$

Substituting Eq. (19) into Eq. (17) yields

$$ g(\varepsilon) = \ell^T Q_M\cdot\{I - \varepsilon M\}^{-1} Q_M^{-1} A^{-1} r = \ell^T Q_M\cdot z(\varepsilon) \tag{20} $$

where

$$ \{I - \varepsilon M\}\, z = u \tag{21} $$

and u, which is field independent, is defined by

$$ A\, Q_M\, u = r \tag{22} $$

Again, explicit matrix inversion is not required; rather, Eq. (22) is solved using the factorization of the A and Q_M matrices. Note that even though the resolvent R is already block diagonal, the A matrix can be further reduced to at least 2 diagonal blocks (see Appendix A). The matrix decomposition in Eq. (19) undoubtedly has an operation count higher than the factorization of {A − εB}; however, the decomposition is performed only once. Hence, depending on the structure of the M matrix, it is amortized by a fast solution of Eq. (21) performed N_ε times. This can lead to an overall reduction in computational effort.

5.1. Eigendecomposition

The eigendecomposition of the C matrix [8],

$$ C\, Q_\Lambda = Q_\Lambda\, \Lambda \tag{23} $$

leads to [12]

$$ \{I - \varepsilon \Lambda\}\, z = u \tag{24} $$

where Λ = diag{λ₁, λ₂, …} with the eigenvalues and the columns of Q_Λ the eigenvectors of C. Since C is non-Hermitian, Q_Λ is not unitary and the λ's are complex numbers. The line shape is now easily computed since Λ is a diagonal matrix. As was stressed previously [12] the eigendecomposition here is distinct from earlier eigendecompositions in line broadening theory. The calculation calls the LAPACK routines ZGEEV for the eigendecomposition of C, ZSYTRF and ZSYTRS for the factorization and solver for A, plus ZGETRF and ZGETRS for the LU factorization and solver for Q_Λ. The eigendecomposition approach suffers, in principle, from a large overhead in operation count (~27N³ operations for a non-Hermitian complex matrix of size N) and can be numerically unstable [8].

5.2. Hessenberg decomposition

A robust and, by operation count, more efficient approach uses the Hessenberg decomposition [8] of the C matrix

$$ C\, Q_H = Q_H\, H \tag{25} $$

leading to [12]

$$ [I - \varepsilon H]\, z(\varepsilon) = u \tag{26} $$

Here H is a Hessenberg matrix (tridiagonal plus upper triangular) and Q_H is unitary; that is, Q_H^† Q_H = Q_H Q_H^† = I where the superscript † denotes Hermitian conjugate. The Hessenberg decomposition requires a factor of ~5 less operations than the eigendecomposition, but to solve the simultaneous equations in Eq. (26) requires O(N_ε N²) operations rather than O(N_ε N) in Eq. (24). The calculation calls the LAPACK routines ZGEBAL, ZGEHRD, ZUNGHR, and ZGEBAK.

5.3. Schur decomposition

The Schur decomposition [8] leads to

$$ C\, Q_S = Q_S\, S \tag{27} $$

and

$$ [I - \varepsilon S]\, z(\varepsilon) = u \tag{28} $$

where S is an upper triangular matrix and Q_S is unitary. The main feature of this decomposition is that the diagonal elements of S contain the eigenvalues of C, which could be used to optimize the field integration [12]. The operation count of the Schur decomposition is ~3 times larger than for the Hessenberg decomposition; hence, it was not implemented in the present work. An alternative way to obtain the eigenvalues is to use the LAPACK routine ZHSEQR to compute eigenvalues of the Hessenberg matrix H. This last step, however, doubles the operation count in the Hessenberg method and was also not implemented.
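The amortization strategy of this section can be sketched with the eigendecomposition variant (Eqs. (17)-(24)): pay the decomposition cost once, then each field point costs only O(N). The matrices below are random surrogates, and `numpy.linalg.eig` plays the role of ZGEEV:

```python
import numpy as np

def decompose(A, B, ell, r):
    """One-time setup of Section 5.1 (Eqs. 18, 22, 23): C = A^{-1}B,
    eigendecomposition C Q = Q diag(lam), u = Q^{-1} A^{-1} r, and the
    projected row vector ell^T Q."""
    C = np.linalg.solve(A, B)
    lam, Q = np.linalg.eig(C)            # stands in for ZGEEV
    u = np.linalg.solve(Q, np.linalg.solve(A, r))
    return lam, ell @ Q, u

def g_of_eps(eps, lam, ellQ, u):
    """Per-field-point evaluation (Eqs. 20, 24): O(N) work instead of
    a fresh O(N^3) factorization of {A - eps B}."""
    return ellQ @ (u / (1.0 - eps * lam))

# Random stand-ins for the physical matrices (assumption of the sketch).
rng = np.random.default_rng(1)
N = 150
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = A + A.T + 5j * N * np.eye(N)
B = rng.standard_normal((N, N)); B = B + B.T
r = rng.standard_normal(N) + 0j
ell = rng.standard_normal(N) + 0j

lam, ellQ, u = decompose(A, B, ell, r)
for eps in (0.5, 1.0, 2.0):
    direct = ell @ np.linalg.solve(A - eps * B, r)
    assert np.allclose(g_of_eps(eps, lam, ellQ, u), direct)
```

The per-field cost drops from a factorization to a vector divide and a dot product, which is the amortization the text describes.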
6. Model-reduction

The idea of model-reduction schemes is to replace g(ε) with an approximate function that involves smaller matrices than the original problem but reproduces a certain number of Taylor coefficients. Consider the fixed expansion point ε = ε_o to write the Taylor series,

$$ g(\varepsilon) = \ell^T\cdot\{(A - \varepsilon_o B) - (\varepsilon - \varepsilon_o)B\}^{-1} r = \ell^T\cdot\{I - (\varepsilon - \varepsilon_o)C_o\}^{-1} b = \sum_n \mu_n (\varepsilon - \varepsilon_o)^n \tag{29} $$

where

$$ A_o = A - \varepsilon_o B \tag{30} $$

$$ A_o\, b = r \tag{31} $$

$$ A_o\, C_o = B \tag{32} $$

and

$$ \mu_n = \ell^T\cdot C_o^n\, b \tag{33} $$
is the nth Taylor coefficient. As an example, consider a rational function approximation to g(ε). That is, the Taylor series may not be a useful approximation, but a rational function that reproduces the Taylor coefficients can have a considerably larger range of validity. By equating powers of the expansion variable it is possible to obtain simultaneous equations for the rational function coefficients. It is natural to expect the accuracy of the approximant to improve with increasing order of the approximation. This, however, is not the case for this procedure. Instead, the answer stagnates after matching only up to a small number of Taylor coefficients [19]. Fortunately, there are techniques to obtain model-reduction that avoid the stagnation problem.

6.1. Lanczos process

The Lanczos method can be considered as an iterative approach for generating a pair of bi-Hermitian conjugate spaces to reduce a general square matrix to a tridiagonal matrix of smaller size. After k Lanczos iterations the process leads to the expression [13,19]

$$ g(\varepsilon) \simeq \left[\ell^T\cdot b\right]\left[e_1^T\cdot z(\varepsilon)\right] \tag{34} $$

where

$$ [I - (\varepsilon - \varepsilon_o)T_k]\, z = e_1 \tag{35} $$

Here, T_k is a tridiagonal matrix of size k, and e₁ = [1,0,…,0]^T is a unit vector of length k. Performing k iterations in the Lanczos process yields an approximation to g(ε) that reproduces the first 2k − 1 coefficients in a Taylor series expansion about the reference point ε = ε_o. In an average sense, additional iterations increase the range of validity, relegating errors to regions further away from the reference point, until it reaches its best value when the number of iterations is equal to the size of the matrix C. In the line profile calculation the effective range of integration over the Stark field is defined by P(ε). This feature allows for a physically motivated approach to determine convergence of the Lanczos iterations. That is, the integration over the Stark field provides a straightforward convergence criterion [13]. Choosing ε_o
near the peak of the integrand in Eq. (11) diminishes the impact of these errors and minimizes the number of Lanczos iterations. The present calculations start with k = 0.02N, with N the size of the R diagonal block. The convergence criterion increments the number of Lanczos iterations by Δk = 0.01N until the desired convergence is achieved, which is assumed when the relative difference in the integration over the Stark field for the last 3 sets of iterations is less than 10⁻³. Although it is possible to choose a smaller initial number of iterations with possible timesaving, care is then required since for a relatively small number of iterations the results can be highly inaccurate, with positive and negative contributions, and the adaptive integration routine may fail to converge. The number of iterations depends on the choice of reference field. In the calculations ε_o is chosen at the maximum value of the integrand from results of the same diagonal block of R at nearby frequencies. If no nearby results are available, the peak value location of P(ε) is used. For calculations assuming a dense A_o matrix, the routines ZSYTRF and ZSYTRS can be used to compute the C_o matrix and generate the Lanczos vectors. Here, the sparse matrix factorization and solver routines from the Intel® Math Kernel Library are used. The ZGTSV routine is called to solve the tridiagonal system. Since in either case the time is dominated by matrix factorization and matrix-vector multiplication, the robust modified Gram–Schmidt procedure is applied at each iteration (the equivalent of full orthogonalization for complex non-symmetric matrices). It is emphasized that even with orthogonalization the classical Lanczos algorithm can be numerically unstable. Although this problem can be efficiently remedied by robust approaches [20], numerical instability was not a problem in the present calculations and the more robust algorithm was not implemented.

6.2. Arnoldi process

The Arnoldi method is similar to the Lanczos algorithm in that it creates an orthonormal basis for a Krylov subspace to reduce a general square matrix to a Hessenberg matrix of smaller size. Performing k Arnoldi iterations yields
$$ g(\varepsilon) \simeq \left[b^T\cdot b\right]^{1/2} \ell^T Q_k\cdot z(\varepsilon) \tag{36} $$

with

$$ [I - (\varepsilon - \varepsilon_o)H_k]\, z(\varepsilon) = e_1 \tag{37} $$

where H_k is an upper Hessenberg matrix of size k and Q_k is a rectangular N × k matrix whose columns are the Arnoldi vectors. The approximation to g(ε) in Eq. (36) reproduces the first k − 1 coefficients in a Taylor series about the reference point ε = ε_o. The present implementation uses the numerically robust Householder algorithm to generate orthogonal Arnoldi vectors (see Appendix B). The convergence procedure follows that used for the Lanczos algorithm except the starting number of iterations is larger, k = 0.03N. It is expected that the Arnoldi method should require more iterations than the Lanczos method. On the other hand, the Arnoldi iterations with the Householder stabilization can be performed until k = N, reproducing the Hessenberg decomposition in Section 5.2. The factorization of the A_o matrix is the same as in the Lanczos method. The routines ZHEFA and ZHESL solve the Hessenberg system in Eq. (37).

7. Numerical results

The efficiency of the algorithms to compute the line profile is ascertained with several test cases with emphasis on relatively large matrices. The examples consist of Li-like satellites to the Ar He-β line, which was the subject of an earlier study [21]. In particular, the study considered the impact of the n = 4 satellites on the interpretation of experimental spectra. Due to computational constraints [11], their calculations neglected the interference terms [1,2]. It has been shown, however, that the interference terms can impact the profiles of satellite lines [22]. The interference terms increase the size of the matrices in the calculations, dramatically increasing the computational effort; thus, these cases serve as a good test of the algorithms.
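Before turning to the test cases, the model-reduction strategy of Section 6 can be sketched in a few lines. The fragment below uses a two-sided Krylov projection with full bi-orthogonalization as a stand-in for the Lanczos recurrence of Eqs. (34)-(35); the matrices are random surrogates and ε_o = 0 is assumed:

```python
import numpy as np

def reduced_g(C, ell, b, k, deltas):
    """Two-sided Krylov projection standing in for the Lanczos process:
    bi-orthogonal bases V, W with W^T V = I (full bi-orthogonalization,
    cf. the modified Gram-Schmidt step in the text), reduced matrix
    T = W^T C V, then Eqs. (34)-(35):
        g(eps) ~ (ell^T.b) e1^T [I - (eps - eps_o) T]^{-1} e1.
    Serial breakdown (w.v ~ 0) is not handled in this sketch."""
    n = len(b)
    V = np.zeros((n, k), dtype=complex)
    W = np.zeros((n, k), dtype=complex)
    scale = ell @ b
    V[:, 0] = b / np.linalg.norm(b)
    W[:, 0] = ell / (ell @ V[:, 0])
    for j in range(1, k):
        v = C @ V[:, j - 1]
        w = C.T @ W[:, j - 1]
        for i in range(j):                 # oblique projections
            v -= V[:, i] * (W[:, i] @ v)
            w -= W[:, i] * (V[:, i] @ w)
        v /= np.linalg.norm(v)
        V[:, j] = v
        W[:, j] = w / (w @ v)              # enforce W^T V = I
    T = W.T @ C @ V
    e1 = np.zeros(k); e1[0] = 1.0
    return np.array([scale * np.linalg.solve(np.eye(k) - d * T, e1)[0]
                     for d in deltas])

# Random surrogates for the line-space matrices (eps_o = 0 assumed).
rng = np.random.default_rng(2)
N = 150
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = A + A.T + 5j * N * np.eye(N)
B = rng.standard_normal((N, N)); B = B + B.T
r = rng.standard_normal(N) + 0j
ell = rng.standard_normal(N) + 0j

C = np.linalg.solve(A, B)                  # Eq. (18): A C = B
b = np.linalg.solve(A, r)
deltas = np.array([0.5, 1.0, 2.0])
exact = np.array([ell @ np.linalg.solve(A - d * B, r) for d in deltas])
approx = reduced_g(C, ell, b, 12, deltas)
assert np.allclose(approx, exact)
```

With these surrogates a reduced model of size k = 12 out of N = 150 already reproduces the direct solves, illustrating the k ≪ N regime behind the large speed ups reported below.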
7.1. Test cases

Two sets of lower and upper configurations are considered. The first, relatively smaller, problem considers the n = 3 satellites with lower configurations

1s² 3ℓ    (38)

and upper configurations

1s 3ℓ 3ℓ′    (39)

The larger problem includes the n = 3 and 4 satellites with lower configurations

1s² 3ℓ,  1s² 4ℓ    (40)

and upper configurations

1s 3ℓ 3ℓ′,  1s 3ℓ 4ℓ′,  1s 4ℓ 4ℓ′    (41)
The calculations neglect electron collisional excitations and Stark mixing between the lower and upper levels of the radiative transition, but include all dipole allowed interactions within the lower or upper level manifold. Calculations are performed with and without interference terms [1,2] to illustrate the impact of these terms on block diagonal matrix size. The test cases are defined in Table 1 with a brief description and N_R the total size of R.
Table 1
Case identification, description, and diagonal block information. "j × N" denotes j blocks of size N.

Case 1: n = 3 without interference terms, N_R = 5508
  Δm = 0:  N = 2 × 315, 2 × 144, 2 × 28;  N_o[%] = 9.64, 19.2, 66.6
  Δm = 1:  N = 315, 240, 189, 84, 48, 11;  N_o[%] = 9.64, 11.8, 15.7, 23.8, 54.8, 80.2
Case 2: n = 3 with interference terms, N_R = 5508
  Δm = 0:  N = 974;  N_o[%] = 12.0
  Δm = 1:  N = 887;  N_o[%] = 12.4
Case 3: n = 3, 4 without interference terms, N_R = 44100
  Δm = 0:  N = 2 × 2115, 2 × 1050, 2 × 240;  N_o[%] = 7.68, 15.0, 53.0
  Δm = 1:  N = 2115, 1750, 1269, 720, 350, 132;  N_o[%] = 7.68, 9.01, 12.7, 17.8, 44.5, 62.5
Case 4: n = 3, 4 with interference terms, N_R = 44100
  Δm = 0:  N = 6810;  N_o[%] = 9.74
  Δm = 1:  N = 6336;  N_o[%] = 10.1

Also included in Table 1 for each block in the calculation is the block size N, the sparsity N_o = nonzeros/N², and Δm (defined in Section 7.2).
7.2. Calculation details

The present calculations use a semi-classical model for the electron collision operator [9] and the low-frequency component of the microfield distribution [23,24]. The input atomic data is from the FAC atomic structure code [25]. In standard theory the resolvent operator conserves the difference in magnetic quantum numbers, Δm = m_a − m_b; that is,

$$ R_{ab,a'b'} \propto \delta_{m_a - m_b,\, m_{a'} - m_{b'}} \tag{42} $$
and R is block diagonal in Δm (R may contain smaller diagonal blocks within a given Δm block). Furthermore, by symmetry the blocks with Δm and −Δm yield identical contributions to the profile. It follows from Eq. (8) that the calculations need only explicitly consider Δm = 0 and 1. The calculations were done with a recently developed Fortran code, MELS (Multi-Electron Line Shape). The matrix algebra uses the LAPACK [15] and sparse solver library routines included in the Intel® Math Kernel Library. Although the MELS code has parallel computing capabilities, for these comparisons all calculations were performed on a single Intel® Xeon 2.4 GHz processor. In order to minimize memory requirements the diagonal blocks of R are treated independently and Fortran allocatable arrays are adjusted accordingly. Adaptive quadrature is used for the field integration and is discussed further below (see Section 7.4). Before proceeding with the algorithm comparisons, line shape calculations at temperature T = 500 eV and free electron density n_e = 10²⁴ cm⁻³ [21] are displayed in Fig. 1. The plasma composition for the microfield probability distribution was 1% Ar⁺¹⁵ and 99% H⁺ by number. The profiles in the figure are area normalized to unity. The figure shows the relative importance of the n = 4 satellites and interference terms in these calculations. The calculations of the line profile in Fig. 1 show that in order to determine the impact of nearby levels or physical processes on the line profile it may be necessary to perform large-scale calculations. Such tests are often impractical with the conventional method and efficient algorithms are welcome.
Fig. 1. Li-like n = 3 and n = 4 satellites to the Ar He-β line at T = 500 eV and n_e = 10²⁴ cm⁻³: n = 3 without interference terms (long-dash); n = 3 and 4 without interference terms (short-dash); and n = 3 and 4 with interference terms (solid).
7.3. Algorithm comparison

The speed up provided by an algorithm is defined by

$$ \text{speed up} \equiv t_c/t_a \tag{43} $$

where t_c is the computation time using the conventional (explicit matrix inversion) method and t_a is the computation time using the algorithm. The speed up for the various methods is weakly dependent on frequency. Furthermore, the number of frequency points in the calculations depends on the details of the profile and not on any particular calculation method. Hence, the number of frequency points is the same for all the methods. On the other hand, the number of field points in the integration does impact the speed up. Some methods are linearly dependent on the number of field points (matrix inversion and equation solvers) while others are only weakly dependent (matrix decomposition and model-reduction). It is this weak dependence on the number of field points that makes the latter set of methods attractive. Consequently, the speed up is given for the profile calculation per frequency point.

The results in Table 2 contain the speed up for several algorithms. Clearly, model-reduction provides the largest speed up, which increases with problem size. Although the direct sparse solver shows considerable speed up for the larger problems, the matrices are not sufficiently sparse in these examples (see Table 1) to compete with the model-reduction methods. The high efficiency of the model-reduction schemes in these examples is due to the relatively small number of iterations necessary for the field integration to converge (k ≲ 0.06N). This is possible since g(ε) only displays modest structure easily reproduced by a relatively small number of iterations. In other cases g(ε) may contain considerable structure requiring k = O(N). In such a circumstance, however, the number of field points required to perform an accurate field integration would also be significantly increased. Since the speed up for the model-reduction methods is O(N_ε), it would still be considerable. Interestingly, the speed ups of the eigendecomposition and Hessenberg methods display unexpected results.
Specifically, operation count indicates that the Hessenberg method is ~5 times faster than the eigendecomposition method. The discrepancy in running times is due to the Intel® Math Kernel Library routines. Taking the largest matrix in the calculations as an example, N = 6810 and Δm = 0 of Case 4, the Hessenberg and eigendecomposition routines respectively run 20% slower and 75% faster than expected according to the operation count. This illustrates that operation count alone may not be sufficient to determine the relative speed of algorithms. It is stressed that except for the model-reduction schemes, the computed line profiles reproduced the conventional method results to more than 5 digits. The model-reduction methods with the adopted convergence criterion (see Section 6) give better than a few parts in 10⁵ agreement. As a test, the Arnoldi process was implemented forcing k = N, which reproduced the Hessenberg method profile results and computation time.

Table 2
Speed up of several methods.

Method                 Case 1   Case 2   Case 3   Case 4
Direct Solver            4.2      6.5      4.5      6.8
Direct Sparse Solver     4.5      20       5.2      21.4
Eigendecomposition       10       14       18       19.9
Hessenberg               20       22       22       20.5
Lanczos                  89       268      262      427
Arnoldi                  128      400      349      644
Fig. 2. Relative error in line profile as a function of photon energy as a result of reducing N_ε by a factor of 3. Calculations are for Case 1 at conditions of Fig. 1.
7.4. Adaptive quadrature

Recall that the execution time for calculations using the conventional or simultaneous equation methods (see Sections 3 and 4) depends linearly on the number of field points. In the present calculations the relative error in the adaptive quadrature routine was set to produce smooth plots (i.e., relative errors < 0.5% in the line profile required N_ε = 100–125). It is then possible to decrease the calculation time by decreasing the number of field points. As an example, calculations for Case 1 were done with the conventional approach but reducing the number of field points to one third (i.e., N_ε → N_ε/3). The resulting errors relative to the more accurate calculation are displayed in Fig. 2. The figure shows that a decrease in computational time by a factor of 3 yields a systematic underestimate of the line wings of ~5% due to inaccuracies in the field integration. Since several algorithms provide larger savings without loss in precision, such compromises between accuracy and speed are not easily justified.

7.5. Iterative solver comparisons

The iterative solver algorithm is not as easily implemented as the other methods; hence, it is treated as a special case. The test is limited to the largest block in the test cases, N = 6810 and Δm = 0 of Case 4. Furthermore, the field-dependent line shape is computed at representative field values and at ℏω = 3663 eV, which is the strength-weighted center of all the lines in Case 1. Table 3 displays the field values and speed up relative to the conventional method. The field is given in units often used in line shape calculations,
$$ \beta_o = e/a_e^2 \tag{44} $$
Table 3
Speed up of iterative QMR method at selected values of the Stark field.

ε/β_o    Speed up
1        14.3
5        14.3
10       13.6
15       12.5
20       2.5
C.A. Iglesias, V. Sonnad / High Energy Density Physics 7 (2011) 391e399
with ae the electron-sphere radius defined by
4pne a3e =3
¼ 1
X
gð3 Þ ¼
3
2n
(A.2)
n¼0
(45)
The results show that the speed up degrades with increasing field, which is a consequence of the relative increase in the offdiagonal contributions to R1 ðu; 3 Þ and the matrix condition number. Eventually, the condition number increases sufficiently that it is necessary to improve the pre-conditioner to obtain convergence. This was the case for the largest field value in Table 3. A comparison with the results in Table 2 shows that the QMR iterative solver is significantly faster than the conventional method. As presently used, however, the matrix sparsity in the present calculations makes the QMR method not as efficient as the modelreduction approaches. Finally, iterative solver methods are not guaranteed to converge so that contingencies have to be implemented to account for such an event. Nevertheless, iterative solvers are an import class of methods and further experimentation with a variety of pre-conditioners may prove valuable.
8. Conclusion

The efficiency of various algorithms to compute Stark profiles in the quasi-static ion approximation was compared. It is emphasized that the methods in the comparison do not rely on physical approximations beyond those present in the standard theory. The conventional approach, which uses explicit matrix inversion, was the least efficient of the algorithms tested. The most efficient were the model-reduction methods, which executed more than 2 orders of magnitude faster than the conventional approach for large-scale calculations. Although there are numerically robust realizations of the Lanczos process, the Arnoldi process with Householder orthogonalization is more easily implemented and highly stable. Often, computation times can depend on the optimization of library routines, but the speed-up of the model-reduction methods is not strongly dependent on the efficiency of the libraries. Specifically, the main operation count in both the conventional and model-reduction methods using unoptimized routines is the factorization of a complex symmetric matrix, respectively requiring O(N_ε N³) and O(N³/3) operations to compute the line profile (here N is the matrix size and N_ε the number of quasi-static field points). Consequently, the speed-up is ~3N_ε, which is in reasonable agreement with the present results using optimized routines. It is expected that the highly efficient model-reduction methods should make future large-scale quasi-static ion line profile calculations practical without having to introduce uncontrolled approximations.

Acknowledgments

We thank Howard A. Scott for the atomic data. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Appendix A. Special structure of A, B, and C matrices

Using symmetry arguments it follows that selecting the z-axis for the radiator internal states to be along or against the field direction must yield identical results. These two choices yield

g(ε) = ℓ^T·{I − εC}^{−1} A^{−1} r = ℓ^T·{I + εC}^{−1} A^{−1} r = g(−ε)   (A.1)

Consequently, the Taylor series expansion has the form

g(ε) = Σ_n ε^{2n} ℓ^T·C^{2n} A^{−1} r   (A.2)

where the odd-power Taylor coefficients vanish. The symmetry of g(ε) requires that the C matrix and the vectors ℓ and A^{−1}r have special forms [26]. For the Stark-profile problem the condition on the Taylor coefficients is satisfied only if there exists a block structure such that a permutation transformation P, satisfying PP^T = P^T P = I, leads to

Pℓ = (ℓ_1, 0)^T   (A.3)

PA^{−1}r = (x_1, 0)^T   (A.4)

PCP^T = | 0    C_1 |
        | C_2   0  |   (A.5)

Then it follows from Eq. (A.5) that

PC^{2n}P^T = | U_1   0  |   and   PC^{2n+1}P^T = |  0   V_1 |
             |  0   U_2 |                        | V_2   0  |   (A.6)

which guarantees the vanishing of the odd Taylor coefficients in Eq. (A.2). Now assume A is diagonal. Since AC = B, it follows that B must have the same structure as C,

PBP^T = |  0   B_1 |
        | B_2   0  |   (A.7)

In turn, the structure of B is independent of approximations to A. So for the product AC to produce the required structure in B, the permutation must also lead to

PAP^T = | A_1   0  |
        |  0   A_2 |   (A.8)

which is consistent with the requirement in Eq. (A.4) (recall that the vectors ℓ and r are simply related by the occupation numbers). Even though the resolvent R is in block-diagonal form, the A matrix can be reduced further to at least 2 diagonal blocks. Although not implemented at present, the structure of these matrices can save a factor of 2 in storage and accelerate the factorization of A required in some algorithms. Finally, note that the present arguments do not apply to the model-reduction schemes using A_o and C_o in Eqs. (30) and (32).

Appendix B. Arnoldi process with Householder stabilization

The Arnoldi process is similar to the Lanczos algorithm in that it constructs an orthogonal basis for the Krylov space,

K_k(C_o, b) = span{ b, C_o b, C_o² b, …, C_o^{k−1} b }   (B.1)
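As a concrete sketch of the Krylov-space construction, the code below builds Q_k and H_k for a random complex matrix and verifies the orthonormality and Arnoldi-relation properties; for brevity it uses modified Gram-Schmidt orthogonalization rather than the Householder stabilization of Algorithm 1, and the matrix C and vector b are arbitrary stand-ins for the paper's C_o and b.

```python
import numpy as np

def arnoldi(C, b, k):
    """Build an orthonormal basis Q of the Krylov space K_k(C, b) and the
    (k+1) x k Hessenberg matrix H satisfying C @ Q[:, :k] = Q @ H."""
    n = len(b)
    Q = np.zeros((n, k + 1), dtype=complex)
    H = np.zeros((k + 1, k), dtype=complex)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = C @ Q[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt step
            H[i, j] = np.vdot(Q[:, i], v)   # Hermitian inner product
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(1)
n, k = 50, 8
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal(n) + 0j
Q, H = arnoldi(C, b, k)
# checks analogous to the orthonormality and Arnoldi-relation properties
print(np.allclose(Q.conj().T @ Q, np.eye(k + 1)))  # True
print(np.allclose(C @ Q[:, :k], Q @ H))            # True
```

The Householder variant of Algorithm 1 produces the same subspace but remains orthogonal to machine precision even when Gram-Schmidt would degrade.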
For the N × N matrix C_o, performing k of the classical Arnoldi iterations returns a set of k vectors as the columns of the N × k rectangular matrix Q_k and the k × k Hessenberg (tridiagonal plus upper triangular) matrix H_k. These quantities have the properties [27]

b = (b^T·b)^{1/2} Q_k e_1   (B.2)

q_i^T·q_j = δ_ij   or   Q_k† Q_k = I   (B.3)

C_o Q_k = Q_k H_k + H_k(k+1, k) q_{k+1} e_k^T   (B.4)

where H_k(i, j) are the elements of H_k, q_j is the jth Arnoldi vector, and e_j is the jth unit vector. The connection with the complex function

g(ε) = ℓ^T·{I − (ε − ε_o)C_o}^{−1} b   (B.5)

is made as follows. It is readily shown, using Eqs. (B.2)-(B.4) together with the knowledge that H_k is a Hessenberg matrix, that for j < k,

C_o^j b = (b^T·b)^{1/2} C_o^j Q_k e_1 = (b^T·b)^{1/2} Q_k H_k^j e_1   (B.6)

Hence, the Taylor coefficients of g(ε) are given by

μ_j = ℓ^T·C_o^j b = (b^T·b)^{1/2} ℓ^T Q_k·H_k^j e_1   (B.7)

Now by analogy with Eq. (B.5), write

g(ε) ≈ g_k(ε) = (b^T·b)^{1/2} ℓ^T Q_k·{I_k − (ε − ε_o)H_k}^{−1} e_1   (B.8)

which reproduces the first k − 1 Taylor coefficients of g(ε). These results are slightly modified when using the numerically stable Householder algorithm to generate orthogonal Arnoldi vectors. The details are provided in the following sub-sections.

B.1 Arnoldi algorithm

The Arnoldi algorithm generates the matrix Q_k = [q_1 q_2 … q_k] with q_j the Arnoldi vectors and the Hessenberg matrix H_k = [h_1 h_2 … h_k] with h_j the columns. The orthogonalization is performed with the numerically robust Householder stabilization [28], leading to the following algorithm [27].

Algorithm 1.
1: Set x_1 = b
2: For j = 1, …, k Do:
3:   Compute the Householder unit vector w_j and P_j from x_j (see Section B.2)
4:   h_{j−1} = P_j x_j
5:   q_j = P_1 P_2 … P_j e_j
6:   If j ≤ k compute x_{j+1} = P_j P_{j−1} … P_1 C_o q_j
7: EndDo
8: h_k = x_{k+1}

Note that the matrices P_j need not be formed explicitly. To obtain h_{j−1} from x_j in line 4, zero out all the components from position j + 1 through n of the n-vector x_j and change its jth component, leaving all others unchanged. Saad [27] gives a detailed discussion of the Arnoldi process and its implementation.

B.2 Householder algorithm

The complex Householder algorithm [9] required by the Arnoldi process above is given below with minor modifications. Given a complex vector x of length n and an integer k such that 1 ≤ k ≤ n, form a scalar β and a vector w of length n so that the matrix

P = I − β w w^T   (B.2.1)

has the property that the ith element of Px vanishes for i = k + 1, …, n.

Algorithm 2.
1: Define x_max = max[ |x(i)|; i = k, …, n ]
2: If (x_max = 0.0) then
3:   β = 0.0
4:   w(1:n) = 0.0
5:   return
6: else
7:   w(k:n) = x(k:n)/x_max
8:   α = [ w(k)² + w(k+1)² + … + w(n)² ]^{1/2}
9:   β = 1/[ α(α + |w(k)|) ]   (B.2.3)
10:   If (|w(k)| = 0.0) then
11:     w(k) = α
12:   else
13:     w(k) = w(k){ α + |w(k)| }/|w(k)|
14:   EndIf
15: EndIf

The notation x(i:j) denotes the implied Fortran loop over i, i + 1, …, j.

B.3 Line shape expression

Using the Householder algorithm to stabilize the Arnoldi iterations introduces minor modifications to the approximant line shape. Specifically, Eq. (B.8) is modified to read

g_k(ε) = [b(1)/|b(1)|] (b^T·b)^{1/2} ℓ^T Q_k·{I_k − (ε − ε_o)H_k}^{−1} e_1   (B.2.4)

where b(1) is the first element of the vector b. Furthermore, requiring that

Q_k† b = [b(1)/|b(1)|] (b^T·b)^{1/2} e_1   (B.2.5)

offers a check on the implementation of Algorithm 1 above.

References

[1] M. Baranger, in: D. Bates (Ed.), Atomic and Molecular Processes, Academic Press, New York, 1962.
[2] H.R. Griem, Spectral Line Broadening by Plasmas, Academic Press, New York, 1974.
[3] H. Margenau, M. Lewis, Rev. Mod. Phys. 31 (1959) 569.
[4] J. Holtsmark, Ann. Phys. (Leipzig) 7 (1919) 578.
[5] L.A. Woltz, C.F. Hooper, Phys. Rev. A 38 (1988) 4766.
[6] R.C. Mancini, et al., Comput. Phys. Commun. 63 (1991) 314.
[7] P.A. Loboda, et al., Laser Particle Phys. 18 (2000) 275.
[8] G.H. Golub, C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1996.
[9] A. Calisti, et al., Phys. Rev. A 42 (1990) 5433.
[10] R.W. Lee, private communication.
[11] R. Mancini, private communication.
[12] C.A. Iglesias, V. Sonnad, HEDP 6 (2010) 399.
[13] V. Sonnad, C.A. Iglesias, HEDP 6 (2010) 391.
[14] R.D. Cowan, The Theory of Atomic Structure and Spectra, University of California Press, Berkeley, 1981.
[15] E. Anderson, et al., LAPACK Users' Guide, third ed., Society for Industrial and Applied Mathematics, 1999, ISBN 0898714478.
[16] R.W. Freund, N.M. Nachtigal, Numer. Math. 60 (1991) 315.
[17] R.W. Freund, N.M. Nachtigal, SIAM J. Sci. Comput. 15 (1994) 313.
[18] R.W. Freund, N.M. Nachtigal, Fortran distribution of QMR implementation, copyright 1991.
[19] P. Feldmann, R.W. Freund, IEEE Trans. Computer-Aided Des. Integrated Circuits Syst. 14 (1995) 639.
[20] R.W. Freund, M.H. Gutknecht, N.M. Nachtigal, SIAM J. Sci. Comput. 14 (1993) 137.
[21] I.E. Golovkin, R.C. Mancini, JQSRT 65 (2000) 273.
[22] C.A. Iglesias, HEDP 6 (2010) 318.
[23] C.A. Iglesias, et al., Phys. Rev. A 31 (1985) 1698.
[24] C.A. Iglesias, et al., JQSRT 65 (2000) 303.
[25] M.F. Gu, Can. J. Phys. 86 (2008) 675.
[26] R.W. Freund, private communication.
[27] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Co., 1996.
[28] H.F. Walker, SIAM J. Sci. Stat. Comput. 9 (1988) 152.