High Energy Density Physics 7 (2011) 391–399
Algorithm comparisons for Stark-profile calculations

Carlos A. Iglesias*, Vijay Sonnad

Lawrence Livermore National Laboratories, P.O. Box 808, Livermore, CA 94550, USA
Article history: Received 25 July 2011; Received in revised form 1 September 2011; Accepted 1 September 2011; Available online 10 September 2011

Abstract
The efficiency of several algorithms to calculate Stark broadened line shapes in the quasi-static ion approximation is compared. The algorithms can be grouped into three general approaches: simultaneous equation solvers, matrix decompositions, and model-reduction. It is emphasized that the tested algorithms do not rely on approximations beyond the quasi-static ion assumption. The comparisons show that model-reduction schemes are the most efficient and are more than 2 orders of magnitude faster than the conventional method for large-scale calculations. Consequently, complex line shape calculations become practical without the compromises often required in the past. © 2011 Elsevier B.V. All rights reserved.
Keywords: Stark broadening; Line shapes
1. Introduction

In standard Stark broadening theory of spectral lines, the perturbing ions are assumed to have negligible motion during the average lifetime of the radiating states [1–3]. Thus, the line shape calculation averages the Stark-field-induced state mixing and energy shifts over the probability distribution for the quasi-static field [2–4]. In the absence of external fields, the line shape can be written in the form [1–3],
$$ I(\omega) = \int_0^{\infty} d\varepsilon\, P(\varepsilon)\, J(\omega;\varepsilon) \tag{1} $$
where ℏω is the photon energy, P(ε) is the probability of finding a Stark field of magnitude ε at the radiator, and J(ω;ε) represents the line shape due to radiator–electron interactions in the presence of this field. The numerical evaluation of the line shape replaces the integral in Eq. (1) with the sum

$$ I(\omega) = \sum_{i=1}^{N_\varepsilon} w_i\, P(\varepsilon_i)\, J(\omega;\varepsilon_i) \tag{2} $$
where N_ε is the number of field points and the w_i are weights appropriate to the integration scheme. The calculation of Eq. (2) poses a challenge. To obtain accurate results the field mesh must resolve features in J(ω;ε). Since the structure of J(ω;ε) is usually not known a priori, a large number of field points or an adaptive quadrature scheme is often necessary. Conventional methods, however, use an explicit matrix inversion at each frequency and field value [5–7], which is computationally expensive for large matrices, scaling as N³ operations for a matrix of size N [8]. Consequently, there is a premium for using many field points. Often the practical resolution of these contending needs is the introduction of approximations to accelerate the calculations [9]. If such compromises are made in large-scale line shape calculations, the complexity of the spectrum may make it difficult to identify possible errors [10,11] except by comparisons to more complete calculations.

Recently, reliable and efficient methods were developed to compute the line profiles [12,13]. Although the operation count for these methods indicates a significant reduction in computational effort compared to the conventional approach, this is not sufficient to determine execution time. That is, machine-specific optimized routines from the basic linear algebra subprogram libraries can be considerably faster than unoptimized operations, and it is necessary to compare actual implementations to establish execution times and speed up.

The purpose here is to determine the efficiency of various algorithms, in addition to those previously proposed [12,13], in large-scale quasi-static ion line shape calculations relative to the conventional approach. The intent is to identify fast, accurate, and numerically robust methods to compute Stark-profiles. It is emphasized that these methods do not rely on approximations beyond those present in the standard line broadening theory. The discussion begins in Section 2 with a brief review of Stark line broadening theory. The various algorithms are discussed in Sections 3 through 6, followed by comparisons in Section 7 and conclusions in Section 8.

* Corresponding author. E-mail address: [email protected] (C.A. Iglesias).
doi:10.1016/j.hedp.2011.09.001
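The field-point sum in Eq. (2) can be sketched numerically. The fragment below uses hypothetical stand-ins for P(ε) and J(ω;ε) (a decaying exponential and a field-shifted Lorentzian; neither comes from the paper) and Gauss–Legendre weights on a truncated field interval in place of the paper's adaptive mesh:

```python
import numpy as np

def line_shape(omega, j_of_eps, p_of_eps, eps_max=20.0, n_eps=64):
    """Approximate Eq. (2): I(omega) = sum_i w_i P(eps_i) J(omega; eps_i).

    Gauss-Legendre nodes/weights on [0, eps_max] stand in for the
    adaptive field mesh (an assumption of this sketch)."""
    x, w = np.polynomial.legendre.leggauss(n_eps)  # nodes on [-1, 1]
    eps = 0.5 * eps_max * (x + 1.0)                # map to [0, eps_max]
    wts = 0.5 * eps_max * w
    return np.sum(wts * p_of_eps(eps) * j_of_eps(omega, eps))

# Hypothetical models: field-shifted Lorentzian and a normalized P(eps).
J = lambda om, e: (1.0 / np.pi) / (1.0 + (om - 0.5 * e) ** 2)
P = lambda e: np.exp(-e)

I0 = line_shape(0.0, J, P)
```

Doubling `n_eps` and checking that the result is unchanged to many digits is the practical convergence test for a fixed mesh, which is the role the adaptive quadrature plays in the paper.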
2. Stark broadening theory

The field-dependent line shape function in Eq. (1) can be written in the form [1,2]

$$ J(\omega;\varepsilon) = \pi^{-1}\,\mathrm{Im}\,\mathrm{Tr}\{\vec d\cdot R(\omega;\varepsilon)\,\rho\,\vec d\} \tag{3} $$

where Tr denotes a trace over the radiator internal states with ρ and d⃗ the radiator density matrix and dipole operators, respectively. The resolvent operator R is from standard theory where the electrons are treated in the binary collision approximation and is defined by its inverse

$$ R^{-1}(\omega;\varepsilon) = A(\omega) - \varepsilon B \tag{4} $$

where the field independent contribution is given by

$$ A(\omega) = \Delta\omega - \Phi(\omega), \tag{5} $$

$$ \Delta\omega\, Y = \omega Y - \hbar^{-1}[H, Y], \tag{6} $$

and the coefficient to the quasi-static field is

$$ B\, Y = \hbar^{-1}[d_z, Y]. \tag{7} $$

Here, Y is an arbitrary operator in the radiator subspace, [.,.] denotes a commutator, H is the Hamiltonian for the internal states of the radiator, and the Stark field defines the z-axis so d_z is the z-component of the dipole operator. The "width and shift" operator, Φ(ω), contains the plasma electron contribution to the relaxation processes of the radiator states [1,2,5–7,9]. Performing the trace in Eq. (3) yields,

$$ J(\omega;\varepsilon) = \pi^{-1}\,\mathrm{Im}\sum_q \sum_{ab} \sum_{a'b'} d^{(q)}_{ab}\, R_{ab,a'b'}(\omega;\varepsilon)\, d^{(q)}_{a'b'}\, \rho_{a'} \tag{8} $$

with the internal radiator states defined as |a⟩ = |Γ_a j_a m_a⟩, |b⟩ = |Γ_b j_b m_b⟩, and so on, where |Γ j m⟩ has total angular momentum j, magnetic quantum number m, and Γ represents any additional quantum numbers necessary to complete the description. The elements of the qth spherical component of the radiator dipole operator are given by [14]

$$ d^{(q)}_{ab} = \langle a|d^{(q)}|b\rangle = (-1)^{j_a - m_a} \begin{pmatrix} j_a & 1 & j_b \\ -m_a & q & m_b \end{pmatrix} S^{1/2}_{ab} \tag{9} $$

in terms of the Wigner 3-j symbol and the reduced dipole matrix element of the Wigner–Eckart theorem. The density matrix is assumed diagonal with elements ρ_a the occupation of state |a⟩. The tetradic notation was introduced [5,6]

$$ \langle a|R\, d^{(q)}|b\rangle = \sum_{a'b'} R_{ab,a'b'}\, d^{(q)}_{a'b'} \tag{10} $$

Thus, in 'line space' R_{ab,a'b'} and d^{(q)}_{ab} are elements of a matrix and a vector, respectively.

3. Conventional calculation method

It follows from Eqs. (2) and (8) that the calculation of the line shape function in Eq. (1) at a given frequency can be viewed as the following problem: Perform the numerical sum for each relevant diagonal block of the resolvent R,

$$ I(\omega) = \pi^{-1} \sum_{i=1}^{N_\varepsilon} w_i\, P(\varepsilon_i)\, \mathrm{Im}[g(\varepsilon_i)] \tag{11} $$

where the complex function of the Stark field has the form

$$ g(\varepsilon) = \ell^T\cdot\{A - \varepsilon B\}^{-1} r \tag{12} $$

with superscript T the transpose and ℓ·r the vector dot product. After calculating the elements of the matrices A and B as well as the vectors ℓ = d^{(q)} and r = ρ d^{(q)}, the goal is to evaluate g(ε) using robust algorithms without uncontrolled approximations at the Stark field values necessary to perform an accurate integration over the Stark field. In the conventional approach the calculation of g(ε) in Eq. (12) proceeds with an explicit matrix inversion of {A − εB} at each field value [5–7]. As it turns out, the conventional method has the highest operation count of the algorithms considered in the present work and is the least efficient method for quasi-static ion line shape calculations. The matrix {A − εB} is complex and symmetric so the present calculations use the LAPACK [15] routine ZSYTRF for the LDL^T factorization and ZSYTRI for the inversion.

4. Simultaneous linear equations

It is not necessary to perform the explicit matrix inversion. In fact, explicit inversion requires ~3 times more computational effort and is less stable than solving a system of equations [8]. That is, g(ε) can be expressed in the form

$$ g(\varepsilon) = \ell^T\cdot z(\varepsilon) \tag{13} $$

where the vector z(ε) satisfies the simultaneous equations

$$ \{A - \varepsilon B\}\, z = r \tag{14} $$

Several algorithms for solving the matrix system in Eq. (14) are considered.

4.1. Direct complex symmetric solver

This method solves for z(ε) using complex arithmetic. The approach does a factorization of {A − εB} followed by a forward and backward substitution solver [8]. The calculations use the LAPACK routine ZSYTRF for the factorization and ZSYTRS for the solver.

4.2. Direct real symmetric solver
The problem can be solved using real arithmetic by writing Eq. (14) in the form

$$ \begin{pmatrix} A_1 - \varepsilon B & -A_2 \\ -A_2 & -A_1 + \varepsilon B \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix} \tag{15} $$

where the real and imaginary parts are defined by

$$ A = A_1 + iA_2, \qquad z = x + iy \tag{16} $$

Even though the problem in Eq. (15) involves a real symmetric matrix, this scheme is inefficient in both storage and operation count relative to the complex symmetric solver and is not considered any further. Nevertheless, the method may be valuable for computers with poor complex arithmetic performance.
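The gain from replacing inversion by a solve (Eqs. (13)-(14)) is easy to verify in a few lines. The matrices below are random stand-ins, not the physical A and B of Section 2, and NumPy's LU-based routines play the role of the LDL^T factorizations named above:

```python
import numpy as np

# Random stand-ins (assumptions of this sketch): complex symmetric A
# with a dominant imaginary diagonal, real symmetric B.
rng = np.random.default_rng(0)
N = 200
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = A + A.T + 5j * N * np.eye(N)   # complex symmetric, well conditioned
B = rng.standard_normal((N, N)); B = B + B.T
r = rng.standard_normal(N) + 0j
ell = rng.standard_normal(N) + 0j

eps = 1.3
M = A - eps * B
# Conventional route (Section 3): form the inverse explicitly.
g_inv = ell @ np.linalg.inv(M) @ r
# Equation-solver route (Eqs. 13-14): factor and solve, no inverse formed.
g_solve = ell @ np.linalg.solve(M, r)
assert np.allclose(g_inv, g_solve)
```

Both routes give the same g(ε); the solver route simply skips the ~3x more expensive and less stable construction of the full inverse.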
4.3. Direct sparse complex symmetric solver

Typically the matrix {A − εB} is not dense and there exist computationally efficient factorization schemes designed for sparse matrices. The present calculations use direct sparse solver routines available for symmetric, complex matrices in the Intel® Math Kernel Library.

4.4. Iterative solutions

Iterative methods for the solution of linear equations are an important alternative to direct matrix factorization approaches. Under certain conditions (high sparsity and efficient pre-conditioning), iterative techniques can be significantly faster than direct approaches. The quasi-minimal residual (QMR) method has been shown to be an efficient and relatively robust approach for complex symmetric matrices [16,17]. A consideration is the quality of the pre-conditioner. In the absence of physically motivated choices, incomplete factorizations are applicable across a wide range of problems. The present calculations use the QMR routines with LL^T pre-conditioners [18].

5. Matrix decomposition

The considered matrix decomposition approaches in effect separate the computationally intensive matrix algebra from the field integration. The first step is to rewrite Eq. (12) as [12]

$$ g(\varepsilon) = \ell^T\cdot\{I - \varepsilon C\}^{-1} A^{-1} r \tag{17} $$

where I is the identity matrix and

$$ A\, C = B \tag{18} $$

The manipulations above require that A^{-1} exist, which is always the case since it describes the line shape in the absence of a quasi-static field. To proceed, formally perform the decomposition of the C matrix,

$$ C\, Q_M = Q_M\, M \tag{19} $$

Substituting Eq. (19) into Eq. (17) yields

$$ g(\varepsilon) = \ell^T Q_M\cdot\{I - \varepsilon M\}^{-1} Q_M^{-1} A^{-1} r = \ell^T Q_M\cdot z(\varepsilon) \tag{20} $$

where

$$ \{I - \varepsilon M\}\, z = u \tag{21} $$

and u, which is field independent, is defined by

$$ A\, Q_M\, u = r \tag{22} $$

Again, explicit matrix inversion is not required; rather, Eq. (22) is solved using the factorization of the A and Q_M matrices. Note that even though the resolvent R is already block diagonal, the A matrix can be further reduced to at least 2 diagonal blocks (see Appendix A). The matrix decomposition in Eq. (19) undoubtedly has an operation count higher than the factorization of {A − εB}; however, the decomposition is performed only once. Hence, depending on the structure of the M matrix, it is amortized by a fast solution of Eq. (21) performed N_ε times. This can lead to an overall reduction in computational effort.

5.1. Eigendecomposition

The eigendecomposition of the C matrix [8],

$$ C\, Q_\Lambda = Q_\Lambda\, \Lambda \tag{23} $$

leads to [12]

$$ \{I - \varepsilon \Lambda\}\, z = u \tag{24} $$

where Λ = diag{λ₁, λ₂, …} with the eigenvalues and the columns of Q_Λ the eigenvectors of C. Since C is non-Hermitian, Q_Λ is not unitary and the λ's are complex numbers. The line shape is now easily computed since Λ is a diagonal matrix. As was stressed previously [12] the eigendecomposition here is distinct from earlier eigendecompositions in line broadening theory. The calculation calls the LAPACK routines ZGEEV for the eigendecomposition of C, ZSYTRF and ZSYTRS for the factorization and solver for A, plus ZGETRF and ZGETRS for the LU factorization and solver for Q_Λ. The eigendecomposition approach suffers, in principle, from a large overhead in operation count (~27N³ operations for a non-Hermitian complex matrix of size N) and can be numerically unstable [8].

5.2. Hessenberg decomposition

A robust and, by operation count, more efficient approach uses the Hessenberg decomposition [8] of the C matrix

$$ C\, Q_H = Q_H\, H \tag{25} $$

leading to [12]

$$ [I - \varepsilon H]\, z(\varepsilon) = u \tag{26} $$

Here H is a Hessenberg matrix (tridiagonal plus upper triangular) and Q_H is unitary; that is, Q_H^† Q_H = Q_H Q_H^† = I where the superscript † denotes Hermitian conjugate. The Hessenberg decomposition requires a factor of ~5 less operations than the eigendecomposition, but to solve the simultaneous equations in Eq. (26) requires O(N_ε N²) operations rather than O(N_ε N) in Eq. (24). The calculation calls the LAPACK routines ZGEBAL, ZGEHRD, ZUNGHR, and ZGEBAK.

5.3. Schur decomposition

The Schur decomposition [8] leads to

$$ C\, Q_S = Q_S\, S \tag{27} $$

and

$$ [I - \varepsilon S]\, z(\varepsilon) = u \tag{28} $$

where S is an upper triangular matrix and Q_S is unitary. The main feature of this decomposition is that the diagonal elements of S contain the eigenvalues of C, which could be used to optimize the field integration [12]. The operation count of the Schur decomposition is ~3 times larger than for the Hessenberg decomposition; hence, it was not implemented in the present work. An alternative way to obtain the eigenvalues is to use the LAPACK routine ZHSEQR to compute eigenvalues of the Hessenberg matrix H. This last step, however, doubles the operation count in the Hessenberg method and was also not implemented.
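The amortization strategy of this section can be sketched with the eigendecomposition variant (Eqs. (17)-(24)): pay the decomposition cost once, then each field point costs only O(N). The matrices below are random surrogates, and `numpy.linalg.eig` plays the role of ZGEEV:

```python
import numpy as np

def decompose(A, B, ell, r):
    """One-time setup of Section 5.1 (Eqs. 18, 22, 23): C = A^{-1}B,
    eigendecomposition C Q = Q diag(lam), u = Q^{-1} A^{-1} r, and the
    projected row vector ell^T Q."""
    C = np.linalg.solve(A, B)
    lam, Q = np.linalg.eig(C)            # stands in for ZGEEV
    u = np.linalg.solve(Q, np.linalg.solve(A, r))
    return lam, ell @ Q, u

def g_of_eps(eps, lam, ellQ, u):
    """Per-field-point evaluation (Eqs. 20, 24): O(N) work instead of
    a fresh O(N^3) factorization of {A - eps B}."""
    return ellQ @ (u / (1.0 - eps * lam))

# Random stand-ins for the physical matrices (assumption of the sketch).
rng = np.random.default_rng(1)
N = 150
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = A + A.T + 5j * N * np.eye(N)
B = rng.standard_normal((N, N)); B = B + B.T
r = rng.standard_normal(N) + 0j
ell = rng.standard_normal(N) + 0j

lam, ellQ, u = decompose(A, B, ell, r)
for eps in (0.5, 1.0, 2.0):
    direct = ell @ np.linalg.solve(A - eps * B, r)
    assert np.allclose(g_of_eps(eps, lam, ellQ, u), direct)
```

The per-field cost drops from a factorization to a vector divide and a dot product, which is the amortization the text describes.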
6. Model-reduction

The idea of model-reduction schemes is to replace g(ε) with an approximate function that involves smaller matrices than the original problem but reproduces a certain number of Taylor coefficients. Consider the fixed expansion point ε = ε_o to write the Taylor series,

$$ g(\varepsilon) = \ell^T\cdot\{(A - \varepsilon_o B) - (\varepsilon - \varepsilon_o)B\}^{-1} r = \ell^T\cdot\{I - (\varepsilon - \varepsilon_o)C_o\}^{-1} b = \sum_n \mu_n (\varepsilon - \varepsilon_o)^n \tag{29} $$

where

$$ A_o = A - \varepsilon_o B \tag{30} $$

$$ A_o\, b = r \tag{31} $$

$$ A_o\, C_o = B \tag{32} $$

and

$$ \mu_n = \ell^T\cdot C_o^n\, b \tag{33} $$
is the nth Taylor coefficient. As an example, consider a rational function approximation to g(ε). That is, the Taylor series may not be a useful approximation, but a rational function that reproduces the Taylor coefficients can have a considerably larger range of validity. By equating powers of the expansion variable it is possible to obtain simultaneous equations for the rational function coefficients. It is natural to expect the accuracy of the approximant to improve with increasing order of the approximation. This, however, is not the case for this procedure. Instead, the answer stagnates after matching only up to a small number of Taylor coefficients [19]. Fortunately, there are techniques to obtain model-reduction that avoid the stagnation problem.

6.1. Lanczos process

The Lanczos method can be considered as an iterative approach for generating a pair of bi-Hermitian conjugate spaces to reduce a general square matrix to a tridiagonal matrix of smaller size. After k Lanczos iterations the process leads to the expression [13,19]

$$ g(\varepsilon) \simeq \left[\ell^T\cdot b\right]\left[e_1^T\cdot z(\varepsilon)\right] \tag{34} $$

where

$$ [I - (\varepsilon - \varepsilon_o)T_k]\, z = e_1 \tag{35} $$

Here, T_k is a tridiagonal matrix of size k, and e₁ = [1,0,…,0]^T is a unit vector of length k. Performing k iterations in the Lanczos process yields an approximation to g(ε) that reproduces the first 2k − 1 coefficients in a Taylor series expansion about the reference point ε = ε_o. In an average sense, additional iterations increase the range of validity, relegating errors to regions further away from the reference point, until it reaches its best value when the number of iterations is equal to the size of the matrix C. In the line profile calculation the effective range of integration over the Stark field is defined by P(ε). This feature allows for a physically motivated approach to determine convergence of the Lanczos iterations. That is, the integration over the Stark field provides a straightforward convergence criterion [13]. Choosing ε_o
near the peak of the integrand in Eq. (11) diminishes the impact of these errors and minimizes the number of Lanczos iterations. The present calculations start with k = 0.02N, with N the size of the R diagonal block. The convergence criterion increments the number of Lanczos iterations by Δk = 0.01N until the desired convergence is achieved, which is assumed when the relative difference in the integration over the Stark field for the last 3 sets of iterations is less than 10⁻³. Although it is possible to choose a smaller initial number of iterations with possible timesaving, care is then required since for a relatively small number of iterations the results can be highly inaccurate, with positive and negative contributions, and the adaptive integration routine may fail to converge. The number of iterations depends on the choice of reference field. In the calculations ε_o is chosen at the maximum value of the integrand from results of the same diagonal block of R at nearby frequencies. If no nearby results are available, the peak value location of P(ε) is used. For calculations assuming a dense A_o matrix, the routines ZSYTRF and ZSYTRS can be used to compute the C_o matrix and generate the Lanczos vectors. Here, the sparse matrix factorization and solver routines from the Intel® Math Kernel Library are used. The ZGTSV routine is called to solve the tridiagonal system. Since in either case the time is dominated by matrix factorization and matrix-vector multiplication, the robust modified Gram–Schmidt procedure is applied at each iteration (the equivalent of full orthogonalization for complex non-symmetric matrices). It is emphasized that even with orthogonalization the classical Lanczos algorithm can be numerically unstable. Although this problem can be efficiently remedied by robust approaches [20], numerical instability was not a problem in the present calculations and the more robust algorithm was not implemented.

6.2. Arnoldi process

The Arnoldi method is similar to the Lanczos algorithm in that it creates an orthonormal basis for a Krylov subspace to reduce a general square matrix to a Hessenberg matrix of smaller size. Performing k Arnoldi iterations yields
$$ g(\varepsilon) \simeq \left[b^T\cdot b\right]^{1/2} \ell^T Q_k\cdot z(\varepsilon) \tag{36} $$

with

$$ [I - (\varepsilon - \varepsilon_o)H_k]\, z(\varepsilon) = e_1 \tag{37} $$

where H_k is an upper Hessenberg matrix of size k and Q_k is a rectangular N × k matrix whose columns are the Arnoldi vectors. The approximation to g(ε) in Eq. (36) reproduces the first k − 1 coefficients in a Taylor series about the reference point ε = ε_o. The present implementation uses the numerically robust Householder algorithm to generate orthogonal Arnoldi vectors (see Appendix B). The convergence procedure follows that used for the Lanczos algorithm except the starting number of iterations is larger, k = 0.03N. It is expected that the Arnoldi method should require more iterations than the Lanczos method. On the other hand, the Arnoldi iterations with the Householder stabilization can be performed until k = N, reproducing the Hessenberg decomposition in Section 5.2. The factorization of the A_o matrix is the same as in the Lanczos method. The routines ZHEFA and ZHESL solve the Hessenberg system in Eq. (37).

7. Numerical results

The efficiency of the algorithms to compute the line profile is ascertained with several test cases with emphasis on relatively large matrices. The examples consist of Li-like satellites to the Ar He-β line, which was the subject of an earlier study [21]. In particular, the study considered the impact of the n = 4 satellites on the interpretation of experimental spectra. Due to computational constraints [11], their calculations neglected the interference terms [1,2]. It has been shown, however, that the interference terms can impact the profiles of satellite lines [22]. The interference terms increase the size of the matrices in the calculations, dramatically increasing the computational effort; thus, these cases serve as a good test of the algorithms.
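Before turning to the test cases, the model-reduction strategy of Section 6 can be sketched in a few lines. The fragment below uses a two-sided Krylov projection with full bi-orthogonalization as a stand-in for the Lanczos recurrence of Eqs. (34)-(35); the matrices are random surrogates and ε_o = 0 is assumed:

```python
import numpy as np

def reduced_g(C, ell, b, k, deltas):
    """Two-sided Krylov projection standing in for the Lanczos process:
    bi-orthogonal bases V, W with W^T V = I (full bi-orthogonalization,
    cf. the modified Gram-Schmidt step in the text), reduced matrix
    T = W^T C V, then Eqs. (34)-(35):
        g(eps) ~ (ell^T.b) e1^T [I - (eps - eps_o) T]^{-1} e1.
    Serial breakdown (w.v ~ 0) is not handled in this sketch."""
    n = len(b)
    V = np.zeros((n, k), dtype=complex)
    W = np.zeros((n, k), dtype=complex)
    scale = ell @ b
    V[:, 0] = b / np.linalg.norm(b)
    W[:, 0] = ell / (ell @ V[:, 0])
    for j in range(1, k):
        v = C @ V[:, j - 1]
        w = C.T @ W[:, j - 1]
        for i in range(j):                 # oblique projections
            v -= V[:, i] * (W[:, i] @ v)
            w -= W[:, i] * (V[:, i] @ w)
        v /= np.linalg.norm(v)
        V[:, j] = v
        W[:, j] = w / (w @ v)              # enforce W^T V = I
    T = W.T @ C @ V
    e1 = np.zeros(k); e1[0] = 1.0
    return np.array([scale * np.linalg.solve(np.eye(k) - d * T, e1)[0]
                     for d in deltas])

# Random surrogates for the line-space matrices (eps_o = 0 assumed).
rng = np.random.default_rng(2)
N = 150
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = A + A.T + 5j * N * np.eye(N)
B = rng.standard_normal((N, N)); B = B + B.T
r = rng.standard_normal(N) + 0j
ell = rng.standard_normal(N) + 0j

C = np.linalg.solve(A, B)                  # Eq. (18): A C = B
b = np.linalg.solve(A, r)
deltas = np.array([0.5, 1.0, 2.0])
exact = np.array([ell @ np.linalg.solve(A - d * B, r) for d in deltas])
approx = reduced_g(C, ell, b, 12, deltas)
assert np.allclose(approx, exact)
```

With these surrogates a reduced model of size k = 12 out of N = 150 already reproduces the direct solves, illustrating the k ≪ N regime behind the large speed ups reported below.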
7.1. Test cases

Two sets of lower and upper configurations are considered. The first, relatively smaller, problem considers the n = 3 satellites with lower configurations

1s² 3ℓ    (38)

and upper configurations

1s 3ℓ 3ℓ′    (39)

The larger problem includes the n = 3 and 4 satellites with lower configurations

1s² 3ℓ,  1s² 4ℓ    (40)

and upper configurations

1s 3ℓ 3ℓ′,  1s 3ℓ 4ℓ′,  1s 4ℓ 4ℓ′    (41)
The calculations neglect electron collisional excitations and Stark mixing between the lower and upper levels of the radiative transition, but include all dipole allowed interactions within the lower or upper level manifold. Calculations are performed with and without interference terms [1,2] to illustrate the impact of these terms on block diagonal matrix size. The test cases are defined in Table 1 with a brief description and N_R the total size of R.
Table 1
Case identification, description, and diagonal block information. "j × N" denotes j blocks of size N.

Case 1: n = 3 without interference terms, N_R = 5508
  Δm = 0:  N = 2 × 315, 2 × 144, 2 × 28;  N_o[%] = 9.64, 19.2, 66.6
  Δm = 1:  N = 315, 240, 189, 84, 48, 11;  N_o[%] = 9.64, 11.8, 15.7, 23.8, 54.8, 80.2
Case 2: n = 3 with interference terms, N_R = 5508
  Δm = 0:  N = 974;  N_o[%] = 12.0
  Δm = 1:  N = 887;  N_o[%] = 12.4
Case 3: n = 3, 4 without interference terms, N_R = 44100
  Δm = 0:  N = 2 × 2115, 2 × 1050, 2 × 240;  N_o[%] = 7.68, 15.0, 53.0
  Δm = 1:  N = 2115, 1750, 1269, 720, 350, 132;  N_o[%] = 7.68, 9.01, 12.7, 17.8, 44.5, 62.5
Case 4: n = 3, 4 with interference terms, N_R = 44100
  Δm = 0:  N = 6810;  N_o[%] = 9.74
  Δm = 1:  N = 6336;  N_o[%] = 10.1

Also included in Table 1 for each block in the calculation is the block size N, the sparsity N_o = nonzeros/N², and Δm (defined in Section 7.2).
7.2. Calculation details

The present calculations use a semi-classical model for the electron collision operator [9] and the low-frequency component of the microfield distribution [23,24]. The input atomic data is from the FAC atomic structure code [25]. In standard theory the resolvent operator conserves the difference in magnetic quantum numbers, Δm = m_a − m_b; that is,

$$ R_{ab,a'b'} \propto \delta_{m_a - m_b,\, m_{a'} - m_{b'}} \tag{42} $$
and R is block diagonal in Δm (R may contain smaller diagonal blocks within a given Δm block). Furthermore, by symmetry the blocks with Δm and −Δm yield identical contributions to the profile. It follows from Eq. (8) that the calculations need only explicitly consider Δm = 0 and 1. The calculations were done with a recently developed Fortran code, MELS (Multi-Electron Line Shape). The matrix algebra uses the LAPACK [15] and sparse solver library routines included in the Intel® Math Kernel Library. Although the MELS code has parallel computing capabilities, for these comparisons all calculations were performed on a single Intel® Xeon 2.4 GHz processor. In order to minimize memory requirements the diagonal blocks of R are treated independently and Fortran allocatable arrays are adjusted accordingly. Adaptive quadrature is used for the field integration and is discussed further below (see Section 7.4). Before proceeding with the algorithm comparisons, line shape calculations at temperature T = 500 eV and free electron density n_e = 10²⁴ cm⁻³ [21] are displayed in Fig. 1. The plasma composition for the microfield probability distribution was 1% Ar⁺¹⁵ and 99% H⁺ by number. The profiles in the figure are area normalized to unity. The figure shows the relative importance of the n = 4 satellites and interference terms in these calculations. The calculations of the line profile in Fig. 1 show that in order to determine the impact of nearby levels or physical processes on the line profile it may be necessary to perform large-scale calculations. Such tests are often impractical with the conventional method and efficient algorithms are welcome.
Fig. 1. Li-like n = 3 and n = 4 satellites to the Ar He-β line at T = 500 eV and n_e = 10²⁴ cm⁻³: n = 3 without interference terms (long-dash); n = 3 and 4 without interference terms (short-dash); and n = 3 and 4 with interference terms (solid).
7.3. Algorithm comparison

The speed up provided by an algorithm is defined by

$$ \text{speed up} \equiv t_c/t_a \tag{43} $$

where t_c is the computation time using the conventional (explicit matrix inversion) method and t_a is the computation time using the algorithm. The speed up for the various methods is weakly dependent on frequency. Furthermore, the number of frequency points in the calculations depends on the details of the profile and not on any particular calculation method. Hence, the number of frequency points is the same for all the methods. On the other hand, the number of field points in the integration does impact the speed up. Some methods are linearly dependent on the number of field points (matrix inversion and equation solvers) while others are only weakly dependent (matrix decomposition and model-reduction). It is this weak dependence on the number of field points that makes the latter set of methods attractive. Consequently, the speed up is given for the profile calculation per frequency point.

The results in Table 2 contain the speed up for several algorithms. Clearly, model-reduction provides the largest speed up, which increases with problem size. Although the direct sparse solver shows considerable speed up for the larger problems, the matrices are not sufficiently sparse in these examples (see Table 1) to compete with the model-reduction methods. The high efficiency of the model-reduction schemes in these examples is due to the relatively small number of iterations necessary for the field integration to converge (k ≲ 0.06N). This is possible since g(ε) only displays modest structure easily reproduced by a relatively small number of iterations. In other cases g(ε) may contain considerable structure requiring k = O(N). In such a circumstance, however, the number of field points required to perform an accurate field integration would also be significantly increased. Since the speed up for the model-reduction methods is O(N_ε), it would still be considerable. Interestingly, the speed ups of the eigendecomposition and Hessenberg methods display unexpected results.
Specifically, operation count indicates that the Hessenberg method is ~5 times faster than the eigendecomposition method. The discrepancy in running times is due to the Intel® Math Kernel Library routines. Taking the largest matrix in the calculations as an example, N = 6810 and Δm = 0 of Case 4, the Hessenberg and eigendecomposition routines respectively run 20% slower and 75% faster than expected according to the operation count. This illustrates that operation count alone may not be sufficient to determine the relative speed of algorithms. It is stressed that except for the model-reduction schemes, the computed line profiles reproduced the conventional method results to more than 5 digits. The model-reduction methods with the adopted convergence criterion (see Section 6) give better than a few parts in 10⁵ agreement. As a test, the Arnoldi process was implemented forcing k = N, which reproduced the Hessenberg method profile results and computation time.

Table 2
Speed up of several methods.

Method                 Case 1   Case 2   Case 3   Case 4
Direct Solver            4.2      6.5      4.5      6.8
Direct Sparse Solver     4.5      20       5.2      21.4
Eigendecomposition       10       14       18       19.9
Hessenberg               20       22       22       20.5
Lanczos                  89       268      262      427
Arnoldi                  128      400      349      644
Fig. 2. Relative error in line profile as a function of photon energy as a result of reducing N_ε by a factor of 3. Calculations are for Case 1 at conditions of Fig. 1.
7.4. Adaptive quadrature

Recall that the execution time for calculations using the conventional or simultaneous equation methods (see Sections 3 and 4) depends linearly on the number of field points. In the present calculations the relative error in the adaptive quadrature routine was set to produce smooth plots (i.e., relative errors < 0.5% in the line profile required N_ε = 100–125). It is then possible to decrease the calculation time by decreasing the number of field points. As an example, calculations for Case 1 were done with the conventional approach but reducing the number of field points to one third (i.e., N_ε → N_ε/3). The resulting errors relative to the more accurate calculation are displayed in Fig. 2. The figure shows that a decrease in computational time by a factor of 3 yields a systematic underestimate of the line wings of ~5% due to inaccuracies in the field integration. Since several algorithms provide larger savings without loss in precision, such compromises between accuracy and speed are not easily justified.

7.5. Iterative solver comparisons

The iterative solver algorithm is not as easily implemented as the other methods; hence, it is treated as a special case. The test is limited to the largest block in the test cases, N = 6810 and Δm = 0 of Case 4. Furthermore, the field-dependent line shape is computed at representative field values and at ℏω = 3663 eV, which is the strength-weighted center of all the lines in Case 1. Table 3 displays the field values and speed up relative to the conventional method. The field is given in units often used in line shape calculations,
$$ \beta_o = e/a_e^2 \tag{44} $$
Table 3
Speed up of iterative QMR method at selected values of the Stark field.

ε/β_o    Speed up
1        14.3
5        14.3
10       13.6
15       12.5
20       2.5
C.A. Iglesias, V. Sonnad / High Energy Density Physics 7 (2011) 391e399
with ae the electron-sphere radius defined by
4pne a3e =3
¼ 1
X
gð3 Þ ¼
3
2n
(A.2)
n¼0
(45)
The results show that the speed up degrades with increasing field, which is a consequence of the relative increase in the offdiagonal contributions to R1 ðu; 3 Þ and the matrix condition number. Eventually, the condition number increases sufficiently that it is necessary to improve the pre-conditioner to obtain convergence. This was the case for the largest field value in Table 3. A comparison with the results in Table 2 shows that the QMR iterative solver is significantly faster than the conventional method. As presently used, however, the matrix sparsity in the present calculations makes the QMR method not as efficient as the modelreduction approaches. Finally, iterative solver methods are not guaranteed to converge so that contingencies have to be implemented to account for such an event. Nevertheless, iterative solvers are an import class of methods and further experimentation with a variety of pre-conditioners may prove valuable.
8. Conclusion

The efficiency of various algorithms to compute Stark profiles in the quasi-static ion approximation was compared. It is emphasized that the methods in the comparison do not rely on physical approximations beyond those present in the standard theory. The conventional approach, which uses explicit matrix inversion, was the least efficient of the algorithms tested. The most efficient were the model-reduction methods, which executed more than 2 orders of magnitude faster than the conventional approach for large-scale calculations. Although there are numerically robust realizations of the Lanczos process, the Arnoldi process with Householder orthogonalization is more easily implemented and highly stable. Often, computation times can depend on the optimization of library routines, but the speed-up of the model-reduction methods is not strongly dependent on the efficiency of the libraries. Specifically, the main operation count in both the conventional and model-reduction methods using unoptimized routines is the factorization of a complex symmetric matrix, respectively requiring O(N_ε N³) and O(N³/3) operations to compute the line profile (here N is the matrix size and N_ε the number of quasi-static field points). Consequently, the speed-up is ~3N_ε, which is in reasonable agreement with the present results using optimized routines. It is expected that the highly efficient model-reduction methods should make future large-scale quasi-static ion line profile calculations practical without having to introduce uncontrolled approximations.

Acknowledgments

We thank Howard A. Scott for the atomic data. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Appendix A. Special structure of A, B, and C matrices

Using symmetry arguments it follows that selecting the z-axis for the radiator internal states to be along or against the field direction must yield identical results. These two choices yield

g(ε) = ℓ^T·{I − εC}^{−1} A^{−1} r = ℓ^T·{I + εC}^{−1} A^{−1} r = g(−ε)   (A.1)

Consequently, the Taylor series expansion has the form

g(ε) = Σ_n ε^{2n} ℓ^T·C^{2n} A^{−1} r   (A.2)

where the odd-power Taylor coefficients vanish. The symmetry of g(ε) requires that the C matrix and the vectors ℓ and A^{−1}r have special forms [26]. For the Stark-profile problem the condition on the Taylor coefficients is satisfied only if there exists a block structure such that a permutation transformation P, satisfying PP^T = P^T P = I, leads to

Pℓ = (ℓ_1, 0)^T   (A.3)

PA^{−1}r = (x_1, 0)^T   (A.4)

PCP^T = | 0    C_1 |
        | C_2   0  |   (A.5)

Then it follows from Eq. (A.5) that

PC^{2n}P^T = | U_1   0  |   and   PC^{2n+1}P^T = |  0   V_1 |
             |  0   U_2 |                        | V_2   0  |   (A.6)

which guarantees the vanishing of the odd Taylor coefficients in Eq. (A.2). Now assume A is diagonal. Since AC = B, it follows that B must have the same structure as C,

PBP^T = |  0   B_1 |
        | B_2   0  |   (A.7)

In turn, the structure of B is independent of approximations to A. So for the product AC to produce the required structure in B, the permutation must also lead to

PAP^T = | A_1   0  |
        |  0   A_2 |   (A.8)

which is consistent with the requirement in Eq. (A.4) (recall that the vectors ℓ and r are simply related by the occupation numbers). Even though the resolvent R is in block-diagonal form, the A matrix can be reduced further to at least 2 diagonal blocks. Although not implemented at present, the structure of these matrices can save a factor of 2 in storage and accelerate the factorization of A required in some algorithms. Finally, note that the present arguments do not apply to the model-reduction schemes using A_o and C_o in Eqs. (30) and (32).

Appendix B. Arnoldi process with Householder stabilization

The Arnoldi process is similar to the Lanczos algorithm in that it constructs an orthogonal basis for the Krylov space,

K_k(C_o, b) = span{ b, C_o b, C_o² b, …, C_o^{k−1} b }   (B.1)
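As a concrete sketch of the Krylov-space construction, the code below builds Q_k and H_k for a random complex matrix and verifies the orthonormality and Arnoldi-relation properties; for brevity it uses modified Gram-Schmidt orthogonalization rather than the Householder stabilization of Algorithm 1, and the matrix C and vector b are arbitrary stand-ins for the paper's C_o and b.

```python
import numpy as np

def arnoldi(C, b, k):
    """Build an orthonormal basis Q of the Krylov space K_k(C, b) and the
    (k+1) x k Hessenberg matrix H satisfying C @ Q[:, :k] = Q @ H."""
    n = len(b)
    Q = np.zeros((n, k + 1), dtype=complex)
    H = np.zeros((k + 1, k), dtype=complex)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        v = C @ Q[:, j]
        for i in range(j + 1):              # modified Gram-Schmidt step
            H[i, j] = np.vdot(Q[:, i], v)   # Hermitian inner product
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(1)
n, k = 50, 8
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal(n) + 0j
Q, H = arnoldi(C, b, k)
# checks analogous to the orthonormality and Arnoldi-relation properties
print(np.allclose(Q.conj().T @ Q, np.eye(k + 1)))  # True
print(np.allclose(C @ Q[:, :k], Q @ H))            # True
```

The Householder variant of Algorithm 1 produces the same subspace but remains orthogonal to machine precision even when Gram-Schmidt would degrade.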
For the N × N matrix C_o, performing k of the classical Arnoldi iterations returns a set of k vectors as the columns of the N × k rectangular matrix Q_k and the k × k Hessenberg (tridiagonal plus upper triangular) matrix H_k. These quantities have the properties [27]

b = (b^T·b)^{1/2} Q_k e_1   (B.2)

q_i^T·q_j = δ_ij   or   Q_k† Q_k = I   (B.3)

C_o Q_k = Q_k H_k + H_k(k+1, k) q_{k+1} e_k^T   (B.4)

where H_k(i, j) are the elements of H_k, q_j is the jth Arnoldi vector, and e_j is the jth unit vector. The connection with the complex function

g(ε) = ℓ^T·{I − (ε − ε_o)C_o}^{−1} b   (B.5)

is made as follows. It is readily shown, using Eqs. (B.2)-(B.4) together with the knowledge that H_k is a Hessenberg matrix, that for j < k,

C_o^j b = (b^T·b)^{1/2} C_o^j Q_k e_1 = (b^T·b)^{1/2} Q_k H_k^j e_1   (B.6)

Hence, the Taylor coefficients of g(ε) are given by

μ_j = ℓ^T·C_o^j b = (b^T·b)^{1/2} ℓ^T Q_k·H_k^j e_1   (B.7)

Now by analogy with Eq. (B.5), write

g(ε) ≈ g_k(ε) = (b^T·b)^{1/2} ℓ^T Q_k·{I_k − (ε − ε_o)H_k}^{−1} e_1   (B.8)

which reproduces the first k − 1 Taylor coefficients of g(ε). These results are slightly modified when using the numerically stable Householder algorithm to generate orthogonal Arnoldi vectors. The details are provided in the following sub-sections.

B.1 Arnoldi algorithm

The Arnoldi algorithm generates the matrix Q_k = [q_1 q_2 … q_k] with q_j the Arnoldi vectors and the Hessenberg matrix H_k = [h_1 h_2 … h_k] with h_j the columns. The orthogonalization is performed with the numerically robust Householder stabilization [28], leading to the following algorithm [27].

Algorithm 1.
1: Set x_1 = b
2: For j = 1, …, k Do:
3:   Compute the Householder unit vector w_j and P_j from x_j (see Section B.2)
4:   h_{j−1} = P_j x_j
5:   q_j = P_1 P_2 … P_j e_j
6:   If j ≤ k compute x_{j+1} = P_j P_{j−1} … P_1 C_o q_j
7: EndDo
8: h_k = x_{k+1}

Note that the matrices P_j need not be formed explicitly. To obtain h_{j−1} from x_j in line 4, zero out all the components from position j + 1 through n of the n-vector x_j and change its jth component, leaving all others unchanged. Saad [27] gives a detailed discussion of the Arnoldi process and its implementation.

B.2 Householder algorithm

The complex Householder algorithm [9] required by the Arnoldi process above is given below with minor modifications. Given a complex vector x of length n and an integer k such that 1 ≤ k ≤ n, form a scalar β and a vector w of length n so that the matrix

P = I − β w w^T   (B.2.1)

has the property that the ith element of Px vanishes for i = k + 1, …, n.

Algorithm 2.
1: Define x_max = max[ |x(i)|; i = k, …, n ]
2: If (x_max = 0.0) then
3:   β = 0.0
4:   w(1:n) = 0.0
5:   return
6: else
7:   w(k:n) = x(k:n)/x_max
8:   α = [ w(k)² + w(k+1)² + … + w(n)² ]^{1/2}
9:   β = 1/[ α(α + |w(k)|) ]   (B.2.3)
10:   If (|w(k)| = 0.0) then
11:     w(k) = α
12:   else
13:     w(k) = w(k){ α + |w(k)| }/|w(k)|
14:   EndIf
15: EndIf

The notation x(i:j) denotes the implied Fortran loop over i, i + 1, …, j.

B.3 Line shape expression

Using the Householder algorithm to stabilize the Arnoldi iterations introduces minor modifications to the approximant line shape. Specifically, Eq. (B.8) is modified to read

g_k(ε) = [b(1)/|b(1)|] (b^T·b)^{1/2} ℓ^T Q_k·{I_k − (ε − ε_o)H_k}^{−1} e_1   (B.2.4)

where b(1) is the first element of the vector b. Furthermore, requiring that

Q_k† b = [b(1)/|b(1)|] (b^T·b)^{1/2} e_1   (B.2.5)

offers a check on the implementation of Algorithm 1 above.

References

[1] M. Baranger, in: D. Bates (Ed.), Atomic and Molecular Processes, Academic Press, New York, 1962.
[2] H.R. Griem, Spectral Line Broadening by Plasmas, Academic Press, New York, 1974.
[3] H. Margenau, M. Lewis, Rev. Mod. Phys. 31 (1959) 569.
[4] J. Holtsmark, Ann. Phys. (Leipzig) 7 (1919) 578.
[5] L.A. Woltz, C.F. Hooper, Phys. Rev. A 38 (1988) 4766.
[6] R.C. Mancini, et al., Comput. Phys. Commun. 63 (1991) 314.
[7] P.A. Loboda, et al., Laser Particle Phys. 18 (2000) 275.
[8] G.H. Golub, C.F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1996.
[9] A. Calisti, et al., Phys. Rev. A 42 (1990) 5433.
[10] R.W. Lee, private communication.
[11] R. Mancini, private communication.
[12] C.A. Iglesias, V. Sonnad, HEDP 6 (2010) 399.
[13] V. Sonnad, C.A. Iglesias, HEDP 6 (2010) 391.
[14] R.D. Cowan, The Theory of Atomic Structure and Spectra, University of California Press, Berkeley, 1981.
[15] E. Anderson, et al., LAPACK Users' Guide, third ed., Society for Industrial and Applied Mathematics, 1999, ISBN 0898714478.
[16] R.W. Freund, N.M. Nachtigal, Numer. Math. 60 (1991) 315.
[17] R.W. Freund, N.M. Nachtigal, SIAM J. Sci. Comput. 15 (1994) 313.
[18] R.W. Freund, N.M. Nachtigal, Fortran distribution of QMR implementation, copyright 1991.
[19] P. Feldmann, R.W. Freund, IEEE Trans. Computer-Aided Des. Integrated Circuits Syst. 14 (1995) 639.
[20] R.W. Freund, M.H. Gutknecht, N.M. Nachtigal, SIAM J. Sci. Comput. 14 (1993) 137.
[21] I.E. Golovkin, R.C. Mancini, JQSRT 65 (2000) 273.
[22] C.A. Iglesias, HEDP 6 (2010) 318.
[23] C.A. Iglesias, et al., Phys. Rev. A 31 (1985) 1698.
[24] C.A. Iglesias, et al., JQSRT 65 (2000) 303.
[25] M.F. Gu, Can. J. Phys. 86 (2008) 675.
[26] R.W. Freund, private communication.
[27] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Co., 1996.
[28] H.F. Walker, SIAM J. Sci. Stat. Comput. 9 (1988) 152.