Advances in Engineering Software 41 (2010) 1277–1286
Contents lists available at ScienceDirect
Advances in Engineering Software journal homepage: www.elsevier.com/locate/advengsoft
Efficient solution for Galerkin-based polynomial chaos expansion systems H.M. Panayirci ⇑ Chair of Engineering Mechanics, University of Innsbruck, Technikerstr. 13, A-6020 Innsbruck, Austria
a r t i c l e
i n f o
Article history: Received 18 September 2009 Accepted 5 September 2010 Available online 14 October 2010 Keywords: Stochastic finite elements Polynomial chaos expansion Computational efficiency Iterative solvers Preconditioners Uncertainty quantification
a b s t r a c t Iterative solvers and preconditioners are widely used for handling the linear system of equations arising from stochastic finite element method (SFEM) formulations, e.g. galerkin-based polynomial chaos (G-P-C) Expansion method. Especially, Preconditioned Conjugate Gradient (PCG) solver and the Incomplete Cholesky (IC) preconditioner are shown to be adequate choices within this context. In this study, approaches for the automated adjustment of the input parameters for these tools are to be introduced. The proposed algorithms aim to enable the use of the PCG solver and IC preconditioner in a black-box fashion. As a result, the requirement of the expertise for using these tools is removed to a certain extend. Furthermore, these algorithms can be used also for the implementation purposes of SFEM’s within general purpose software by increasing the ease of the use of these tools and hence leading to an improved user-comfort. Ó 2010 Elsevier Ltd. All rights reserved.
1. Introduction The stochastic finite element method (SFEM) is an important tool for modeling and propagating uncertainties in computational mechanics. In essence, the SFEM extends the finite element method (FEM) to account for uncertainties associated with the parameters of the structure modeled with the FEM. A variety of methods accomplishing this task have been proposed, extended and reviewed over the past decades [24–26]. However, to account for the structural uncertainties – especially when dealing with large and complex FE models – brings a significant additional computational cost. Hence, the requirement of computational efficiency has been recognized as a critical aspect in context with the SFEM already for some time. Polynomial chaos (P-C) expansion [11] is a widely used method within SFEM and has been applied to various types of problems [7,12,17,18,30]. If applied with a Galerkin-based approach, the analysis requires the solution of the linear system of equations, which is substantially larger than the size of the deterministic problem. The computational cost grows exponentially with the dimensionality, i.e. number of random parameters considered, and the order of the expansion. Hence, to improve the computational efficiency is an ongoing topic of interest. While some studies are aiming to reduce the dimensionality of the problem [7,19,23] within this context, others deal with the solution of the resulting system of linear equations using preconditioners and iterative solvers [9,10,20,29,6,21]. Concerning the solution of linear system of equations, there exists a vast literature covering numerous types ⇑ Tel.: +43 512 507 6841; fax: +43 512 507 2905. E-mail address:
[email protected] 0965-9978/$ - see front matter Ó 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.advengsoft.2010.09.004
of preconditioners and iterative solvers (see e.g. [22,28] for extensive overviews), each with certain advantages for different type of problems. Considering the system of equations with symmetric positive definite (SPD) coefficient matrix, Preconditioned Conjugate Gradient (PCG) solver, which uses the Incomplete Cholesky (IC) factorization as preconditioner has been recognized as an appropriate choice [22,27,5]. On the other hand, it is well known that the use of the iterative solvers with preconditioners require expertise, especially considering that the input parameters affect the overall computational efficiency and accuracy significantly. For example, considering the IC preconditioner, the difficulty of selecting an adequate value for the drop tolerance for a given problem has been reported previously [14,15]. Similarly considering iterative solvers, the convergence tolerance is an important input parameter, which affects the accuracy of the solution substantially. Hence, the selection of appropriate values for these parameters is very important. It should be noted at this point that, depending on the selected IC and PCG implementations, the influence of their input parameters on the overall efficiency might show variations. In other words, by using more effective IC/PCG implementations, these tools can be more tolerant to variations of parameters, but the problem of automation in the choice of there parameters still holds. This paper will present approaches for the automated adjustment of these parameters. In particular, the proposed procedures aim to enable the use of the iterative PCG solver and drop tolerance based IC preconditioner in a black-box fashion. It should be noted that an improvement of the accuracy of the G-P-C method itself, is not within the scope of this paper. Furthermore, the proposed algorithms do not remove the basic limitations of the IC preconditioner or of the PCG solver, e.g. the IC factor of an SPD matrix is not
1278
H.M. Panayirci / Advances in Engineering Software 41 (2010) 1277–1286
guaranteed to exist or the PCG solver may not convergence to the desired solution. Here, the efforts focus on improving the user-comfort for these solution tools by removing the necessity of the identification of suitable drop tolerance and convergence tolerance values. The paper is structured as follows: First, a brief overview of the G-P-C method and the approach followed for the SFE analysis will be presented in Section 2. The computational aspects of the solution of the resulting system of equations is analysed subsequently in Section 3. In this section, the effect of the drop tolerance parameter (D) for the IC preconditioner and the convergence tolerance (r) for the PCG solver on the accuracy and the overall computational cost will be shown. In Section 4, novel approaches are introduced, which should serve to achieve efficiency and accuracy for the given deterministic and probabilistic model, without the agonizing input parameter selection process for the preconditioner or the iterative solver. The proposed approaches will be then tested in the numerical experiments part in Section 5. Finally the outcomes of the study will be summarized in the conclusions. 2. SFE analysis using Galerkin-based polynomial chaos expansion Considering linear static analysis within SFEM, the equilibrium equation is expressed as,
KðhÞuðhÞ ¼ f
ð1Þ
where K(h) denotes the stochastic stiffness matrix, u(h) is the stochastic displacement vector, f stands for the deterministic force vector and h represents the randomness (in the sequel h will be omitted to keep the notation concise). Here, the stiffness matrix can be expressed in the following form [11],
K¼
L X
Ki ni
ð2Þ
i¼0
In the above equation, K0 stands for the nominal (average) stiffness matrix, Ki are matrices with deterministic size n and ni represent uncorrelated standard normal random variables (RV’s) with zero mean and unit variance, except for i = 0, where n0 = 1 is deterministic. Using the P-C expansion in order to approximate the stochastic displacement vector u leads,
u¼
P X
uj wj ðnÞ
ð3Þ
j¼0
where uj are the n dimensional vector of deterministic coefficients to be determined and wj(n) are the Hermite polynomials, which form the basis of the approximation space. Substituting the expressions for K and u into Eq. (1) and adopting the Galerkin approach leads to the following system of linear equations, P X L X hni wj wk iKi uj ¼ hfwk i; j¼0
k ¼ 0; 1; . . . ; P
ð4Þ
i¼0
where h.i denotes the mathematical expectation operator and hniwjwki terms are deterministic quantities, which depend only on the dimension and order of the expansion. This above equation can be expressed in matrix form as,
2 6 6 6 6 6 6 4
Kð00Þ
:
...
: :
: ... : KðjkÞ
:
:
...
KðP0Þ
:
...
38 9 8 9 f0 > Kð0PÞ > > u0 > > > > > > > > > > > > 7> > > > : > > : > > > : 7< < = = 7 : 7 7 > uk > ¼ > f j > > > 7> > > : > > > > > : 5> > > : > > > > > : > ; > ; ðPPÞ : uP fP K
ð5Þ
where each K(jk) is defined as,
KðjkÞ ¼
L X hni wj wk iKi
ð6Þ
i¼0
It can be seen from Eq. (5) that the G-P-C formulation leads to an SFEM system of equations with size n(P + 1) n(P + 1), where P + 1 denotes the number of P-C terms for the expansion of u, which depends on the number of random parameters (L) and the order of the expansion (p). The relation between these terms is given as P + 1 = (L + p)!/(L!p!), reflecting the exponential growth of P with respect to p and L. At this point, it is important to mention some of the characteristics of the SFEM system resulting from Eq. (5) for clarity. First of all, it should be noted that since the force is assumed to be deterministic, hwkfi = 0 for k – 0. Furthermore, K0 appears only on the block-diagonal of the system, whereas Ki are to represent the random fluctuations (see appendix for numerical example). It should be noted that, assuming the variations of the random input parameters are small, the entries of K0 are much larger in magnitude compared to those of Ki, which leads to a block-diagonal dominance for K. This dominance makes the Block-Jacobi preconditioning an adequate choice for the considered SFE analysis [6]. In that case, as described in [20], a single incomplete factorization of K0 is sufficient in order to obtain the preconditioner for K, since the block-diagonal entries of K consists of the multiplication of K0 with the corresponding hniwjwki terms. For a better understanding of the sparsity structure of K, the SFEM system for a model with deterministic size 60 60 and containing five RV’s is shown in Fig. 1, where the growth of the size of the system with respect to the expansion order and the corresponding sparsity pattern can be observed. Since to construct the SFEM system of equations for large FE models becomes usually prohibitive due to memory limitations, a matrix-vector product approach introduced in [20] has been followed in this study, where the assembly of the n(P + 1) n(P + 1) size system is avoided and calculations are carried out over the deterministic size K(jk) matrices. In other words, the solution of the SFEM system is performed using the equation K(jk)uk = fj, which requires only the storage of the deterministic size Ki matrices and the corresponding hniwjwki coefficients according to Eq. (6) (see [20] for details).
3. Computational aspects of the solution of SFEM systems Before elaborating the computational aspects of the G-P-C analysis, some preliminary concepts about the iterative solution process will be introduced. Linear systems of equations are expressed generally in the form Ax = b, where A is the coefficient matrix, b is the known vector and x is the unknown vector to be solved. If the coefficient matrix A is SPD, then the iterative PCG solver is usually preferred for the solution. The convergence of the solver depends on the condition number of A, which is defined for SPD systems as the ratio of the maximum to the minimum eigenvalue of A, i.e. Cond(A) = kmax/kmin. In other words, the convergence depends strongly on the distribution of the eigenvalues of A, i.e. as the condition number increases, the convergence rate of PCG solver decreases. Preconditioners are used within this context in order to reduce the number of iterations required by the iterative solvers. The preconditioner P for A is a matrix such that P1A has a smaller condition number than A, i.e. the convergence of the PCG solver is accelerated. That is, instead of solving Ax = b, the system P1Ax = P1b is solved. IC factorization comes into the play at this point, namely is used to obtain the preconditioner.
H.M. Panayirci / Advances in Engineering Software 41 (2010) 1277–1286
1279
Fig. 1. Sparsity pattern of K.
Since the greatest portion of the CPU time required by the SFEM calculations is typically spent performing matrix factorizations or iterative solution of linear equation systems, the efficiency of the PCG solver and IC preconditioner play a significant role in the overall computational cost [5]. In this section, the effect of the input parameters for these, i.e. the drop tolerance and convergence tolerance values, on the CPU time will be shown. Some selected FE models (see Fig. 2) built with MSC.Patran [3] will be used for this purpose. These FE models have been chosen in order to have a variety of models with various sizes and element types. For each model, the structure has been completely fixed at certain locations and is statically loaded in order to obtain a meaningful solution. Then, the displacement at an arbitrary node is selected as the response and its second order statistics have been estimated using the G-P-C expansion formulation. It should be noted that the boundary conditions, the loading and the selected response are irrelevant considering the focus of this study. Moreover, for all the numerical examples considered,
only the uncertainties in the Young’s modulus are taken into account, which are modeled as random variables with Gaussian (truncated) distributions. The variation of the random parameters are reported via their corresponding Coefficient of Variation (CoV). Concerning the calculations presented in this study, MSC.Nastran [2] has been used as the FE solver, whereas the SFE analysis has been performed with COSSAN [1], which is a general purpose reliability software developed in MATLAB environment. The PCG solver (MATLAB) has been selected as the iterative solver, where the IC factorization has been performed using the CHOLINC [13] function (MATLAB). Furthermore, in order to reduce the bandwidth of the coefficient matrix, reordering procedures have been applied. It is known that by reducing the bandwidth of the coefficient matrix, the computational cost and storage requirement for the incomplete factorization can be improved significantly. Reverse Cuthill-McKee [4] for example, has been pointed out as an adequate reordering strategy within this context for Conjugate Gradient iterations preconditioned by means of incomplete
1280
H.M. Panayirci / Advances in Engineering Software 41 (2010) 1277–1286
Fig. 2. (a) Satellite Model (5268 DOF’s – 2D elements), (b) Blade Model (12,573 DOF’s – 3D elements), (c) Building Model (53,123 DOF’s – 1D, 2D and 3D elements).
1
10
factorization [8] and hence has been applied in the numerical examples as well.
D=1e−1 D=1e−2 D=1e−3 D=1e−4 D=1e−5 D=1e−6
0
10
3.1. Effect of drop tolerance
−1
In this section, the change in the overall computational efficiency with respect to the drop tolerance is investigated. For each FE model, the following drop tolerance values are used for the IC factorization: D = 101, 102, 103, 104, 105, 106. The associated CPU times needed for the factorization and for the PCG solver to reach a certain residual norm level are measured. These are reported in the corresponding plots (see Figs. 3–5) as t, where the first value refers to the IC factorization time and the second value refers to the CPU time required by the PCG solver. Furthermore, the memory allocation needed to store the resulting preconditioners are also reported as M in the plots. Following observations are made in these experiments: It is shown that the drop tolerance affects the convergence of PCG solver and the overall computational cost significantly.
Residual Norm
10
−3
10
−4
10
−5
10
−6
10
50
100
150
200
Fig. 4. Convergence of the PCG solver for the SFEM analysis of the Blade Model (L = 3, CoV = 0.10, p = 2).
1
1
10 D=1e−1 D=1e−2 D=1e−3 D=1e−4 D=1e−5 D=1e−6
0
10
−1
−2
10
−3
10
D=1e−1 D=1e−2 D=1e−3 D=1e−4 D=1e−5 D=1e−6
0
10
−1
10
Residual Norm
10
−2
10
−3
10
−4
−4
10
−5
10
10
−5
10
−6
−6
10
0
Iterations
10
Residual Norm
−2
10
0
50
100
150
200
Iterations Fig. 3. Convergence of the PCG solver for the SFEM analysis of the Satellite Model (L = 10, CoV = 0.10, p = 2).
10
0
50
100
150
200
Iterations Fig. 5. Convergence of the PCG solver for the SFEM analysis of the Building Model (L = 5, CoV = 0.10, p = 1).
1281
H.M. Panayirci / Advances in Engineering Software 41 (2010) 1277–1286
As the drop tolerance decreases, the quality and the memory allocation of the resulting preconditioner increases. A higher quality preconditioner leads to less iterations in the PCG solver for a given convergence tolerance. However, this does not necessarily result in less computational time, since both the time required for the factorization and the cost of each iteration increases. Thus, there is an optimum drop tolerance value among the considered ones, for which the convergence is achieved and the overall CPU time is the minimum for a given system. These results agree with the similar experiments presented in the literature (see e.g. [16]). These optimum drop tolerance values for the IC preconditioners, which provide convergence within the shortest amount of CPU times change with respect to the considered model. For example considering the satellite model, the IC Preconditioners obtained with D = 105 and D = 106 perform much better than the remaining ones, whereas D = 106 and D = 104 provide the best overall efficiency for the blade and building models, respectively. These observations lead to an obvious open issue: ‘‘How to select an adequate drop tolerance for a given problem?”. In the next chapter, an approach to overcome this difficulty will be introduced. 3.2. Effect of convergence tolerance Convergence tolerance, as well as the drop tolerance, plays also a significant role in the computational efficiency. A high tolerance value might lead to inaccurate response statistics, whereas a low tolerance value can lead to excessive CPU times. This fact leads to another open problem: ‘‘How to select the convergence tolerance value for the PCG solver, which will provide the accurate estimation (maximum being only as accurate as the G-P-C approximation allows) of the response statistics within the optimum CPU time ?”. To explain this problem more clearly, consider the building model for which the CPU times required for the convergence of the PCG are reported in Table 1. The following convergence criterion has been used for the analysis,
rP
kb Axk kbk
ð7Þ
where k k refers to the norm of the vectors. It can be observed that, selecting the convergence tolerance higher than r = 104 leads to gross errors in the CoV estimation of the response, whereas a value of r = 106 leads to excessive CPU time compared to r = 105, which provides convergence already with respect to the response statistics. In other words, the additional amount of time spent for the run with r = 106 turns out to be not necessary in terms of estimation of the CoV. The main difficulty here is that the optimum value of r depends on the given problem and cannot be predicted beforehand. Hence, an algorithm to adjust the tolerance value automatically is required in order obtain the accurate results in an optimum amount of CPU time. This issue will be addressed in the next chapter.
Table 1 Summary of the SFE analysis for the Building Model (L = 5, CoV = 0.20, p = 2). Convergence threshold
PCG solver time (s)
Mean value
CoV value
Building Model 101 102 103 104 105 106
41 63 236 631 1062 1237
0.002 0.002 0.002 0.002 0.002 0.002
0.085 0.114 0.367 0.232 0.228 0.228
4. Automated adjustment of the solution input parameters In this part, novel approaches in order to overcome the difficulties stated in the previous chapter, will be introduced. As mentioned above, assuming that G-P-C method is chosen for the SFE analysis and the iterative PCG solver with the IC preconditioner is used for the solution of the resulting system of equations, two main issues arise: 1. Determination of an adequate drop tolerance value for the IC preconditioner. 2. Determination of the convergence tolerance value for the PCG solver, which leads to the accurate estimation of the response statistics. 4.1. Nominal solution based selection criteria considering modified IC preconditioners The algorithm proposed in this section suggests selecting the IC preconditioner for a given problem, based on its ‘‘nominal solution performance”. That is, in order to determine an appropriate preconditioner, first the nominal system, i.e. K0u0 = f, is solved with the PCG solver using IC preconditioners obtained with various D values and the corresponding CPU times are recorded. Then, the preconditioner, which provides convergence within the minimum CPU time, is used for the solution of the SFEM system. The advantage introduced with this approach is that the CPU time required for the solution of the nominal system is significantly less than the one required for the large SFEM system. Hence, a measure about the performances of the candidate preconditioners can be obtained for a negligible computational cost. Furthermore, it is reasonable to expect a correlation between the efficiency of these two analysis, considering the fact that the solution process for the SFEM system consists of the solution of deterministic size matrices (see Eq. (6)), which vary only slightly from the K0, according to the approach followed in [20]. In order to validate the suggested idea, CPU times for the nominal solution of the test models are reported in Table 2. It can be observed that the variation of the measured computation times w.r.t D show the same behavior as seen in Figs. 3–5. In other words, the optimum D value, which provides the convergence for the nominal analysis within the shortest CPU time also provides the solution for the overall SFEM system with the best CPU time. Therefore, in this study it is proposed to select the IC preconditioner based on the performance in the nominal solution, before proceeding to the solution of the more expensive SFEM system. The disadvantage of the above presented idea is that it requires the IC factorizations of the nominal stiffness matrix with various D values for a considered range, to measure the performances. In order to avoid this drawback, the proposed approach is modified with the following algorithm: Step 1: According to a predetermined drop tolerance range, e.g. [106, 101], construct the preconditioners with the lowest, i.e. D = 106 (Pl), and highest, i.e. D = 101 (Ph), drop tolerance values. Step 2: Identify the indices of the different nonzero terms of Pl and Ph. Table 2 Total CPU times (s) for the nominal solution using PCG solver. Model
D = 103
D = 104
D = 105
D = 106
Satellite Blade Building
0.723 0.689 0.400
0.139 0.869 0.387
0.071 0.848 0.444
0.058 0.337 0.918
1282
H.M. Panayirci / Advances in Engineering Software 41 (2010) 1277–1286
Step 3: Store these additional nonzero terms in an array and sort them according to their absolute magnitude. Step 4: Construct three additional intermediate quality preconditioners by removing 25% (P25%), 50% (P50%) and 75% (P75%) of these different nonzero terms based on their absolute magnitude from Pl respectively. Step 5: Perform nominal solution (K0u0 = f) with each of the five candidate preconditioners (Pl, P25%, P50%, P75%, Ph) and record the corresponding CPU times Step 6: Select the preconditioner, which provides convergence within the shortest CPU time for the nominal solution, to be used in the solution of the large SFEM system. The motivation behind this algorithm, which is to be referred as Drop Tolerance Range based IC (DTR-IC) from now on, is to obtain the candidate preconditioners, among which an adequate one will be chosen according to its nominal solution performance, in a more efficient way. This has been shown in Table 3, where the CPU times required to prepare the candidate preconditioners with two different approaches are summarized. The first column refers to the CPU times, where five preconditioners have been obtained by performing the IC factorization with five different D values, i.e. D = 103, D = 104, D = 105, D = 106, D = 107. The second column reports the CPU times needed by the DTR-IC, which performs the IC factorization only for the limit values for the same drop tolerance range, i.e. D = [107, 103], and obtains the three intermediate quality preconditioners by removing some of the additional nonzero terms based on their absolute value. Table 3 demonstrates that the proposed approach provides the same number of candidate preconditioners in a shorter time. Most importantly, it will be also shown in the numerical examples section that these intermediate preconditioners obtained with the DTR-IC result in a better computational efficiency within the PCG solver – in addition to the reduction in the memory allocation requirements – compared with the user defined D values. It should be also noted that similar approaches to the one presented above exist in the literature, e.g. Jones and Plassmann [15] proposed a prescribed memory strategy for the IC preconditioner, where the amount of additional fill-in is determined according to the initial number of nonzero terms in each row and column in the coefficient matrix. In this approach, the fill-in values are selected according to their absolute magnitude. Hence, the proposed DTR-IC algorithm can be seen as a combination of the drop tolerance and memory-based strategies.
4.2. Determination of the convergence tolerance based on response statistics The second important parameter concerning the efficiency and accuracy within the solution process is the convergence tolerance value (r) selected for the norm residual in the PCG solver. Since the main goal of the SFE analysis is to approximate the stochastic response accurately, it is proposed to adjust this tolerance value according to the convergence of the response statistics. The suggested Convergence Tolerance algorithm based on Response Statistics (CTRS) can be summarized as follows:
Table 3 Total CPU times required to prepare five candidate IC Preconditioners. FE model
IC (s)
DTR-IC (s)
Satellite Blade Building
9 76 800
5 38 539
Step 1: Set the convergence threshold r = 101 and execute the PCG solver with the selected optimum preconditioner. Step 2: Obtain the P-C coefficients uj in Eq. (4). Step 3: Calculate the mean and coefficient of variation (CoV) values of the response accordingly. Step 4: Decrease the threshold by a factor j, e.g. j = 1/10, and execute the PCG solver again, this time using the P-C coefficients calculated in the previous call as the initial guess. Step 5: Obtain the P-C coefficients and calculate again the second order statistics of the response. Step 6: Compare the calculated statistics with the ones calculated with the previous tolerance value. Step 7: If the variation in the calculated response statistics is below a certain threshold stop the iterations, otherwise return back to step 4. Following this approach, the response statistics can be calculated accurately without spending any excessive CPU time on the solver part, which will be shown in the numerical experiments section. In Fig. 6, the proposed algorithms: (i) selection of the DTR-IC preconditioner based on the nominal solution performance, (ii) determination of the convergence tolerance value for the PCG solver based on the convergence of the response statistics (CTRS), are summarized and described using a flowchart. At this point, a few remarks are to be made about the suggested algorithms for clarity: DCoV and Dl shown in the flowchart are defined as,
jCoV iþ1 CoV i j 100 CoV iþ1 jl li j Dl ¼ iþ1 100
DCoV ¼
ð8Þ
liþ1
where li and CoVi are the mean value and coefficient of variation of the response calculated after the ith execution of the PCG solver and m is the convergence parameter (in terms of %) concerning the response statistics. The selection of the drop tolerance range for the DTR-IC does not require any expertise. As will be shown in the numerical experiments section, D = [106, 101] provide good performance in general and hence can be used as default. However, different ranges can be also considered, depending on the hardware configuration, i.e. available memory, or the size of the considered deterministic FE model. Similarly, the convergence tolerance for the response statistics m is a much more meaningful – hence easy to select – quantity than the convergence tolerance of the PCG solver, considering that it represents the change in the response statistics in percentage. The candidate preconditioners, which fail to maintain symmetric positive definiteness are eliminated in the selection process.
5. Numerical experiments In this section, the computational efficiency of the proposed CTRS and DTR-IC algorithms will be tested with the selected FE models. The purpose is to show that the presented algorithms allow to perform SFE analysis efficiently and accurately without selecting any input parameters, i.e. the drop tolerance for IC factorization and convergence tolerance for PCG solver. It is stressed that the selection of adequate values for these parameters is otherwise very difficult, since the optimum values of these parameters cannot be estimated beforehand and they vary according to the given
1283
H.M. Panayirci / Advances in Engineering Software 41 (2010) 1277–1286
Fig. 6. Flowchart for the suggested algorithms.
problem. The proposed methods, however, remove this drawback and provide a user-comfort for the use of these tools. To test the efficiency of the CTRS algorithm, the response statistics of an arbitrarily selected node have been considered for each
FE model using a second order expansion (p = 2) and using various convergence tolerance values. These have been then compared to the response statistics obtained with the CTRS algorithm using m = 5%. Note that for each FE model, the CoV of the input RV’s
Table 4 Summary of response statistics and corresponding solver CPU times. Model
r = 101
r = 102
r = 103
r = 104
r = 105
r = 106
CTRS
Satellite (L = 10)
Mean CoV Time (s)
3.38 0.190 26
3.38 0.186 34
3.38 0.187 49
3.38 0.187 65
– – –
– – –
3.38 0.186 36
Blade (L = 3)
Mean CoV Time (s)
0.03 0.231 14
0.03 0.233 26
0.03 0.233 32
– – –
– – –
– – –
0.03 0.233 28
Building (L = 5)
Mean CoV Time (s)
0.002 0.085 41
0.002 0.114 63
0.002 0.367 236
0.002 0.232 631
0.002 0.227 1062
0.002 0.227 1237
0.002 0.227 1082
Fig. 7. (a) Wing edge model (124,920 DOF’s – 2D elements), (b) FIAT car model (328,434 DOF’s – 1D and 2D elements – Courtesy of Centro Ricerche FIAT).
3255 (603.8 MB – P75%) 6209 (295 iter) 9464 4647 (3.92 GB) 962 (9 iter) 5609 2704 (2.34 GB) 13,563 (215 iter) 16,267 1294 (1.20 GB) No conv – Fails: Symmetric positive definiteness of the matrix is lost during the incomplete factorization. Ill cond: The IC preconditioner is ill-conditioned. No conv: No convergence could be achieved (for r = 101, N = 500).
519 (617 MB) No conv – Fails – – Precon time (s) Solver time (s) Total time (s) Car (L = 1, p = 2, CoV = 0.10)
Ill cond – –
922 (261.3 MB – P75%) 5649 (136 iter) 6571 1074 (1.51 GB) 3165 (16 iter) 4239 688 (1.01 GB) 10,961 (83 iter) 10,649 303 (483.9 MB) 35,714 (532 iter) 36,017 79 (176.6 MB) No conv – Fails – – Precon time (s) Solver time (s) Total time (s) Wing (L = 4, p = 2, CoV = 0.10)
13 (77.1 MB) No conv –
284 (63.9 MB – P75%) 156 (10 iter) 440 469 (437.5 MB) 268 (3 iter) 737 235 (245.3 MB) 464 (9 iter) 699 84 (121.0 MB) 655 (26 iter) 739 27 (55.2 MB) 845 (59 iter) 872 Precon time (s) Solver time (s) Total time (s) Building (L = 5, p = 2, CoV = 0.10)
Fails – –
Precon time (s) Solver time (s) Total time (s) Blade (L = 3, p = 4, CoV = 0.20)
5 (24.4 MB) 1547 (217 iter) 1552
35 (39.1 MB – P50%) 143 (11 iter) 178 27 (84.2 MB) 163 (6 iter) 190 18 (77.5 MB) 141 (6 iter) 159 14 (68.1 MB) 7393 (400 iter) 7407 7 (34.1 MB) 5801 (514 iter) 5808 3 (12.8 MB) 1128 (205 iter) 1131
DTR-IC
Fails – – Precon time (s) Solver time (s) Total time (s) Satellite (L = 10, p = 3, CoV = 0.10)
10 (4.1 MB) No conv –
3 (21.1 MB) 200 (3 iter) 203 3 (18.1 MB) 153 (3 iter) 156
D = 107 D = 106 D = 105 D = 104 D = 103 No fill Model
Considering the satellite and building models, the DTR-IC preconditioners obtained with the proposed algorithm performed better than all the other IC preconditioners. For the blade model, the IC preconditioner calculated with D = 106 and D = 107 show slightly better performance than the DTR-IC preconditioner. However, it could not be predicted beforehand that these IC preconditioners would provide the best efficiency. Choosing for example D = 104 or D = 105 for this model, leads to very high overall CPU times. Furthermore, the memory allocation of the DTR-IC preconditioner is significantly
Table 5 Summary of preconditioning and solver CPU times for the solution of the SFEM systems.
Next, the efficiency of the suggested preconditioning algorithm will be investigated. It should be pointed out that besides the FE models introduced previously, two additional FE models (see Fig. 7) have been used for the testing in this part to investigate the performance of the proposed DTR-IC preconditioner for large FE models (e.g. >100,000 DOF’s). In Table 5, the CPU times required for the SFE analysis are reported considering various stochastic FE models. The CPU times required with the proposed DTR-IC algorithm are reported in the last column, whereas the CPU times in the remaining columns refer to the cases, where the IC preconditioners are obtained via user defined D values. For the DTR-IC algorithm, the reported preconditioner times cover the whole computational part for the construction and selection of the optimum preconditioner, i.e. preparation of the five different candidate preconditioners, performing the solution for the nominal system and selecting the optimum preconditioner according to their efficiency. Here the range D = [106, 101] provided good efficiency for all the tested FE models and hence is selected as the range in all the experiments. Moreover, the corresponding memory allocations required to store the selected preconditioners are also reported. For the DTR-IC algorithm, the selected preconditioners are provided as well. For all the cases considered, the convergence tolerance for the PCG solver have been determined automatically with the CTRS algorithm. For a fair comparison, same settings, i.e. m = 5% and maximum number of iterations N = 500 (for each call to PCG at each tolerance level), have been used in all analysis. Based on the results of the performed experiments, following observations are made:
2 (13.6 MB) 163 (4 iter) 165
The convergence tolerance value, which leads to the accurate estimation of the response statistics without any excessive computational cost, changes with respect to the given model. For example, considering the satellite and blade models, r = 102 already provides accurate estimation of response statistics, whereas this tolerance value does not provide the solution with acceptable accuracy for the building model. Since the CTRS algorithm calculates the response statistics by gradually decreasing the tolerance value, it provides accurate estimations for all tested models. It can be seen that the CPU times of the CTRS algorithm are only slightly higher than the ones corresponding to tolerance values with same accuracy. In other words, the computational cost for the PCG solver using the CTRS algorithm is almost the same as for the PCG solver executed with the final tolerance value detected by the CTRS algorithm. This is mainly due to: (i) the calculated P-C coefficients in the previous step are provided as initial guess at the next call to PCG solver, (ii) the cost of calculating the response statistics and checking the convergence is negligible compared to the overall solver time.
1 (8.6 MB) 2258 (103 iter) 2259
are assumed as 0.20 and only a single preconditioner has been used. According to the results summarized in Table 4, it can be concluded that:
8 (9.3 MB – P50%) 141 (4 iter) 149
H.M. Panayirci / Advances in Engineering Software 41 (2010) 1277–1286
1 (3.3 MB) 2027 (152 iter) 2028
1284
H.M. Panayirci / Advances in Engineering Software 41 (2010) 1277–1286
less than the regular IC preconditioners with comparable efficiency. Considering the wing and car models, only the IC preconditioners obtained with D = 107 outperform the DTR-IC preconditioner. However, once again the memory allocation required for this IC preconditioner is much larger than the DTR-IC preconditioner, which might be prohibitive for moderate hardware configurations, e.g. the IC preconditioner obtained with D = 107 for the car model requires 3.92 GB memory allocation. Although the proposed algorithm requires more time for the preparation of the preconditioner, this time loss is reclaimed during the time within the iterative solver. This is due to the fact that, the proposed approach selects an adequate preconditioner before proceeding blindly to the solution of the overall SFEM system. In other words, the preconditioners leading to substantially higher CPU time in the PCG solver are eliminated during the selection process. This can be clearly observed, considering for example the building model. In this case, the preconditioner obtained with D = 103 requires almost double amount of CPU time in the PCG solver compared to the other preconditioners. A similar case can be also seen with the preconditioner obtained with D = 105 for the wing model. Another observation made in these experiments is that the DTR-IC preconditioners provided convergence for all the tested FE models, whereas this was not the case for all the IC preconditioners. This shows that the proposed algorithm not only removes the difficulty of selecting an adequate drop tolerance value, but also provides better preconditioners than the regular drop tolerance based IC preconditioners for certain cases. 6. Conclusions In this study, approaches to use the iterative PCG solver with the IC preconditioner in a black-box fashion are presented for the solution of linear SFEM systems arising from G-P-C formulations. It is shown that the input parameters required by the solver and preconditioner have significant effect on the efficiency and accuracy of the analysis. The main difficulty arises from the fact that the appropriate values for these parameters change with respect to the considered problem. This drawback affects the applicability of the SFE analysis using G-P-C, considering for example the implementation aspects in a general purpose software. Therefore the proposed algorithms aim to automatically adjust these input parameters, for any given deterministic and probabilistic model. As a result, the requirement of the expertise in order to utilize these tools should be reduced to a certain extend. The novel contributions of this study and the corresponding advantages and limitations of the proposed algorithms can be summarized as follows: Considering SFE analysis based on G-P-C method, an algorithm to determine adequate convergence tolerance values for the PCG solver is presented. The algorithm is based on the convergence of the response statistics. It is shown that the proposed approach provides accurate estimation of the response statistics with efficient computational cost. An algorithm, which chooses the IC preconditioner according to the nominal solution performance, is introduced. For this purpose, the algorithm uses drop tolerance based IC factorization in a slightly modified way, that is using a predefined range of drop tolerance instead of a single drop tolerance value. A new preconditioner, referred as DTR-IC, is suggested within this context. The presented algorithm using the DTR-IC preconditioner is tested using five different FE models with different characteristics, i.e. size, element type, probabilistic model, etc. It is shown that the proposed approach provided very promising results in
1285
these experiments. Most importantly, the proposed algorithm can be used in black-box fashion, since the efficiency of the suggested algorithm is not sensitive to its only input parameter, namely the drop tolerance range. It should be noted that the proposed approaches do not remove the already existing drawbacks or limitations of the selected preconditioner, iterative solver or the G-P-C method itself. Instead, these algorithms aim to improve the user-comfort of the use of these tools, which are utilized at the solution stage of the system of equations arising from G-P-C formulation. These approaches should especially be useful for the implementation purposes of this method in general purpose software, since they remove the requirement of expertise for the use of these tools. Acknowledgements This research was partially supported by the Austrian Research Council (FWF) under Project No. L269-N13, which is gratefully acknowledged by the author. The author would also like to thank Prof. G.I. Schuëller for his guidance throughout this study. Furthermore, the valuable discussions with Dr. H.J. Pradlwarter and Dr. M.F. Pellissetti were most valuable. Finally, the help of Mr. Vigé (Centro Ricerche FIAT) for providing the FE model of the FIAT car is deeply appreciated. References [1] Institute of Engineering Mechanics, University of Innsbruck, Innsbruck, Austria, EU. COSSAN (COmputational Stochastic Structural ANalysis) – User’s Manual; 1996–2009. [2] MSC.Nastran, Version 2008, MSC.Software Corporation, Santa Ana, CA – USA; 2008. [3] MSC.Patran, Version 2008, MSC.Software Corporation, Santa Ana, CA – USA; 2008. [4] Alan G, Liu J. Computer solution of large sparse positive definite systems. Prentice-Hall; 1981. [5] Charmpis D. Incomplete factorization preconditioners for the iterative solution of stochastic finite element equations. Comput Struct 2010;88(3–4):178–88. [6] Chung D, Gutiérrez M, Graham-Brady L, Lingen F. Efficient numerical strategies for spectral stochastic finite element models. Int J Numer Methods Eng 2005;64(10):1334–49. [7] Doostan A, Ghanem R, Red-Horse J. Stochastic model reduction for chaos representations. Comput Methods Appl Mech Eng 2007;196(37–40):3951–66. [8] Duff IS, Meurant GA. The effect of ordering on preconditioned conjugate gradients. BIT 1989;29(4):635–57. [9] Elman HC, Ernst OG, O’Leary DP, Stewart M. Efficient iterative algorithms for the stochastic finite element method with application to acoustic scattering. Comput Methods Appl Mech Eng 2005;194(9–11):1037–55. [10] Ghanem R, Kruger R. Numerical solution of spectral stochastic finite element systems. Comput Methods Appl Mech Eng 1996;129(3):289–303. [11] Ghanem R, Spanos P. Stochastic finite elements: a spectral approach. New York: Springer; 1991. [12] Ghanem RG, Doostan A. On the construction and analysis of stochastic models: characterization and propagation of the errors associated with limited data. J Comput Phys 2006;217(1):63–81. [13] Gilbert J, Moler C, Schreiber R. Sparse matrices in matlab: design and implementation. SIAM Matrix Anal Appl 1992;13(1):333–56. [14] jen Lin C, More JJ. Incomplete cholesky factorizations with limited memory. SIAM J Sci Comput 1999;21:24–45. [15] Jones MT, Plassmann PE. An improved incomplete cholesky factorization. ACM Trans Math Softw 1995;21(1):5–17. [16] Kilic SA, Saied F, Sameh A. Efficient iterative solvers for structural dynamics problems. Comput Struct 2004;82(28):2363–75. [17] Le Maitre OP, Knio OM, Najm HN, Ghanem RG. A stochastic projection method for fluid flow: I. Basic formulation. J Comput Phys 2001;173(2):481–511. [18] Le Maitre OP, Reagan MT, Najm HN, Ghanem RG, Knio OM. A stochastic projection method for fluid flow: II. Random process. J Comput Phys 2002;181(1):9–44. [19] Nair B, Keane A. Stochastic reduced basis methods. AIAA J 2002;40(8):1653–64. [20] Pellissetti M, Ghanem R. Iterative solution of systems of linear equations arising in the context of stochastic finite elements. Adv Eng Softw 2000;31(89):607–16. [21] Powell CE, Elman HC. Block-diagonal preconditioning for spectral stochastic finite-element systems. IMA J Numer Anal 2008. drn014. [22] Saad Y, van der Vorst HA. Iterative solution of linear systems in the 20th century. J Comput Appl Math 2000;123(1–2):1–33.
1286
H.M. Panayirci / Advances in Engineering Software 41 (2010) 1277–1286
[23] Sachdeva S, Nair P, Keane A. Hybridization of stochastic reduced basis methods with polynomial chaos expansions. Probabilist Eng Mech 2006;21(2):182–92. [24] Schuëller GI, editor. A state-of-the-art report on computational stochastic mechanics. Probabilist Eng Mech 1997;12(4):197–321. [25] Stefanou G. The stochastic finite element method: past, present and future. Comput Methods Appl Mech Eng 2009;198(9–12):1031–51. [26] Sudret B, Der Kiureghian A. Stochastic finite element methods and reliability a state-of-the-art report. Tech. rep., Department of Civil and Environmental Engineering University of California, Berkeley; 2000.
[27] Tuma M, Benzi M. A comparative study of sparse approximate inverse preconditioners. Appl Numer Math 1999;30:305–40. [28] van der Vorst HA. Efficient and reliable iterative methods for linear systems. J Comput Appl Math 2002;149(1):251–65. [29] Verhoosel CV, Gutiérrez MA, Hulshoff SJ. Iterative solution of the random eigenvalue problem with application to spectral stochastic finite element systems. Int J Numer Methods Eng 2006;68(4):401–24. [30] Xiu D, Karniadakis G. Modeling uncertainty in flow simulations via generalized polynomial chaos. J Comput Phys 2003;187(1):137–67.