Accepted Manuscript
An efficient stochastic approach for flow in porous media via sparse polynomial chaos expansion constructed by feature selection

Jin Meng, Heng Li

PII: S0309-1708(16)30625-X
DOI: 10.1016/j.advwatres.2017.04.019
Reference: ADWR 2834

To appear in: Advances in Water Resources

Received date: 7 November 2016
Revised date: 26 February 2017
Accepted date: 25 April 2017
Please cite this article as: Jin Meng , Heng Li , An efficient stochastic approach for flow in porous media via sparse polynomial chaos expansion constructed by feature selection, Advances in Water Resources (2017), doi: 10.1016/j.advwatres.2017.04.019
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights

- An efficient non-intrusive stochastic approach for flow in random porous media
- A small number of samples used to construct a sparse polynomial chaos expansion by feature selection
- The algorithm is self-adaptive by adopting inherited samples and cross-validation (CV) error analysis
An efficient stochastic approach for flow in porous media via sparse polynomial chaos expansion constructed by feature selection

Jin Meng and Heng Li
Department of Energy and Resources Engineering, College of Engineering, Peking University, Beijing 100871, China

Abstract
An efficient method for uncertainty quantification for flow in porous media is studied in this paper, where a response surface in the form of a sparse polynomial chaos expansion (PCE) is constructed with the aid of a feature selection method. The number of basis functions in the PCE grows exponentially as the random dimensionality increases, which makes the computational cost unaffordable in high-dimensional problems. In this study, a feature selection method is introduced to select the major stochastic features for the PCE by running a limited number of simulations, and the resultant PCE is termed a sparse PCE. Specifically, the least absolute shrinkage and selection operator modified least angle regression algorithm (LASSO-LAR) is applied for feature selection, and the selected features are assessed by cross-validation (CV). In addition, inherited samples are utilized to make the algorithm self-adaptive. We test the performance of the sparse PCE for uncertainty quantification for flow in heterogeneous media with different spatial variability. The statistical moments and probability density function of the output random field are accurately estimated with the sparse PCE, while the computational effort is greatly reduced compared to the Monte Carlo method.

Keywords: Uncertainty quantification; Karhunen-Loève expansion; sparse polynomial chaos expansion; feature selection; LASSO-LAR
1 Introduction

Uncertainty quantification (UQ) of subsurface flow in porous media has received much attention in recent years. Although geological formations are intrinsically deterministic, some formation properties are considered to be stochastic owing to the high degree of heterogeneity and incomplete knowledge of the formations. Therefore the prediction of subsurface flow often involves some degree of uncertainty. A stochastic description of formation parameters is necessary for UQ, which leads to stochastic partial differential equations (SPDE) governing flow and transport [43].

The Monte Carlo (MC) method is one of the most common approaches for solving SPDE numerically, where a large number of realizations are randomly generated and the statistical properties of the model responses are obtained by sampling. The accuracy of the MC method is highly dependent on the sample size one chooses [4]. A large number of samples is usually required by MC, especially for quantifying high-order statistical moments or the full probability density function (PDF). The computational cost may be alleviated with the rise of parallel computing and high-performance computers. Besides, other stochastic approaches such as the moment equation method or perturbation approach [43, 44] can be used for quantifying uncertainty, but these methods also suffer from high computational cost when the problem is complicated.

An efficient alternative, the polynomial chaos expansion (PCE), has found extensive use in
uncertainty quantification in the last two decades [15, 37, 39]. In the PCE method, the random output is represented by a series of orthogonal polynomial basis functions in terms of the independent input random variables [16]. The orthogonal polynomials are chosen with respect to the distributions of the input random variables: the Hermite PCE is specified for Gaussian random variables, and the generalized PCE has been developed for input random variables with other distributions [39]. If the random inputs are correlated or characterized by a random field, parameterization methods such as the Karhunen-Loève (K-L) expansion can be used to transform them into independent random variables [26, 27]. Once the PCE coefficients are obtained, the statistical properties of the random output can be easily evaluated. The PCE can be used for UQ for arbitrary input distributions [25, 31], and has been applied for global sensitivity analysis [9,
30]. Considerable work has been accomplished in this field. One of the best-known approaches, the stochastic finite element method, was pioneered by Ghanem and Spanos [16] and has been applied in many areas [15, 17, 23]. The method is termed intrusive because it requires manipulation of the governing equations: discretized equations for the PCE coefficients are derived with the Galerkin scheme, and a set of coupled deterministic equations has to be solved. Arnst et al. further improved the method with the reduced chaos expansion [1]. Besides, Bayesian methods can also be applied to identify the unknown coefficients [2].

In contrast, non-intrusive methods do not require manipulation of the governing equations. They allow using existing deterministic simulators, which can be treated as a black box. Such methods can be roughly categorized into two types [41]. One is based on interpolation, where the simulation samples are interpolated by the approximation exactly, e.g., the stochastic collocation method (SCM) [3] and the probabilistic collocation method (PCM) [33]. The former uses Lagrange interpolation points as collocation points, while the latter uses the points of numerical evaluation of the Galerkin integral [7, 36]. The other type of non-intrusive method is based on regression, where the simulation samples are approximately matched. A larger number of collocation
points than the number of basis functions is required to guarantee that the matrix is well-conditioned for calculation [5, 6]. These approaches are found to be accurate and computationally efficient for problems with relatively low random dimensionality [7, 26, 28].

However, the number of collocation points is closely related to the random dimensionality and the degree of the polynomials [38]. Since each collocation point represents a full-scale deterministic simulation, the computational cost increases rapidly when the random dimensionality is high. This so-called "curse of dimensionality" has drawn more and more attention in recent years. Some dimension reduction methods have been proposed to overcome the problem. The idea of compressed sensing has been introduced for UQ, where the sparsity of the random response functions is recovered via the L1-norm [11, 20, 41]. In addition, some researchers have constructed sparse PCE response surfaces by regression techniques. Blatman and Sudret utilized stepwise regression [5] and least angle regression (LAR) [6] for stochastic finite element problems. Elsheikh et al. applied LAR for Bayesian inference of the permeability field in the subsurface [13]. In fact, both stepwise regression and LAR are common algorithms in the field of feature selection [14, 19]. Feature selection was first proposed in the area of statistical learning for removing redundant and irrelevant features. A mature framework for feature selection has been established, with extensive algorithms and successful applications published in the last two decades [12, 14, 21, 34].

In this study, the sparse PCE constructed by feature selection is introduced for UQ for flow in random porous media. The input random field of hydraulic conductivity is parameterized by the K-L expansion and the output random field of hydraulic head is represented by a PCE. Our objective is to accurately and efficiently quantify uncertainty for the hydraulic head with the
sparse PCE. A non-intrusive stochastic approach is adopted and the PCE coefficients are derived by regression. In our problem, high random dimensionality may be associated with the K-L expansion [44], so the total number of basis functions (P) will be much larger than the number of simulations (N) one can afford, i.e., N << P. By constructing a sparse PCE with the help of feature selection, the problem is converted to the ordinary N > P type. In this way, higher accuracy and efficiency can be reached with a limited number of simulations. Specifically, the least absolute shrinkage and selection operator (LASSO) modified least angle regression algorithm (termed LASSO-LAR) [34] is utilized to select the major stochastic features; it is less greedy than stepwise regression and is highly efficient for high-dimensional problems. In our numerical examples, the proposed method is compared with MC for both accuracy and efficiency.

The remainder of this paper is organized as follows. In Section 2, the governing equations of flow and stochastic representations of the random fields are discussed. In Section 3, we introduce the feature selection method, specifically the LASSO-LAR algorithm, and its validity assessments; its implementation with inherited samples is also outlined. Following an illustrative analytical example, numerical examples for both steady-state and transient flow are given in Section 4. The conclusions are given in Section 5.
2 Stochastic Representations

2.1 Governing equations

We consider transient flow in saturated porous media satisfying the following continuity equation and Darcy's law:

S_s \frac{\partial h(\mathbf{x},t)}{\partial t} + \nabla \cdot \mathbf{q}(\mathbf{x},t) = g(\mathbf{x},t),    (1)

\mathbf{q}(\mathbf{x},t) = -K_s(\mathbf{x}) \nabla h(\mathbf{x},t),    (2)

subject to initial and boundary conditions:

h(\mathbf{x},0) = H_0(\mathbf{x}),    \mathbf{x} \in D,    (3)

h(\mathbf{x},t) = H(\mathbf{x},t),    \mathbf{x} \in \Gamma_D,    (4)

\mathbf{q}(\mathbf{x},t) \cdot \mathbf{n}(\mathbf{x}) = -Q(\mathbf{x},t),    \mathbf{x} \in \Gamma_N,    (5)

where h(\mathbf{x},t) is the hydraulic head, S_s is the specific storage, \mathbf{q}(\mathbf{x},t) is the specific discharge, g(\mathbf{x},t) is the source (or sink) term, and K_s(\mathbf{x}) is the hydraulic conductivity. H_0(\mathbf{x}) is the initial head in the domain D \subset \mathbb{R}^d, H(\mathbf{x},t) is the prescribed head on the Dirichlet boundary segments \Gamma_D, Q(\mathbf{x},t) is the prescribed flux across the Neumann boundary segments \Gamma_N, and \mathbf{n}(\mathbf{x}) = (n_1, \ldots, n_d)^T is an outward unit vector normal to the boundary \Gamma_D \cup \Gamma_N.

In this study, K_s(\mathbf{x}) is assumed to be a log-normal random field; thus Eqs. (1)-(5) become stochastic partial differential equations, whose solutions are no longer deterministic values but probability distributions or related statistical moments. We aim to efficiently and accurately estimate the statistical properties of the hydraulic head in terms of the statistical moments of the log-transformed K_s(\mathbf{x}).
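For concreteness, the deterministic simulator invoked at each parameter sample can be sketched in a few lines. The snippet below is our own illustration, not the authors' code: it solves the steady-state limit of Eqs. (1)-(2) (no storage term, no source) on a uniform 1-D grid with Dirichlet heads at both ends, using harmonic-mean interface conductivities; the function name and grid setup are assumptions.

```python
import numpy as np

def solve_steady_1d(K, h_left, h_right):
    """Steady-state limit of Eqs. (1)-(2) in 1-D: d/dx( K(x) dh/dx ) = 0.
    K holds one conductivity value per cell; heads are fixed at both ends."""
    n = len(K)
    # harmonic-mean conductivities at the interfaces between adjacent cells
    Kf = 2.0 * K[:-1] * K[1:] / (K[:-1] + K[1:])
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(1, n - 1):          # interior flux balance
        A[i, i - 1] = Kf[i - 1]
        A[i, i] = -(Kf[i - 1] + Kf[i])
        A[i, i + 1] = Kf[i]
    A[0, 0] = A[-1, -1] = 1.0          # Dirichlet conditions at the two ends
    b[0], b[-1] = h_left, h_right
    return np.linalg.solve(A, b)
```

For a homogeneous conductivity the computed head profile is linear between the two boundary values, which is a convenient sanity check.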
2.2 Representation of hydraulic conductivity random field

Let Y(\mathbf{x},\omega) = \ln[K_s(\mathbf{x},\omega)] be a random field, where \mathbf{x} \in D and \omega \in \Omega (\Omega is the event space of a complete probability space (\Omega, \mathcal{F}, \mathcal{P}), \mathcal{F} is a \sigma-algebra of \Omega, and \mathcal{P} is a probability measure). One may write Y(\mathbf{x},\omega) = \langle Y(\mathbf{x}) \rangle + Y'(\mathbf{x},\omega), where \langle Y(\mathbf{x}) \rangle is the mean and Y'(\mathbf{x},\omega) is the fluctuation. The spatial structure of the random field is described by the covariance C_Y(\mathbf{x},\mathbf{x}') = \langle Y'(\mathbf{x},\omega) Y'(\mathbf{x}',\omega) \rangle. Since the covariance is bounded, symmetric and positive-definite, it may be decomposed as [16]:

C_Y(\mathbf{x},\mathbf{x}') = \sum_{i=1}^{\infty} \lambda_i f_i(\mathbf{x}) f_i(\mathbf{x}'),    (6)

where \lambda_i and f_i(\mathbf{x}) are eigenvalues and eigenfunctions, respectively. They can be solved from the following Fredholm equation of the second kind:

\int_D C_Y(\mathbf{x},\mathbf{x}') f(\mathbf{x}') \, d\mathbf{x}' = \lambda f(\mathbf{x}).    (7)

Then the random field Y(\mathbf{x},\omega) is expressed by the Karhunen-Loève (K-L) expansion as:

Y(\mathbf{x},\omega) = \langle Y(\mathbf{x}) \rangle + \sum_{i=1}^{\infty} \sqrt{\lambda_i} f_i(\mathbf{x}) \xi_i(\omega).    (8)

If Y(\mathbf{x},\omega) is a Gaussian random field, the \xi_i(\omega) are orthogonal Gaussian random variables with zero mean and unit variance, i.e., \langle \xi_i(\omega) \rangle = 0 and \langle \xi_i(\omega) \xi_j(\omega) \rangle = \delta_{ij}.

From Eq. 6 or 8, we have \sigma_Y^2 = \sum_{i=1}^{\infty} \lambda_i f_i^2(\mathbf{x}), and integrating over the domain yields \|D\| \sigma_Y^2 = \sum_{i=1}^{\infty} \lambda_i, where \|D\| is a measure of the physical domain size (length, area or volume for 1D, 2D or 3D domains, respectively). In practical applications, Eq. 8 has to be truncated to a finite number of terms for calculation. The number of retained terms is determined by the convergence rate of the eigenvalue summation. Typically, n terms are retained if \sum_{i=1}^{n} \lambda_i is relatively close to the total variance \|D\| \sigma_Y^2. As a result, the random field is governed by an n-dimensional random vector \boldsymbol{\xi} = (\xi_1(\omega), \ldots, \xi_n(\omega))^T. Note that the random dimensionality of the problem is equivalent to the dimensionality of the random vector. A slow convergence rate is often observed in a multi-dimensional physical domain; as a result, the uncertainty analysis has to be carried out in a high-dimensional random space, which usually leads to large computational cost [44].
2.3 Representation of hydraulic head random field

Since the input field K_s(\mathbf{x},\omega) is random and characterized by \boldsymbol{\xi}(\omega) = (\xi_1(\omega), \ldots, \xi_n(\omega))^T, the output hydraulic head field h(\mathbf{x},t,\omega) = M(\mathbf{x},t,\boldsymbol{\xi}(\omega)) is also random, where M stands for the physical model governing the flow. The analysis made hereafter is for fixed time and space, so the physical coordinates (\mathbf{x},t) are omitted for simplicity. The random structure of the output field cannot be represented by the K-L expansion, as its covariance is yet to be found. As an alternative, the polynomial chaos expansion (PCE) technique is applied here to construct a response surface approximating the output. The output hydraulic head field is decomposed into a summation of a series of polynomials in terms of the independent random variables [40]:

h(\boldsymbol{\xi}) = \sum_{\boldsymbol{\alpha} \in \mathbb{N}^n} a_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}(\omega)),    (9)

where a_{\boldsymbol{\alpha}} are unknown PCE coefficients and \Psi_{\boldsymbol{\alpha}} are multivariate polynomial basis functions, which are orthonormal with respect to the joint probability density function (PDF) p(\boldsymbol{\xi}):

\langle \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}), \Psi_{\boldsymbol{\beta}}(\boldsymbol{\xi}) \rangle = \int \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) \Psi_{\boldsymbol{\beta}}(\boldsymbol{\xi}) \, p(\boldsymbol{\xi}) \, d\boldsymbol{\xi}.    (10)

The full PDF is required to obtain the basis functions. In some practical applications, only low-order statistical moments or discrete PDFs are available. With such limited information, one has to make reasonable assumptions about the full distribution of the random variables.

To construct the multivariate basis functions \Psi_{\boldsymbol{\alpha}}, one may start from the univariate basis functions with respect to the marginal PDF p_i(\xi_i). Consider a family of orthonormal polynomials \{\psi_j^{(i)}, j \in \mathbb{N}\} with respect to p_i(\xi_i) in one dimension:

\langle \psi_j^{(i)}(\xi_i), \psi_k^{(i)}(\xi_i) \rangle = \delta_{jk},    (11)

where j, k \in \mathbb{N} stand for the degree of the polynomials. Basis functions in one-dimensional space are much more convenient to derive, and specific optimal polynomial families have been found for some common distributions, e.g., Hermite polynomials for Gaussian random variables, Legendre polynomials for uniform random variables, etc. For arbitrary input distributions, the PCE can be constructed numerically according to the specific distributions [25, 31, 39].

In addition, we have p(\boldsymbol{\xi}) = \prod_{i=1}^{n} p_i(\xi_i), since the elements in \boldsymbol{\xi} are independent of each other. According to Eq. 10, the multivariate basis functions can be constructed by tensor products of the univariate basis functions:

\Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) = \prod_{i=1}^{n} \psi_{\alpha_i}^{(i)}(\xi_i),    (12)

where \boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n) \in \mathbb{N}^n is a multi-index with norm |\boldsymbol{\alpha}| = \sum_{i=1}^{n} \alpha_i, whose elements \alpha_1, \ldots, \alpha_n indicate the degree of the polynomial in each dimension.

Typically, the basis functions are selected from truncated polynomial spaces, so that a finite number of terms is retained. In this paper, a total-degree polynomial space is considered, where the summation of the n-dimensional polynomial degrees is no more than an integer d. The truncated PCE up to degree d is given as:

h_d(\boldsymbol{\xi}) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} a_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}),    (13)

where \mathcal{A} is a multi-index set defined by \mathcal{A} = \{\boldsymbol{\alpha} \in \mathbb{N}^n : |\boldsymbol{\alpha}| \le d\}. The total number of basis functions P is determined by the random dimensionality n and the polynomial degree d:

P = \binom{n+d}{d} = \frac{(n+d)!}{n! \, d!}.    (14)

Once the PCE coefficients are evaluated, in consideration of the orthonormality of the basis functions, statistical moments of the output field such as the mean and variance can be derived directly from Eq. 13:

\mu_h = a_{\boldsymbol{\alpha}=0}, \qquad \sigma_h^2 = \sum_{\boldsymbol{\alpha} \in \mathcal{A} \setminus \{0\}} a_{\boldsymbol{\alpha}}^2.    (15)
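The size of the total-degree multi-index set in Eq. 14 and the moment formulas in Eq. 15 are straightforward to verify numerically. The sketch below (the coefficient values are our own illustrative assumptions) enumerates the set A by brute force; note that the enumeration scans (d+1)^n tuples and is therefore only suitable for small n.

```python
from itertools import product
from math import comb

def total_degree_multi_indices(n, d):
    """All multi-indices alpha in N^n with |alpha| <= d (the set A of Eq. 13)."""
    return [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]

# Eq. 14: the number of basis functions P = (n + d)! / (n! d!)
n, d = 4, 3
A = total_degree_multi_indices(n, d)
assert len(A) == comb(n + d, d)            # 35 for n = 4, d = 3

# Eq. 15: with orthonormal bases, the mean is the zero-index coefficient
# and the variance is the sum of squares of the remaining coefficients
coeffs = {a: 0.0 for a in A}
coeffs[(0, 0, 0, 0)] = 2.0                 # illustrative values
coeffs[(1, 0, 0, 0)] = 0.5
mean = coeffs[(0,) * n]
variance = sum(c ** 2 for a, c in coeffs.items() if a != (0,) * n)
```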
Alternatively, Monte Carlo simulation can be performed on the PCE (Eq. 13) as a post-processing step to obtain the statistical moments and the PDF, which is computationally efficient for the polynomial surrogate model.
2.4 Evaluation of PCE coefficients

To evaluate the PCE coefficients, one first samples a set of collocation points \{\boldsymbol{\xi}^{(1)}, \ldots, \boldsymbol{\xi}^{(N)}\}, where N is the number of samples. The sampling method will be discussed in Section 3.4. At each collocation point \boldsymbol{\xi}^{(i)} = (\xi_1^{(i)}, \ldots, \xi_n^{(i)})^T, a deterministic flow problem is solved; as a result, N realizations of the hydraulic head \{h(\boldsymbol{\xi}^{(1)}), \ldots, h(\boldsymbol{\xi}^{(N)})\} are generated. On the other hand, the output field is approximated by the PCE as shown in Eq. 13; thus an equation set for the PCE coefficients is obtained:

\sum_{j=1}^{P} a_j \Psi_j(\boldsymbol{\xi}^{(i)}) = h(\boldsymbol{\xi}^{(i)}),    i = 1, \ldots, N.    (16)

Note that every subscript j in Eq. 16 corresponds to a specific multi-index \boldsymbol{\alpha} in Eq. 13. We rewrite the equation set in matrix form:

\boldsymbol{\Psi} \mathbf{a} = \mathbf{h},    (17)

where \mathbf{a} = (a_1, \ldots, a_P)^T is the vector of coefficients, \mathbf{h} = (h(\boldsymbol{\xi}^{(1)}), \ldots, h(\boldsymbol{\xi}^{(N)}))^T is the vector of model realizations, and \boldsymbol{\Psi} \in \mathbb{R}^{N \times P} is a Vandermonde-like matrix of basis functions whose elements are \Psi_{ij} = \Psi_j(\boldsymbol{\xi}^{(i)}), i = 1, \ldots, N, j = 1, \ldots, P.

We consider solving the equation set with a regression method, where the simulation samples are approximately matched on unstructured points. The PCE coefficients estimated by ordinary least-squares (OLS) regression, which minimizes the residual sum of squares, are given by:
\mathbf{a}_{OLS} = \arg\min_{\mathbf{a}} (\mathbf{h} - \boldsymbol{\Psi}\mathbf{a})^T (\mathbf{h} - \boldsymbol{\Psi}\mathbf{a}) = (\boldsymbol{\Psi}^T \boldsymbol{\Psi})^{-1} \boldsymbol{\Psi}^T \mathbf{h}.    (18)

A well-conditioned matrix \boldsymbol{\Psi}^T \boldsymbol{\Psi} is required; otherwise numerical oscillations may occur in the calculation and the estimates may be unreliable. To ensure numerical stability, the number of realizations N should be greater than the number of basis functions P, and N = kP with k \in [2, 3] is usually recommended [6]. As indicated in Eq. 14, the number of basis functions is extremely large when the random dimensionality or the polynomial degree is high, while in practical applications only a limited number of realizations is affordable, i.e., \boldsymbol{\Psi} is a matrix of size N \ll P, which leads to a strongly ill-conditioned equation set and causes great trouble for coefficient evaluation. Therefore we aim to reduce the number of basis functions and construct a sparse PCE response surface for the UQ problem. The feature selection method is utilized to tackle this problem, as detailed in the next section.
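As a minimal one-dimensional illustration of the regression step (Eq. 18), the sketch below builds a design matrix of orthonormal probabilists' Hermite polynomials, generates noise-free "simulations" from known coefficients, and recovers them with a least-squares solve; the sample size and coefficient values are our own assumptions.

```python
import numpy as np
from math import factorial

def hermite_design(xi, degree):
    """Vandermonde-like matrix of orthonormal probabilists' Hermite
    polynomials He_k evaluated at the samples xi (one column per degree);
    the recurrence is He_k = xi*He_{k-1} - (k-1)*He_{k-2}, <He_k^2> = k!."""
    He = [np.ones_like(xi), xi]
    for k in range(2, degree + 1):
        He.append(xi * He[k - 1] - (k - 1) * He[k - 2])
    return np.column_stack([He[k] / np.sqrt(factorial(k))
                            for k in range(degree + 1)])

rng = np.random.default_rng(0)
xi = rng.standard_normal(50)                       # N = 50 samples
Psi = hermite_design(xi, 3)                        # P = 4 basis functions, N > P
a_true = np.array([2.0, 3.0, 0.5, 0.0])
h = Psi @ a_true                                   # noise-free "simulations"
a_ols, *_ = np.linalg.lstsq(Psi, h, rcond=None)    # Eq. 18
```

With N = 50 and P = 4 the matrix is well-conditioned and the coefficients are recovered essentially exactly; the ill-conditioned N << P regime discussed above is what motivates the feature selection of Section 3.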
3 Feature Selection for basis reduction

3.1 Introduction to Feature Selection

Feature selection, also known as variable selection or attribute selection, is a technique to select subsets of relevant features for model construction. It was originally proposed in the statistical learning area and has been applied to many fields in recent years, e.g., bioinformatics, text categorization, signal processing, etc. [19, 32, 42]. In these applications, selection methods are adopted to improve prediction performance and to provide a better understanding of datasets with hundreds or thousands of redundant variables and comparably few training examples.

Fig. 1. Feature selection process with validation [10].

A general process of feature selection is shown in Fig. 1. A candidate feature subset is generated first. Then the model constructed from the subset is examined by an evaluation function to assess its goodness for prediction. If the requirement is not met, new subsets are generated and the previous steps are repeated. Once a stopping criterion is reached, the subset is qualified, and its validity for general cases is appraised at last.

The generation algorithms may be categorized into three types (exhaustive, heuristic and random search), and the evaluation algorithms into filter, wrapper and embedded types. By combining different generation and evaluation methods, numerous algorithms have been proposed in the last two decades, e.g., Fisher score, sequential forward selection, regularization models, etc. [10, 19]. In previous works, stepwise regression and least angle regression (LAR) have been applied to UQ problems [5, 6, 13]. In the following, we introduce another popular algorithm from statistical learning, LASSO-LAR, whose implementation for PCE basis reduction is quite promising.
3.2 Least Absolute Shrinkage and Selection Operator (LASSO)
We still focus on the regression problem for the PCE coefficients in Eqs. 16 and 17. From a statistical point of view, each basis function in the PCE may be viewed as a specific feature of the output random field, weighted by the coefficients \mathbf{a} = (a_1, \ldots, a_P)^T. Correspondingly, the basis matrix \boldsymbol{\Psi} = (\boldsymbol{\Phi}_1, \ldots, \boldsymbol{\Phi}_P) \in \mathbb{R}^{N \times P} represents a collection of N samples with P features, which forms the training dataset. Each column vector represents a specific feature and each row vector represents a realization. Realizations of the output field \mathbf{h} = (h_1, \ldots, h_N)^T are the model targets, and the model predictions are denoted by \hat{\mathbf{h}} = \boldsymbol{\Psi}\mathbf{a} = (\hat{h}_1, \ldots, \hat{h}_N)^T. The relationships between the features and targets are established on the training dataset, as indicated by Eqs. 16 and 17. Such relationships may be extended to general cases for prediction with arbitrary samples.

OLS regression aims to find the weights which minimize the loss function, i.e., the squared L2-norm of the residual between model targets and predictions:

\mathbf{a}_{OLS} = \arg\min_{\mathbf{a}} \|\mathbf{h} - \hat{\mathbf{h}}\|_2^2 = \arg\min_{\mathbf{a}} \sum_{i=1}^{N} \Big( h_i - \sum_{j=1}^{P} a_j \Psi_{ij} \Big)^2.    (19)

Every feature is given a non-zero weight in OLS regression, implying that the output is interpreted by all features. However, there always exist some redundant and irrelevant features, whose weights should be strictly set to zero. The LASSO, proposed by Tibshirani [34], discards these features by setting a constraint on the coefficients:

\mathbf{a}_{LASSO} = \arg\min_{\mathbf{a}} \|\mathbf{h} - \hat{\mathbf{h}}\|_2^2, \quad \text{s.t.} \; \|\mathbf{a}\|_1 \le t,    (20)

where \|\mathbf{a}\|_1 = \sum_{j=1}^{P} |a_j| \le t is the L1-norm penalty term. The coefficients are restricted by both the loss function and the penalty function. An equivalent Lagrangian form of Eq. 20 is:

\mathbf{a}_{LASSO} = \arg\min_{\mathbf{a}} \|\mathbf{h} - \hat{\mathbf{h}}\|_2^2 + \lambda \|\mathbf{a}\|_1,    (21)

where \lambda \ge 0 is the Lagrange multiplier and a one-to-one mapping exists between t and \lambda. By making t sufficiently small or \lambda sufficiently large, some of the coefficients will be exactly zero,
thus a kind of continuous feature selection is performed. When the penalty term becomes the L2-norm (resp. a combined L1-L2 norm), the LASSO turns into the ridge (resp. elastic-net) regression method [14].

Theoretically, the constraint on the non-zero coefficients should be defined by an L0-norm penalty term, \|\mathbf{a}\|_0 = \#\{k : a_k \neq 0\} \le t. The L0-norm penalty is a "hard thresholding" requiring the number of non-zero coefficients to be strictly limited to t. Although it is straightforward, the optimal solution is difficult to find; the problem is NP-hard. As an alternative, the LASSO can do the same selection job with fewer computational difficulties, owing to the fact that the L1-norm is the optimal convex approximation of the L0-norm [34]. Therefore the LASSO algorithm can be used for feature selection, though it is intrinsically a shrinkage method.
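As a concrete, if simplistic, way to see the L1 penalty at work, Eq. 21 can be solved directly by proximal gradient descent (ISTA): each iteration takes a gradient step on the squared loss and then applies soft-thresholding, which zeroes small coefficients exactly. The solver and the synthetic check below are our own illustration, not the algorithm used in this paper.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(Psi, h, lam, n_iter=2000):
    """Proximal-gradient (ISTA) solver for the Lagrangian LASSO (Eq. 21):
    min_a ||h - Psi a||_2^2 + lam * ||a||_1."""
    L = 2.0 * np.linalg.norm(Psi, 2) ** 2    # Lipschitz constant of the gradient
    a = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Psi.T @ (Psi @ a - h)
        a = soft(a - grad / L, lam / L)
    return a

# synthetic check: only features 2 and 7 truly act on the output
rng = np.random.default_rng(1)
Psi = rng.standard_normal((60, 20))
a_true = np.zeros(20)
a_true[2], a_true[7] = 3.0, -2.0
a_hat = lasso_ista(Psi, Psi @ a_true, lam=1.0)
```

On this noise-free synthetic data only the two truly active features carry non-negligible weight after thresholding, illustrating how the L1 penalty performs selection as well as shrinkage.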
Optimization algorithms can be applied to solve for the LASSO coefficients in Eq. 21, e.g., (block) coordinate descent, proximal methods, etc. [14]. However, they may be computationally expensive when a large number of features exist. An efficient alternative is to perform LASSO with the help of least angle regression (LAR) [12]. LAR is a heuristic search method for feature selection. Although different from LASSO in theory, it can be used to efficiently compute the entire solution path of LASSO. The LASSO-LAR algorithm is quite efficient, especially for high-dimensional problems where N << P. The procedure of LASSO-LAR is given below [14].
LASSO-LAR Algorithm:

1. In preparation, each column vector of features in the training dataset is normalized and the target values are centered, i.e., \frac{1}{N}\sum_{i=1}^{N} \Psi_{ij} = 0 and \frac{1}{N}\sum_{i=1}^{N} \Psi_{ij}^2 = 1 for j = 1, \ldots, P, and \frac{1}{N}\sum_{i=1}^{N} h_i = 0. Initialize with the residual \mathbf{r} = \mathbf{h}, and set a_1, \ldots, a_P = 0.

2. Find the feature vector \boldsymbol{\Phi}_j most correlated with \mathbf{r}.

3. Move a_j from 0 towards its least-squares coefficient \langle \boldsymbol{\Phi}_j, \mathbf{r} \rangle, until some other competitor \boldsymbol{\Phi}_k has as much correlation with the current residual as does \boldsymbol{\Phi}_j.

4. Move \{a_j, a_k\} in the direction defined by their joint least-squares coefficient of the current residual on (\boldsymbol{\Phi}_j, \boldsymbol{\Phi}_k), until some other competitor \boldsymbol{\Phi}_l has as much correlation with the current residual.

5. If a non-zero coefficient hits zero, drop its feature from the active set of features and recompute the current joint least-squares direction.

6. Continue in this way until \min(N-1, P) features have been entered.

At the beginning of the k-th step, let \mathcal{A}_k be the active feature set and \mathbf{a}_k^{LAR} be the coefficient vector. In each loop, the coefficients are updated in the form \mathbf{a}_{k+1}^{LAR} = \mathbf{a}_k^{LAR} + \gamma^{(k)} \mathbf{w}^{(k)}, where \mathbf{w}^{(k)} is the unit equiangular vector of the current active feature set and \gamma^{(k)} is the updating step along direction \mathbf{w}^{(k)}. The coefficients proceed in a direction which makes the smallest and equal angle with each of the active feature vectors, hence the name least angle.
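A faithful LARS implementation with equiangular updates is too long to inline here, but the select-then-refit structure (features entered by correlation with the current residual, coefficients re-estimated by OLS on the active set, as in the hybrid variant described below) can be sketched with a greedy loop. This simplification is closer to orthogonal matching pursuit than to exact LAR, and is our own illustration.

```python
import numpy as np

def greedy_select_ols(Psi, h, n_features):
    """Greedy correlation-based selection with OLS refit on the active set.
    A simplified stand-in for LAR's steps 2-5 (no equiangular moves)."""
    active, r = [], h.copy()
    for _ in range(n_features):
        corr = np.abs(Psi.T @ r)
        corr[active] = -np.inf                    # skip already-active features
        active.append(int(np.argmax(corr)))
        A = Psi[:, active]
        a_active, *_ = np.linalg.lstsq(A, h, rcond=None)  # OLS refit on the subset
        r = h - A @ a_active
    return active, a_active

rng = np.random.default_rng(2)
Psi = rng.standard_normal((50, 15))
h = 2.0 * Psi[:, 3] - 1.5 * Psi[:, 9]             # only two relevant features
active, coef = greedy_select_ols(Psi, h, 2)
```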
The LASSO-LAR algorithm is quite similar to LAR except for the major difference in step 5. In LAR, the selected features are not dropped even if they were improperly recognized in previous steps; in LASSO-LAR, the features are reassessed according to the behavior of the coefficients. Detailed proofs and the algebraic derivation of \gamma and \mathbf{w} are given in [12] and will not be discussed here.

For the convenience of assessment (see Section 3.3), we apply a hybrid LAR algorithm to estimate the PCE coefficients. This method is an adaptation of ordinary LAR, where the selection task is accomplished by LAR and the estimation by OLS. Once the active feature set \mathcal{A}_k is determined, the hybrid LAR coefficients are derived in an OLS way:

\mathbf{a}_{hyb}^{k} = \arg\min_{\mathbf{a}} \|\mathbf{h} - \boldsymbol{\Psi}_{\mathcal{A}_k} \mathbf{a}\|_2^2 = (\boldsymbol{\Psi}_{\mathcal{A}_k}^T \boldsymbol{\Psi}_{\mathcal{A}_k})^{-1} \boldsymbol{\Psi}_{\mathcal{A}_k}^T \mathbf{h}.    (22)

The hybrid LAR gives similar results to ordinary LAR when the major features are involved. The LASSO-LAR algorithm is highly efficient because it does not cope with optimization problems directly, but proceeds along an optimal solution path step by step. In each loop of the algorithm, a new model formed by the current active feature subset is generated. As LASSO-LAR itself gives few assessments of these candidate feature subsets, assessments will be made with the aid of certain criteria, and finally the best subset will be applied for model construction.

3.3 Assessments of feature subset
The performance of a feature subset is related to its prediction capability on independent test datasets. In general, the goodness of fit may be quantified by evaluation functions, among which the most commonly used include the coefficient of determination R^2, Mallows' C_p, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [21]. In this paper, we employ a cross-validation (CV) method for subset assessment, with which a quantitative error estimate can be obtained.

After k steps of LAR, the current model is constructed from the feature subset \mathcal{A}_k, and the corresponding model prediction is \hat{\mathbf{h}}_{\mathcal{A}_k} = \boldsymbol{\Psi}_{\mathcal{A}_k} \mathbf{a}_k^{LAR} = (\hat{h}_{\mathcal{A}_k,1}, \ldots, \hat{h}_{\mathcal{A}_k,N})^T, where \boldsymbol{\Psi}_{\mathcal{A}_k} is the current training data. The empirical error, also referred to as the training error, is defined as:

Err_{emp} = \frac{1}{N} \sum_{i=1}^{N} (h_i - \hat{h}_{\mathcal{A}_k,i})^2.    (23)
The empirical error is a straightforward description of the loss between predictions and targets over the training samples. Nevertheless, it is not a proper criterion, as it consistently decreases with increased model complexity, typically dropping to zero if the complexity is increased far enough. Although a model with small empirical error implies a perfect prediction on the training dataset, it may perform poorly on other test datasets, which means the model is overfitted and does not generalize.

The CV method sets aside a testing dataset to avoid being misled by overfitted models. In K-fold CV, the training data are randomly split into K roughly equal-sized groups, indexed by a partition map \kappa : \{1, \ldots, N\} \to \{1, \ldots, K\}. When the i-th group is used as the testing dataset, the model is trained on the data of the other K-1 groups, and the trained coefficients are denoted by \mathbf{a}^{-\kappa_i}. Predictions are made on the testing dataset, i.e., \hat{\mathbf{h}}^{\kappa_i} = \boldsymbol{\Psi}^{\kappa_i} \mathbf{a}^{-\kappa_i}, and compared to the group targets \mathbf{h}^i. Looping through every group, the K-fold CV error is defined as:

Err_{CV} = \frac{1}{N} \sum_{i=1}^{K} \|\mathbf{h}^i - \hat{\mathbf{h}}^{\kappa_i}\|_2^2.    (24)
A typical choice of K is 5 or 10. With K increasing, the K-fold CV error has lower bias but higher variance; furthermore, considerable computational effort is required. When K = N, i.e., \kappa_i = i, the CV error is termed the leave-one-out error, and it is a reasonable estimate of the expected error.

For linear fitting problems, if the prediction is a linear transformation of the target, i.e., \hat{\mathbf{h}}_{\mathcal{A}_k} = \mathbf{S}\mathbf{h}, an efficient approximation of the leave-one-out error can be given, the so-called generalized cross-validation (GCV) error [18]. As mentioned before, in the hybrid LAR algorithm the model prediction is:

\hat{\mathbf{h}}_{\mathcal{A}_k} = \boldsymbol{\Psi}_{\mathcal{A}_k} \mathbf{a}_{hyb}^{k} = \boldsymbol{\Psi}_{\mathcal{A}_k} (\boldsymbol{\Psi}_{\mathcal{A}_k}^T \boldsymbol{\Psi}_{\mathcal{A}_k})^{-1} \boldsymbol{\Psi}_{\mathcal{A}_k}^T \mathbf{h} = \mathbf{S}\mathbf{h}.    (25)

It can be proved that:

\frac{1}{N} \sum_{i=1}^{N} (h^i - \hat{h}^{\kappa_i})^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{h_i - \hat{h}_{\mathcal{A}_k,i}}{1 - S_{ii}} \right)^2,    (26)

where S_{ii} is the i-th diagonal element of \mathbf{S}. Hence the GCV approximation is defined by:

Err_{GCV} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{h_i - \hat{h}_{\mathcal{A}_k,i}}{1 - \mathrm{trace}(\mathbf{S})/N} \right)^2,    (27)

where \mathrm{trace}(\mathbf{S}) is the sum of the S_{ii}. The GCV does not require solving extra fitting problems and is thus highly efficient.
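For an OLS smoother, the leave-one-out identity of Eq. 26 and the GCV approximation of Eq. 27 can be checked numerically in a few lines. The random design below is our own verification sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 30, 5
Psi = rng.standard_normal((N, P))
h = rng.standard_normal(N)

S = Psi @ np.linalg.inv(Psi.T @ Psi) @ Psi.T       # hat matrix of Eq. 25
h_hat = S @ h

# brute-force leave-one-out: refit without sample i, predict sample i
loo = np.empty(N)
for i in range(N):
    keep = np.arange(N) != i
    a_i, *_ = np.linalg.lstsq(Psi[keep], h[keep], rcond=None)
    loo[i] = h[i] - Psi[i] @ a_i

# closed-form identity (Eq. 26): loo residual = residual / (1 - S_ii)
shortcut = (h - h_hat) / (1.0 - np.diag(S))

err_loo = np.mean(loo ** 2)
err_gcv = np.mean(((h - h_hat) / (1.0 - np.trace(S) / N)) ** 2)  # Eq. 27
```

The closed-form residuals match the brute-force refits exactly, which is why the leave-one-out error can be evaluated without solving any extra fitting problems.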
The errors discussed above are associated only with the statistical resolution. In practical applications, however, one may also consider the errors associated with the physical resolution involved in numerically solving the governing flow equations. The interaction of these errors determines the allocation of computational resources [24, 29]. In this study, we focus only on the statistical errors associated with the PCE.
AC
3.4 Implementation with sequential samples In the LASSO-LAR, the performance of prediction is strongly dependent on the quality of the
training dataset. To prevent the generation of poorly distributed samples, appropriate sampling method for training dataset is indispensable. In the SCM and PCM, collocation points are generated by constructing tensor product with selected interpolation points in each dimension, or directly using Gaussian integral points for specific degree of polynomials [7, 26]. In the sparse
ACCEPTED MANUSCRIPT
PCE algorithm, excessive candidate basis and limited samples are given at first, which makes it impracticable to apply those methods whose sampling principle is closely correlated to the basis function. To address the problem, the Latin Hypercube sampling (LHS) method is applied in our problem, as it is able to generate well distributed samples with arbitrary size [8].
In the LHS method, the range of each dimension is first divided into several equiprobable intervals, and an observation is randomly selected within each interval. Then, by randomly combining the observations in the different dimensions, an LHS sample set whose size equals the number of equiprobable intervals is obtained.
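The two steps above can be sketched in a few lines. The following is a minimal numpy illustration of basic LHS on the unit hypercube (for production use, library routines such as `scipy.stats.qmc.LatinHypercube` provide the same construction):

```python
import numpy as np

def latin_hypercube(n_samples, n_dim, rng=None):
    """Basic LHS on [0, 1)^d: one random point per equiprobable
    interval in each dimension, intervals combined by random permutation."""
    rng = np.random.default_rng(rng)
    # one point inside each of the n equiprobable intervals, per dimension
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dim))) / n_samples
    # shuffle the interval order independently in every dimension
    for j in range(n_dim):
        u[:, j] = u[rng.permutation(n_samples), j]
    return u  # map through an inverse CDF for non-uniform marginals

x = latin_hypercube(5, 2, rng=0)
# each column has exactly one point in every interval [k/5, (k+1)/5)
assert all(sorted(np.floor(x[:, j] * 5).astype(int)) == [0, 1, 2, 3, 4]
           for j in range(2))
```

For the standard Gaussian variables used later in this paper, the uniform coordinates would be pushed through the inverse normal CDF.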
However, the required number of samples is difficult to determine a priori, so a conservative design with relatively small computational cost is usually generated first. If the current samples perform poorly for prediction, new samples are added to the original design sequentially until a good result is obtained, e.g., until the CV error is sufficiently small. In other words, we hope the construction of the response surface to be self-adaptive. On the basis of the LHS method, a sequential sampling method named the inherited LHS method [35] is implemented in this paper.
Fig. 2. Inherited LHS in a two-dimensional domain: (a) original LHS design; (b) new design space; (c) unsampled space. Three new samples (red dots) are generated on the basis of three original LHS samples (black dots).
To illustrate the inherited LHS method, consider the two-dimensional uniformly distributed example in Fig. 2. Three samples are generated by the LHS method as the original design (Fig. 2a). To generate three new samples without losing the uniformity of the LHS, the space is redivided into six equiprobable intervals in each dimension. In the new design space, the intervals occupied by the original LHS samples are shaded in Fig. 2b. The unshaded areas are extracted to form the unsampled space shown in Fig. 2c. If two or more samples fall into the same interval, the unsampled space has to be artificially expanded, as indicated in [35]. The new samples are then generated to fill the unsampled space in an LHS manner, denoted by the red dots in Fig. 2c. Afterwards, the samples in the unsampled space are mapped back to the new design space. Since the original design remains part of the new design, extra simulations are performed only on the new samples; the method is therefore inherited and highly efficient.

Let us briefly summarize the implementation of feature selection for basis reduction.
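A minimal sketch of the design-doubling case of this procedure is given below. It assumes the refined intervals are collision-free; the artificial expansion of the unsampled space described in [35] is not handled:

```python
import numpy as np

def inherit_lhs(x_old, rng=None):
    """Double an existing LHS design on [0, 1)^d (sketch of the inherited-LHS
    idea; the 'artificial expansion' for colliding samples is not handled)."""
    rng = np.random.default_rng(rng)
    n, d = x_old.shape
    m = 2 * n                        # refined number of equiprobable intervals
    x_new = np.empty((n, d))
    for j in range(d):
        occupied = np.floor(x_old[:, j] * m).astype(int)
        free = np.setdiff1d(np.arange(m), occupied)    # unsampled intervals
        rng.shuffle(free)                              # random combination
        x_new[:, j] = (free[:n] + rng.random(n)) / m   # one point per interval
    return np.vstack([x_old, x_new])  # old runs are reused; only n new ones

x0 = (np.arange(3)[:, None] + np.random.default_rng(1).random((3, 2))) / 3
x1 = inherit_lhs(x0, rng=2)
# the combined set is again an LHS with 6 intervals per dimension
assert all(sorted(np.floor(x1[:, j] * 6).astype(int)) == list(range(6))
           for j in range(2))
```

The key property is visible in the final assertion: the enriched design is itself a valid LHS on the refined grid, so the original simulation runs never have to be repeated.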
First, a conservative design is generated by the LHS method as the original sample set, and the model simulation is performed for each sample. Then, polynomial basis functions are derived by PCE to form the original feature set; the polynomial degree may be set at a conservatively low level at first. The LASSO-LAR algorithm is adopted to select features, and each feature subset is assessed by the approximated GCV criterion. If the stopping criterion cannot be reached, one may first increase the polynomial degree and make use of higher-order basis functions for estimation. If, with the maximum acceptable degree, the GCV error is still larger than expected, the current sample set is enriched with the inherited LHS method and the previous steps are repeated. Once a specific feature subset is qualified, the response surface can be constructed. Finally, post-processing and validations for general cases are executed for further analysis. The flowchart is shown in Fig. 3.
Fig. 3 Flowchart of basis reduction for sparse PCE with feature selection method.
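The adaptive procedure just summarized can be sketched as follows. This is a schematic numpy-only illustration, not the authors' implementation: plain Gaussian Monte Carlo sampling and an ordinary least-squares fit with the GCV criterion of Eq. 27 stand in for the inherited LHS and LASSO-LAR steps, and `simulator` is a hypothetical stand-in for the flow model:

```python
import numpy as np
from itertools import combinations_with_replacement

def basis(X, degree=3):
    """All monomial features of total degree 1..degree (constant absorbed)."""
    return np.column_stack([np.prod(X[:, idx], axis=1)
                            for deg in range(1, degree + 1)
                            for idx in combinations_with_replacement(
                                range(X.shape[1]), deg)])

def simulator(X):                  # hypothetical stand-in for the flow model
    return np.exp(X[:, 0]) + X[:, 1]**2 * X[:, 2]

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))   # conservative original design (plain MC here)
while True:
    n = len(X)
    Psi, h = basis(X), simulator(X)
    S = Psi @ np.linalg.solve(Psi.T @ Psi, Psi.T)             # hat matrix, Eq. 25
    gcv = np.mean(((h - S @ h) / (1 - np.trace(S) / n))**2)   # GCV error, Eq. 27
    if gcv < 0.5 or n >= 250:      # stopping criterion reached, or budget spent
        break
    X = np.vstack([X, rng.standard_normal((20, 3))])          # enrich sequentially
```

In the actual method the enrichment step would use inherited LHS samples and the fit would be the LASSO-LAR feature selection; the loop structure, however, is the same as in Fig. 3.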
4 Case studies
In this section, one analytical example and two groundwater flow examples (steady state and
transient) are presented to illustrate the sparse PCE method with feature selection. In the analytical example, a detailed validation of the feature selection is discussed. For the flow examples, the sparse PCE is used for UQ of groundwater flow, and the results are compared with
MC simulations.
4.1 Illustrative analytical example
4.1.1 Problem statement
Let us consider a random variable $h(\boldsymbol{\xi}) = M(\boldsymbol{\xi})$ governed by three uncertain variables $\xi_1, \xi_2, \xi_3$, which are assumed to be independent standard Gaussian variables. The model is analytical:

$$M(\boldsymbol{\xi}) = e^{\xi_1} + \xi_2^2 \xi_3. \qquad (28)$$
In fact, the model is a black box to the analyst studying its uncertainty characteristics. For illustration, the response surface is constructed from simple polynomial terms of degree no more than three, denoted by:

$$\hat{M}(\boldsymbol{\xi}) = \sum_{i=1}^{P} a_i\, \xi_1^{k_1} \xi_2^{k_2} \xi_3^{k_3}, \qquad k_1 + k_2 + k_3 \le 3, \quad k_1, k_2, k_3 \in \mathbb{N}, \qquad (29)$$
where 20 polynomials are adopted as the original feature set. In this case, we utilize a fixed LHS design with 40 samples to evaluate the PCE coefficients.
After running the model 40 times for target evaluation and assembling the 20 candidate basis
functions into the training dataset, the LASSO-LAR algorithm is applied to select the useful features. Note that the training data are normalized and the targets are centered in the preprocessing step, so there is no intercept; i.e., the constant term of the response surface is absorbed and need not be considered. In addition, the $L_1$-norm of the coefficients increases monotonically along the solution path due to the nature of LASSO-LAR, and finally an OLS solution $\mathbf{a}^{OLS}$ is obtained when all features are activated. Hence, to quantify the progress of the $k$-th step on the solution path, we define a shrinkage factor

$$s = \frac{\|\mathbf{a}_k^{LAR}\|_1}{\|\mathbf{a}^{OLS}\|_1},$$

the ratio of the $L_1$-norms of the current LAR coefficients and the full OLS coefficients.
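For readers who wish to reproduce this kind of experiment, a similar solution path can be traced with an off-the-shelf LAR implementation. The sketch below (with plain Gaussian sampling standing in for the LHS design) uses `sklearn.linear_model.lars_path` on the analytical model of Eq. 28:

```python
import numpy as np
from itertools import combinations_with_replacement
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
xi = rng.standard_normal((40, 3))                  # 40 samples (plain MC here)
y = np.exp(xi[:, 0]) + xi[:, 1]**2 * xi[:, 2]      # the analytical model, Eq. 28

# the 19 non-constant monomials of degree <= 3 (constant absorbed by centering)
idx_sets = [idx for deg in (1, 2, 3)
            for idx in combinations_with_replacement(range(3), deg)]
Psi = np.column_stack([np.prod(xi[:, idx], axis=1) for idx in idx_sets])
Psi = (Psi - Psi.mean(0)) / Psi.std(0)             # normalize the features
y = y - y.mean()                                   # center the target

alphas, active, coefs = lars_path(Psi, y, method='lasso')
active_features = [idx_sets[j] for j in active]    # active set at end of path
```

The `coefs` array holds one coefficient vector per step of the path, so quantities such as the shrinkage factor $s$ can be computed as the ratio of $L_1$-norms between each column and the final one.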
4.1.2 Application of feature selection
Theoretically, the analytical model can be rewritten as a Taylor series expansion with Peano remainder:

$$M(\boldsymbol{\xi}) = 1 + \xi_1 + \frac{\xi_1^2}{2} + \frac{\xi_1^3}{6} + \xi_2^2\xi_3 + o(\xi_1^3). \qquad (30)$$

Therefore, among all features, we hope that the terms $\xi_1$, $\xi_1^2$, $\xi_1^3$ and $\xi_2^2\xi_3$ will be retained eventually and equipped with coefficients similar to those in Eq. 30. The first ten steps of LASSO-LAR are demonstrated in Table 1.
Table 1. First ten steps along the LASSO-LAR solution path

| Step | Added | Dropped | Size of active set | Shrinkage factor |
|------|-------|---------|--------------------|------------------|
| 1 | $\xi_2^2\xi_3$ | | 1 | 0.104 |
| 2 | $\xi_1\xi_2^2$ | | 2 | 0.165 |
| 3 | $\xi_1^3$ | | 3 | 0.272 |
| 4 | $\xi_1$ | | 4 | 0.353 |
| 5 | $\xi_1^2$ | | 5 | 0.672 |
| 6 | | $\xi_1\xi_2^2$ | 4 | 0.732 |
| 7 | $\xi_2\xi_3$ | | 5 | 0.736 |
| 8 | $\xi_2$ | | 6 | 0.743 |
| 9 | | $\xi_2\xi_3$ | 5 | 0.748 |
| 10 | $\xi_2\xi_3$ | | 6 | 0.750 |
In every step, a specific feature is added to or dropped from the active feature set. Within the first six steps, all of the features we hope to retain are selected, although an uncorrelated term $\xi_1\xi_2^2$ is included in the second step and finally discarded. That is to say, the major stochastic features are captured by the LASSO-LAR algorithm both accurately and quickly in this case. Moreover, the shrinkage factor increases slowly after six steps, indicating that the features added afterwards are given very small weights and may make only limited contributions to the response surface. Therefore, we may guess that the best feature subset is probably selected around the sixth step.
Fig. 4 Leave-one-out error and empirical error along the solution path.
Assessments are made for each subset, and both the leave-one-out error and the empirical error are shown in Fig. 4. The leave-one-out error fluctuates along the solution path according to the performance of the subsets. There is a slight increase at step four with $s = 0.353$, where $\xi_1$ is added; this addition is supposed to lower the leave-one-out error but fails, because the uncorrelated term $\xi_1\xi_2^2$ has interfered with the impact of $\xi_1$ and results in a slightly poorer fit. The minimum error is reached at step nine with $s = 0.748$, where $\mathrm{Err}_{CV} = 0.0753$, marked by the vertical line in Fig. 4. After that, the prediction error starts to rise, suggesting that the features added afterwards may be redundant and useless for fitting. Therefore, the best feature subset consists of the five activated terms at step nine, i.e., $\xi_1$, $\xi_1^2$, $\xi_1^3$, $\xi_2^2\xi_3$ and $\xi_2$, which agrees well with our expectation. The empirical error, however, decreases all the way down. Even though a bunch of redundant features are included after $s = 0.748$, their participation seems to continuously lower the error and improve the prediction accuracy, but they are actually overfitting the training data. If judged by the empirical error, the original excessive feature set would be identified as optimal, which is obviously against common sense.
Fig. 5 (a) LAR coefficients and (b) hybrid LAR coefficients of each feature along the solution path.
Let us check the coefficients of each feature in Fig. 5. Every line in Fig. 5a represents the change of the LAR coefficient of a specific feature along the solution path, and a vertical line is drawn at $s = 0.748$ to illustrate where the best subset lies. Once a feature is activated, its coefficient moves from zero towards its OLS solution at $s = 1$, together with the other activated features. The coefficient of $\xi_1\xi_2^2$, however, climbs up but declines to zero at last. In other words, it moves in the opposite direction after step three, gradually erasing its impact on the fitting until step five. Thus, a slight increase of the leave-one-out error is detected at step four, right in this transition period. After the best subset is selected, features are activated intensively, as shown in the lower right corner of Fig. 5a; however, they carry only limited weights.
Fig. 5b describes the hybrid LAR coefficients of each feature. Unlike the ordinary LAR coefficients, which move gradually and jointly, the hybrid LAR coefficients aim at maximum accuracy in every step. As a result, once a feature is activated, its hybrid coefficient stretches as much as possible to obtain the best fit; hence the sudden rise of every line in Fig. 5b. When several features are added and mutually constrained, the hybrid coefficients gradually converge to the ordinary LAR coefficients. The optimal response surfaces constructed with the ordinary and hybrid LAR coefficients are:

$$\hat{M}^{LAR}(\boldsymbol{\xi}) = 1.1038 + 0.9471\xi_1 + 0.5545\xi_1^2 + 0.2250\xi_1^3 + 0.9667\xi_2^2\xi_3 + 0.0329\xi_2, \qquad (31)$$

$$\hat{M}^{hyb}(\boldsymbol{\xi}) = 1.0919 + 0.9613\xi_1 + 0.5448\xi_1^2 + 0.2263\xi_1^3 + 0.9648\xi_2^2\xi_3 + 0.0527\xi_2. \qquad (32)$$

These two estimations are both quite similar to the expansion of the model in Eq. 30. Though $\xi_2$ is included, it is given a very small weight, and its presence may account for the effect of the omitted remainder. In general, LASSO-LAR has done an excellent job of feature selection in this simple analytical case. In the following, we inspect its performance on groundwater flow problems.
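The quality of such a surrogate can be checked by direct evaluation. The snippet below compares the response surface of Eq. 31 against the analytical model of Eq. 28 on a grid within $\pm 1.5$ standard deviations of the mean:

```python
import numpy as np

def M(x1, x2, x3):
    """The analytical model, Eq. 28."""
    return np.exp(x1) + x2**2 * x3

def M_lar(x1, x2, x3):
    """The sparse PCE response surface of Eq. 31."""
    return (1.1038 + 0.9471*x1 + 0.5545*x1**2 + 0.2250*x1**3
            + 0.9667*x2**2*x3 + 0.0329*x2)

g = np.linspace(-1.5, 1.5, 7)
X1, X2, X3 = np.meshgrid(g, g, g)
err = np.abs(M(X1, X2, X3) - M_lar(X1, X2, X3))
assert err.max() < 0.5   # close throughout the high-probability region
```

Larger discrepancies appear further out in the tails, where the cubic truncation of $e^{\xi_1}$ in Eq. 30 no longer holds; this is consistent with the Peano remainder being omitted from the surrogate.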
4.2 Examples of steady state groundwater flow
4.2.1 Problem statement
In this section, we consider the steady state groundwater flow problem in a two-dimensional domain of saturated heterogeneous medium: a square of size $L_x = L_y = 10.0$ [L] (where [L] denotes any consistent length unit), uniformly discretized into $40 \times 40$ square elements. No-flow conditions are prescribed at the two lateral boundaries. The hydraulic head is prescribed at the left and right boundaries as 7.0 [L] and 5.0 [L], respectively. No source term is considered. The log hydraulic conductivity field is considered as a Gaussian random field, whose mean is
$\langle Y \rangle = \langle \ln K_s \rangle = 0.0$ and whose variance is $\sigma_Y^2 = 1.0$. Assume that the covariance function between $\mathbf{x} = (x_1, y_1)$ and $\mathbf{x}' = (x_2, y_2)$ is $C_Y(\mathbf{x}, \mathbf{x}') = \sigma_Y^2 \exp(-|x_1 - x_2|/\eta_x - |y_1 - y_2|/\eta_y)$, where $\eta_x = \eta_y = 4.0$ is the correlation length of the random field. The log hydraulic conductivity field is
represented by the K-L expansion. Since the covariance function is separable, the eigenvalues and eigenfunctions of $C_Y$ can be derived analytically in each dimension. For example, in the $x$ direction we have [44]:

$$\lambda_j^{(x)} = \frac{2\eta_x \sigma_Y^2}{\eta_x^2 (\omega_j^{(x)})^2 + 1}, \qquad f_j^{(x)}(x) = \frac{\eta_x \omega_j^{(x)} \cos(\omega_j^{(x)} x) + \sin(\omega_j^{(x)} x)}{\sqrt{[(\eta_x \omega_j^{(x)})^2 + 1] L_x / 2 + \eta_x}}, \qquad (33)$$

where the $\omega_j^{(x)}$ are the positive roots of the characteristic equation:

$$(\eta_x^2 \omega^2 - 1)\sin(\omega L_x) = 2\eta_x \omega \cos(\omega L_x). \qquad (34)$$
The $\lambda_k^{(y)}$ and $f_k^{(y)}(y)$ in the $y$ direction are derived in the same way. The two-dimensional eigenvalues (and eigenfunctions) are obtained by combining the eigenvalues (and eigenfunctions) in each dimension:

$$\lambda_i = \frac{1}{\sigma_Y^2}\, \lambda_j^{(x)} \lambda_k^{(y)}, \qquad f_i(\mathbf{x}) = f_i(x, y) = f_j^{(x)}(x)\, f_k^{(y)}(y). \qquad (35)$$
Fig. 6 Series of eigenvalues and their finite sums.
We rearrange the eigenvalues in a nonincreasing sequence and calculate their summation. The series of eigenvalues and their finite sums as a function of the number of terms are shown in Fig. 6. The infinite series in the K-L expansion is truncated at a certain level of energy, say $\sum_{i=1}^{n} \lambda_i / (|D|\, \sigma_Y^2) = 80\%$, $90\%$ or $95\%$, for which 20, 60 or 164 terms are retained, respectively. Theoretically, a higher accuracy is reached if more terms are preserved. Although a conductivity field with more energy carries more details, these smaller-scale fluctuations make little difference to the simulated hydraulic head field. Thus, we preserve 80% of the energy in the K-L expansion for simplicity, and the random field is 20-dimensional. According to the results in Section 4.2.2, a random dimensionality of 20 is sufficient to characterize the random structure of the hydraulic head field in this problem. Since there is no analytical reference, the direct-sampling Monte Carlo (MC) method is applied for comparison and its solution is regarded as the benchmark. For the MC, we preserve 164 terms in the K-L expansion and run the simulator 10,000 times, which ensures statistical convergence and accurate results.
The PCE response surface is constructed on the basis of the 20-dimensional random vector $\boldsymbol{\xi}(\theta) = (\xi_1(\theta), \ldots, \xi_{20}(\theta))^T$. Hermite polynomials are adopted as basis functions, since $\xi_1(\theta), \ldots, \xi_{20}(\theta)$ are standard Gaussian random variables. In fact, we do not know beforehand which basis functions are useful or how many collocation points are sufficient. Therefore, to avoid losing important information, the original feature set consists of the polynomials of degree 1, and it is gradually expanded to degree 3, where at most $P = \binom{20+3}{3} = 1{,}771$ basis functions are included. In consideration of the computational cost, a conservative design of 50 samples is generated at first; feature selection and assessment are implemented afterwards. The original sample set is repeatedly enriched with 20 new samples until a satisfying result is achieved. Besides, the setup of the stopping criterion is quite tricky, as the sizes of the sample set and the feature set keep changing. To find a proper criterion, we may conduct several steps of selection first and observe the trend of the error for reference. In this case, the stopping criterion is set as the simulated head variance over the number of samples.
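The count $P = 1{,}771$ follows from the number of multi-indices of total degree at most 3 in 20 dimensions; a quick check, together with the probabilists' Hermite polynomials available in numpy as one-dimensional factors:

```python
from math import comb
from numpy.polynomial.hermite_e import hermeval

d, p = 20, 3
# number of multi-indices with total degree <= p in d dimensions (constant included)
n_terms = sum(comb(d + k - 1, k) for k in range(p + 1))
assert n_terms == comb(d + p, p) == 1771       # P in the text

# probabilists' Hermite polynomials He_k, e.g. He_2(x) = x^2 - 1
assert hermeval(2.0, [0, 0, 1]) == 2.0**2 - 1  # coefficient vector selects He_2
```

Each multivariate basis function is a product of such one-dimensional $He_k$ factors, one per active dimension of the multi-index.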
4.2.2 Analysis of statistical properties
The mean and variance of the hydraulic head field are obtained by post-processing. There is little difference between the estimated and the reference mean fields owing to the particular boundary conditions in our examples; hence, we mainly focus on the variance for comparison. Figs. 7a, 7b and 7c exhibit the head variance on the three cross sections $y = 2.5$ [L], $5.0$ [L] and $7.5$ [L], respectively. When the sample size is 50, the estimated variance is strongly nonsmooth and far from the benchmark, which means that the original conservative design is too limited to capture the characteristics of the random field. To improve the estimation, 20 samples are added to the current sample set; in fact, 27 samples are generated, because the unsampled space has to be artificially expanded to keep the new samples LHS-type. The results with 77 samples are significantly improved: they are much smoother and closer to the benchmark. If inherited LHS samples are added continually, the estimations gradually approach the benchmark. With just 137 samples, the estimation is fairly close to the MC reference (with 10,000 samples).
Fig. 7 Estimated head variance $\sigma_h^2$ along the cross sections (a) $y = 2.5$ [L], (b) $y = 5.0$ [L], (c) $y = 7.5$ [L] for $\sigma_Y^2 = 1.0$, and (d) corresponding estimated errors compared with MC.
The convergence rate of the sparse PCE is much faster than that of the MC method. To give a quantitative illustration, we define an estimation error $\epsilon_y$ for the head variance on the cross section $y$:

$$\epsilon_y = \frac{1}{L_x}\int \left(\sigma_h^2(x, y) - \hat{\sigma}_h^2(x, y)\right)^2 \mathrm{d}x \approx \frac{1}{N_x}\sum_{i=1}^{N_x} \left(\sigma_h^2(x_i, y) - \hat{\sigma}_h^2(x_i, y)\right)^2, \qquad (36)$$

where $N_x$ is the total number of physical nodes on the cross section, and $\sigma_h^2$ and $\hat{\sigma}_h^2$ are the reference and approximated variances, respectively. This estimation error indicates the total difference on the cross section and gradually converges to zero as the sample size increases. Comparisons of the error convergence between the sparse PCE and MC on the three cross sections are shown in Fig. 7d. Note that the MC results may vary considerably, especially when a small sample is utilized; thus, the error estimates are obtained by running a 1,000-time bootstrap. When the sample size is small, the sparse PCE performs as poorly as an MC simulation of the same size. As the sample size increases, the results of the sparse PCE method converge rapidly. An accuracy of $10^{-5}$~$10^{-6}$ is reached with one or two thousand MC simulations; with the sparse PCE method, however, one or two hundred simulations are fairly enough.
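The discrete error of Eq. 36 and the bootstrap assessment of the MC variability can be sketched as follows, with synthetic head samples at a single node standing in for the simulator output:

```python
import numpy as np

def cross_section_error(var_ref, var_hat):
    """Discrete estimation error of Eq. (36) on one cross section:
    mean squared difference of the variances over the N_x nodes."""
    return np.mean((np.asarray(var_ref) - np.asarray(var_hat))**2)

# bootstrap spread of an MC variance estimate (1,000 resamples, as in the text)
rng = np.random.default_rng(0)
heads = rng.normal(6.0, 0.3, size=500)         # toy head samples at one node
boot = [np.var(rng.choice(heads, heads.size, replace=True))
        for _ in range(1000)]
spread = np.std(boot)                          # sampling variability of sigma_h^2
assert spread < np.var(heads)                  # small relative to the estimate
```

Repeating the resampling for each MC sample size traces out the spread of the red curves discussed above; the same `cross_section_error` is applied to both the sparse PCE and the resampled MC variances.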
Fig. 8 Hydraulic head PDF at (5.0 [L], 5.0 [L]), estimated by response surfaces constructed with different sample sizes.
Once the hydraulic head is approximated in a polynomial form, the probability density function (PDF) can be obtained by MC sampling. By running 100,000 MC simulations on the response surface, which involves only the evaluation of polynomials and is highly efficient, the PDF of the hydraulic head is obtained. The estimated PDF at a specific location, say (5.0 [L], 5.0 [L]), is shown in Fig. 8. With more than 77 samples, the estimations are quite consistent with the reference PDF, while the result with 50 samples has a large bias. One may wonder why the statistical moments and the PDF vary so much with different sample sizes. Let us therefore examine how the response surface is constructed and how much weight is given to each basis function.

The response surface of the sparse PCE at location (5.0 [L], 5.0 [L]) is studied here. The coefficients of all the basis functions are shown in Fig. 9, where blue bars represent the coefficients of the polynomials of degree 1 and red bars those of degree 2. Though polynomials up to degree 3 are provided as candidates, only the first two degrees are activated. More specifically, since the polynomials of degree 1 are exactly the elements of the governing vector $(\xi_1(\theta), \ldots, \xi_{20}(\theta))$ and the conductivity field is intrinsically a linear combination of them, they may have a large impact on the response surface, especially the first several elements, which are the dominant uncertainty factors. Thus, most of the weight is given to polynomials of degree 1, as shown in Figs. 9b, 9c and 9d, where 35, 37 and 41 features with non-zero coefficients are presented, respectively. The selected basis functions and their coefficients are stable with more than 77 samples, indicating that the algorithm has probably converged and that the major stochastic features are just these roughly 40 basis functions. However, the coefficients trained with 50 samples behave strangely, as shown in Fig. 9a. The basis matrix is 50 by 42, nearly square, and badly scaled, with a condition number around 3,000. The coefficients trained under such conditions are unconvincing, and the statistical properties are therefore highly biased. When 77 or more samples are utilized, the basis matrix is rectangular, with a condition number of no more than 100. Convincing regression results are usually obtained under such ideal conditions, with $N \approx kP$, $k \in [2, 3]$. As a result, a significant improvement is achieved with just 27 more samples, and samples added afterwards make only limited further improvement because the algorithm converges fast.
Fig. 9 The coefficients of basis functions for the constructed response surface at location (5.0 [L], 5.0 [L]), trained by (a) 50, (b) 77, (c) 109 and (d) 137 samples.

4.2.3 Effect of large spatial variability
As illustrated in the previous section, the sparse PCE method is accurate and efficient for moderate spatial variability of the hydraulic conductivity field. In this section, we examine cases with larger variability to inspect the validity of the method.
Fig. 10 Estimated head variance $\sigma_h^2$ along the cross section $y = 5.0$ [L] with (a) $\sigma_Y^2 = 2.0$, (b) $\sigma_Y^2 = 4.0$, and (c) corresponding estimated errors compared with MC.
We first increase the spatial variability to $\sigma_Y^2 = 2.0$ (corresponding to a coefficient of variation of the hydraulic conductivity $Cv_{K_s} = \sigma_{K_s}/\langle K_s \rangle = 253\%$), while keeping the other conditions unchanged. The MC benchmark and basis functions are generated in the same way as before. When constructing the sparse PCE response surface, the stopping criterion is set as the simulated head variance over half the number of samples. The original sample size is still conservatively selected as 50, with 20 samples enriched every time. The estimated variances at $y = 5.0$ [L] are shown in Fig. 10a. The estimated error as a function of sample size is denoted by the red line in Fig. 10c. We can see that even though the hydraulic conductivity field is of larger variability, the sparse PCE is still able to capture the random structure accurately with limited samples and maintains a rapid convergence speed compared to the MC simulation. The sparse PCE is still highly efficient. However, compared with the case of $\sigma_Y^2 = 1.0$, the estimated error is larger and more samples are needed to obtain satisfying results.

Let us make the spatial variability even larger, say $\sigma_Y^2 = 4.0$ (corresponding to a coefficient of variation of the hydraulic conductivity $Cv_{K_s} = 732\%$). Since the random field is strongly fluctuating and complex, the construction of the response surface now starts with 100 samples, and 40 new samples are added every time. The stopping criterion is set much looser: the simulated head variance over 1/6 of the sample size. Under such circumstances, the estimated variances and errors are shown in Figs. 10b and 10c, respectively. More than 200 samples are required to obtain good approximations, and the estimated errors are even larger. Though the efficiency of the sparse PCE is affected by the large variability, it remains robust compared to other algorithms. Besides MC, in PCM and SCM, polynomials of degree 3 or 4 usually have to be considered for such high spatial variability of the hydraulic conductivity field, and the sample size may range from $10^3$ to $10^4$ [7, 26]. Since the sparse PCE attempts to extract the most valuable information from limited samples and make the best use of them, decent results can be obtained even with such high variability.
Fig. 11 The coefficients of basis functions for the constructed response surface at location (5.0 [L], 5.0 [L]) with (a) $\sigma_Y^2 = 2.0$ and (b) $\sigma_Y^2 = 4.0$.
The effect of variability is also reflected in the coefficients of the response surface. Fig. 11 shows the coefficients of all the basis functions for spatial variabilities $\sigma_Y^2 = 2.0$ and $\sigma_Y^2 = 4.0$. Comparing Figs. 9 and 11, we find that more features are activated as the variability increases; however, the polynomials of degree 1 remain the dominant terms. Specifically, for $\sigma_Y^2 = 4.0$, some polynomials of degree 3 are also included, denoted by the green bars in Fig. 11b. The sparse PCE tends to select polynomials of higher degree to characterize the details of a more complex field, while the major stochastic features are still retained. Such behavior of the response surface is within our expectation.
4.3 Example of transient groundwater flow
4.3.1 Problem statement
In this section, we apply the sparse PCE algorithm to the transient groundwater flow problem. We consider the same flow domain as in the steady state problem. A pumping well is located at (5.0 [L], 5.0 [L]) with a constant pumping rate of 3.0 [L³/T] ([T] is any consistent time unit). The specific storage is $S_s = 0.0001$ [1/L]. The prescribed heads at the left and right boundaries are 27.0 [L] and 25.0 [L], respectively. The hydraulic head is steady before pumping. The simulation time is 5.0 [T] with 20 time steps. The log hydraulic conductivity field has mean $\langle Y \rangle = \langle \ln K_s \rangle = 0.0$, variance $\sigma_Y^2 = 1.0$ and covariance correlation length $\eta_x = \eta_y = 4.0$ [L]. MC simulations (with 100,000 samples) are performed to give an accurate solution as the benchmark. The mean head and the denary logarithm of the head variance at $t = 0.5$ [T], 1.25 [T] and 2.5 [T], obtained by the MC simulations, are shown in Fig. 12.
Fig. 12 Mean hydraulic head (upper row) and denary logarithm of head variance (lower row) at (a)(d) $t = 0.5$ [T], (b)(e) $t = 1.25$ [T] and (c)(f) $t = 2.5$ [T].
4.3.2 Analysis of statistical properties
Due to the presence of the pumping well, both the mean and the variance of the hydraulic head field become nonsmooth, especially at the well, compared to the previous examples without a pumping well. In the application of the sparse PCE, we still start with 50 conservative samples, with 20 samples added at each step, and the polynomial degree is no more than 3. Along the cross section $y = 5.0$ [L], the estimated mean and variance of the hydraulic head with the sparse PCE trained by 264 samples are compared with MC (with 100,000 samples), as shown in Fig. 13.
Fig. 13 Estimated mean (upper row) and variance (lower row) of the hydraulic head on the cross section $y = 5.0$ [L] at (a)(d) $t = 0.5$ [T], (b)(e) $t = 1.25$ [T] and (c)(f) $t = 2.5$ [T].
It is obvious that the mean hydraulic head from the sparse PCE matches the MC results exactly at the different times. The variance also agrees well with the MC solutions, while slight deviations are observed at the location of the well. Both the computational efficiency and the accuracy of the sparse PCE method, observed in the steady state problems, carry over to the transient flow problem. The behavior of the PDFs of the hydraulic head can be checked, as shown in Fig. 14.
Fig. 14 PDFs of hydraulic head at locations (a) (4.5 [L], 5.0 [L]), (b) (5.0 [L], 5.0 [L]) at the pumping well, and (c) (6.0 [L], 5.0 [L]), at $t = 1.25$ [T], obtained from sparse PCE and MC.
The PDFs at the pumping well and at two nearby locations, obtained by running 100,000 MC simulations on the response surface, are shown in Fig. 14. The estimations agree well with the references, indicating that the random structure of the hydraulic head is captured by the sparse PCE. However, the estimated PDF at the well is slightly biased. Since a drawdown funnel exists (Fig. 12) to ensure a constant pumping rate, the hydraulic head at the well is physically constrained to be lower than 26.0 [L], as shown in the reference PDF. The behavior of the well head is nonlinear in the probability space and is difficult to characterize accurately with polynomials. Therefore, in the estimated PDF we may find a few unphysical realizations with heads larger than 26.0 [L]. Even so, the estimations are only slightly affected and still acceptable, because the nonlinearity is not that pronounced in this single-phase case. For multiphase flow problems, where nonlinearity is quite common, some modifications will have to be made to the sparse PCE method to describe the nonlinearity accurately. This problem will be studied in our further research.
5 Conclusions
In this study, we explore an efficient method to construct a sparse polynomial chaos expansion (PCE) response surface with the aid of a feature selection algorithm. The method is utilized for uncertainty quantification (UQ) of flow in random porous media. The input random field, i.e., the hydraulic conductivity field, is represented by the Karhunen-Loève (K-L) expansion, and the dependent random field, i.e., the hydraulic head field, is estimated by the PCE response surface. The feature selection algorithm is applied to select the major stochastic features among a large number of basis functions, so that the polynomial chaos coefficients can be calculated with limited realizations. The cross validation (CV) error is adopted to evaluate each feature set. Besides, the response surface is improved by sequentially enriching the training samples with the inherited Latin hypercube sampling (LHS) method. We first validate the algorithm with an analytical example to illustrate the feature selection process. The sparse PCE method is then applied to two-dimensional steady state flow problems with different spatial variabilities. The method is also implemented for a transient flow problem with a pumping well, and the results are compared to the Monte Carlo (MC) method. This study leads to the following conclusions:
1. With relatively small computational effort, the sparse PCE method is feasible for accurately quantifying uncertainty for flow in random porous media. The method is nonintrusive and highly parallelizable, and can be easily implemented with existing simulators. The feature selection method is utilized for basis reduction, so the required number of simulations is reduced. No matter how time-consuming the flow simulator is, the computational effort of feature selection is comparatively small, since the selection process does not involve simulation and is intrinsically a linear fitting problem. Thus, the total computational cost is reduced and the method is highly efficient.
2. The convergence rate of the sparse PCE method is faster than that of the MC method when the random field has a reasonable number of effective features. The sparse PCE method constructs response surfaces without the irrelevant and redundant features. Therefore, good approximations can be obtained with a limited number of samples, even if the spatial variability is large and a high dimension is considered. However, for extremely complicated processes with a large number of effective features, the advantage of sparse PCE may diminish.
3. The sparse PCE method is sequential and self-adaptive. The response surface is gradually improved by training more samples sequentially. Since the new samples are generated hereditarily, the results from the original samples remain useful for the new response surface construction. In addition, samples generated by the inherited LHS method have good statistical properties, and the response surface trained by these well-distributed samples is convincing.
4. The feature selection and assessment modules in the sparse PCE method are replaceable. Though in this paper the LASSO-LAR algorithm and the leave-one-out cross validation (CV) assessment are introduced, many outstanding algorithms and tools from the statistical learning area can be applied for better performance. The sparse PCE method is quite flexible and promising for further improvement.
5. It should be noted that the sparse PCE method is not restricted to the flow type and configuration considered in this study. The approach may be applied to complex flow configurations and multiphase flow problems, which is our ongoing study. However, for multiphase problems where high nonlinearity (or irregularity) exists, the fast convergence of sparse PCE may not hold, since polynomials may be poor at describing highly nonlinear phenomena. This requires further research in the future.
References

[1] Arnst M, R Ghanem, E Phipps, J Red-Horse. Reduced chaos expansions with random coefficients in reduced-dimensional stochastic modeling of coupled problems. International Journal for Numerical Methods in Engineering. 97 (2014) 352-76.
[2] Arnst M, R Ghanem, C Soize. Identification of Bayesian posteriors for coefficients of chaos expansions. Journal of Computational Physics. 229 (2010) 3134-54.
[3] Babuška I, F Nobile, R Tempone. A stochastic collocation method for elliptic partial differential equations with random input data. SIAM Journal on Numerical Analysis. 45 (2007) 1005-34.
[4] Ballio F, A Guadagnini. Convergence assessment of numerical Monte Carlo simulations in groundwater hydrology. Water Resources Research. 40 (2004).
[5] Blatman G, B Sudret. An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis. Probabilistic Engineering Mechanics. 25 (2010) 183-97.
[6] Blatman G, B Sudret. Adaptive sparse polynomial chaos expansion based on least angle regression. Journal of Computational Physics. 230 (2011) 2345-67.
[7] Chang H, D Zhang. A comparative study of stochastic collocation methods for flow in spatially correlated random fields. Communications in Computational Physics. 6 (2009) 509.
[8] Choi S-K, RV Grandhi, RA Canfield, CL Pettit. Polynomial chaos expansion with latin hypercube sampling for estimating response variability. AIAA Journal. 42 (2004) 1191-8.
[9] Dai C, H Li, D Zhang. Efficient and accurate global sensitivity analysis for reservoir simulations by use of probabilistic collocation method. SPE Journal. 19 (2014) 621-35.
[10] Dash M, H Liu. Feature selection for classification. Intelligent Data Analysis. 1 (1997) 131-56.
[11] Doostan A, H Owhadi. A non-adapted sparse approximation of PDEs with stochastic inputs. Journal of Computational Physics. 230 (2011) 3015-34.
[12] Efron B, T Hastie, I Johnstone, R Tibshirani. Least angle regression. The Annals of Statistics. 32 (2004) 407-99.
[13] Elsheikh AH, I Hoteit, MF Wheeler. Efficient Bayesian inference of subsurface flow models using nested sampling and sparse polynomial chaos surrogates. Computer Methods in Applied Mechanics and Engineering. 269 (2014) 515-37.
[14] Friedman J, T Hastie, R Tibshirani. The elements of statistical learning. Springer Series in Statistics, Springer, Berlin, 2001.
[15] Ghanem R. Scales of fluctuation and the propagation of uncertainty in random porous media. Water Resources Research. 34 (1998) 2123-36.
[16] Ghanem RG, PD Spanos. Stochastic finite elements: a spectral approach. Courier Corporation, 2003.
[17] Ghiocel DM, RG Ghanem. Stochastic finite-element analysis of seismic soil-structure interaction. Journal of Engineering Mechanics. 128 (2002) 66-77.
[18] Golub GH, M Heath, G Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics. 21 (1979) 215-23.
[19] Guyon I, A Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research. 3 (2003) 1157-82.
[20] Hampton J, A Doostan. Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies. Journal of Computational Physics. 280 (2015) 363-86.
[21] James G, D Witten, T Hastie, R Tibshirani. An introduction to statistical learning. Springer, 2013.
[22] Le Maître O, OM Knio. Spectral methods for uncertainty quantification: with applications to computational fluid dynamics. Springer Science & Business Media, 2010.
[23] Le Maître OP, MT Reagan, HN Najm, RG Ghanem, OM Knio. A stochastic projection method for fluid flow: II. Random process. Journal of Computational Physics. 181 (2002) 9-44.
[24] Leube PC, FP De Barros, W Nowak, R Rajagopal. Towards optimal allocation of computer resources: Trade-offs between uncertainty quantification, discretization and model reduction. Environmental Modelling & Software. 50 (2013) 97-107.
[25] Li H, P Sarma, D Zhang. A comparative study of the probabilistic-collocation and experimental-design methods for petroleum-reservoir uncertainty quantification. SPE Journal. 16 (2011) 429-39.
[26] Li H, D Zhang. Probabilistic collocation method for flow in porous media: Comparisons with other stochastic methods. Water Resources Research. 43 (2007).
[27] Li H, D Zhang. Stochastic representation and dimension reduction for non-Gaussian random fields: review and reflection. Stochastic Environmental Research and Risk Assessment. 27 (2013) 1621-35.
[28] Liao Q, D Zhang. Constrained probabilistic collocation method for uncertainty quantification of geophysical models. Computational Geosciences. 19 (2015) 311-26.
[29] Moslehi M, R Rajagopal, FP de Barros. Optimal allocation of computational resources in hydrogeological models under uncertainty. Advances in Water Resources. 83 (2015) 299-309.
[30] Oladyshkin S, F de Barros, W Nowak. Global sensitivity analysis: a flexible and efficient framework with an example from stochastic hydrogeology. Advances in Water Resources. 37 (2012) 10-22.
[31] Oladyshkin S, W Nowak. Data-driven uncertainty quantification using the arbitrary polynomial chaos expansion. Reliability Engineering & System Safety. 106 (2012) 179-90.
[32] Saeys Y, I Inza, P Larrañaga. A review of feature selection techniques in bioinformatics. Bioinformatics. 23 (2007) 2507-17.
[33] Tatang MA, W Pan, RG Prinn, GJ McRae. An efficient method for parametric uncertainty analysis of numerical geophysical models. Journal of Geophysical Research: Atmospheres. 102 (1997) 21925-32.
[34] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological). (1996) 267-88.
[35] Wang GG. Adaptive response surface method using inherited latin hypercube design points. Journal of Mechanical Design. 125 (2003) 210-20.
[36] Webster MD, MA Tatang, GJ McRae. Application of the probabilistic collocation method for an uncertainty analysis of a simple ocean model. Technical report, MIT Joint Program on the Science and Policy of Global Change, Report Series No. 4, Massachusetts Institute of Technology. (1996).
[37] Wiener N. The homogeneous chaos. American Journal of Mathematics. 60 (1938) 897-936.
[38] Xiu D, JS Hesthaven. High-order collocation methods for differential equations with random inputs. SIAM Journal on Scientific Computing. 27 (2005) 1118-39.
[39] Xiu D, GE Karniadakis. Modeling uncertainty in steady state diffusion problems via generalized polynomial chaos. Computer Methods in Applied Mechanics and Engineering. 191 (2002) 4927-48.
[40] Xiu D, GE Karniadakis. The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM Journal on Scientific Computing. 24 (2002) 619-44.
[41] Yan L, L Guo, D Xiu. Stochastic collocation algorithms using L1-minimization. International Journal for Uncertainty Quantification. 2 (2012).
[42] Yang Y, JO Pedersen. A comparative study on feature selection in text categorization. ICML 1997. pp. 412-20.
[43] Zhang D. Stochastic methods for flow in porous media: coping with uncertainties. Academic Press, 2001.
[44] Zhang D, Z Lu. An efficient, high-order perturbation approach for flow in random porous media via Karhunen-Loeve and polynomial expansions. Journal of Computational Physics. 194 (2004) 773-94.