Accepted Manuscript
An efficient stochastic approach for flow in porous media via sparse polynomial chaos expansion constructed by feature selection

Jin Meng, Heng Li

PII: S0309-1708(16)30625-X
DOI: 10.1016/j.advwatres.2017.04.019
Reference: ADWR 2834

To appear in: Advances in Water Resources

Received date: 7 November 2016
Revised date: 26 February 2017
Accepted date: 25 April 2017
Please cite this article as: Jin Meng , Heng Li , An efficient stochastic approach for flow in porous media via sparse polynomial chaos expansion constructed by feature selection, Advances in Water Resources (2017), doi: 10.1016/j.advwatres.2017.04.019
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights

- An efficient non-intrusive stochastic approach for flow in random porous media
- A small number of samples used to construct a sparse polynomial chaos expansion by feature selection
- The algorithm is self-adaptive by adopting inherited samples and cross-validation (CV) error analysis
An efficient stochastic approach for flow in porous media via sparse polynomial chaos expansion constructed by feature selection

Jin Meng and Heng Li
Department of Energy and Resources Engineering, College of Engineering, Peking University, Beijing 100871, China

Abstract
An efficient method for uncertainty quantification for flow in porous media is studied in this paper, where a response surface in the form of a sparse polynomial chaos expansion (PCE) is constructed with the aid of a feature selection method. The number of basis functions in the PCE grows exponentially as the random dimensionality increases, which makes the computational cost unaffordable in high-dimensional problems. In this study, a feature selection method is introduced to select the major stochastic features for the PCE by running a limited number of simulations, and the resultant PCE is termed a sparse PCE. Specifically, the least absolute shrinkage and selection operator modified least angle regression algorithm (LASSO-LAR) is applied for feature selection, and the selected features are assessed by cross-validation (CV). In addition, inherited samples are utilized to make the algorithm self-adaptive. We test the performance of the sparse PCE for uncertainty quantification for flow in heterogeneous media with different spatial variability. The statistical moments and probability density function of the output random field are accurately estimated with the sparse PCE, while the computational effort is greatly reduced compared to the Monte Carlo method.

Keywords: Uncertainty quantification; Karhunen-Loève expansion; sparse polynomial chaos expansion; feature selection; LASSO-LAR
1 Introduction

Uncertainty quantification (UQ) of subsurface flow in porous media has received much attention in recent years. Although geological formations are intrinsically deterministic, some formation properties are considered to be stochastic owing to the high degree of heterogeneity and incomplete knowledge of the formations. Therefore the prediction of subsurface flow often involves some degree of uncertainty. A stochastic description of formation parameters is necessary for UQ, which leads to stochastic partial differential equations (SPDE) governing flow and transport [43].

The Monte Carlo (MC) method is one of the most common approaches for solving SPDE numerically, where a large number of realizations are randomly generated and the statistical properties of the model responses are obtained by sampling. The accuracy of the MC method is highly dependent on the sample size one chooses [4]. A large number of samples is usually required by MC, especially for quantifying high-order statistical moments or the full probability density function (PDF). The computational cost may be alleviated with the rise of parallel computing and high-performance computers. Besides, other stochastic approaches such as the moment equation method or perturbation approach [43, 44] can be used for quantifying uncertainty, but these methods also suffer from high computational cost when the problem is complicated.

An efficient alternative, the polynomial chaos expansion (PCE), has found extensive use in
uncertainty quantification in the last two decades [15, 37, 39]. In the PCE method, the random output is represented by a series of orthogonal polynomial basis functions in terms of the independent input random variables [16]. The orthogonal polynomials are chosen with respect to the distributions of the input random variables: the Hermite PCE is specified for Gaussian random variables, and the generalized PCE has been developed for input random variables with other distributions [39]. If the random inputs are correlated or characterized by a random field, parameterization methods such as the Karhunen-Loève (K-L) expansion can be used to transform them into independent random variables [26, 27]. Once the PCE coefficients are obtained, the statistical properties of the random output can be easily evaluated. The PCE can be used for UQ for arbitrary input distributions [25, 31], and has been applied for global sensitivity analysis [9,
30]. Considerable work has been accomplished in this field. One of the best-known approaches, the stochastic finite element method, was pioneered by Ghanem and Spanos [16] and has been applied in many areas [15, 17, 23]. The method is termed intrusive because it requires manipulation of the governing equations: discretized equations for the PCE coefficients are derived with the Galerkin scheme, and a set of coupled deterministic equations has to be solved. Arnst et al. further improved the method with the reduced chaos expansion [1]. Besides, Bayesian methods can also be applied to identify the unknown coefficients [2].

In contrast, non-intrusive methods do not require manipulation of the governing equations. They allow using existing deterministic simulators, which can be treated as a black box. Such methods can be roughly categorized into two types [41]. One is based on interpolation, where the simulation samples are interpolated by the approximation exactly, e.g., the stochastic collocation method (SCM) [3] and the probabilistic collocation method (PCM) [33]. The former uses Lagrange interpolation points as collocation points, while the latter uses the points of numerical evaluation of the Galerkin integral [7, 36]. The other type of non-intrusive method is based on regression, where the simulation samples are approximately matched. A larger number of collocation
points than the number of basis functions is required to guarantee that the matrix is well-conditioned for calculation [5, 6]. These approaches are found to be accurate and computationally efficient for problems with relatively low random dimensionality [7, 26, 28].

However, the number of collocation points is closely related to the random dimensionality and the degree of the polynomials [38]. Since each collocation point represents a full-scale deterministic simulation, the computational cost increases rapidly when the random dimensionality is high. This so-called "curse of dimensionality" has drawn more and more attention in recent years. Some dimension reduction methods have been proposed to overcome the problem. The idea of compressed sensing has been introduced for UQ, where the sparsity of the random response functions is recovered via the L1-norm [11, 20, 41]. In addition, some researchers have constructed sparse PCE response surfaces by regression techniques. Blatman and Sudret utilized stepwise regression [5] and least angle regression (LAR) [6] for stochastic finite element problems. Elsheikh et al. applied LAR for Bayesian inference of the permeability field in the subsurface [13]. In fact, both stepwise regression and LAR are common algorithms in the field of feature selection [14, 19]. Feature selection was first proposed in the area of statistical learning for removing redundant and irrelevant features. A mature framework for feature selection has been established, with extensive algorithms and successful applications published in the last two decades [12, 14, 21, 34].

In this study, the sparse PCE constructed by feature selection is introduced for UQ for flow in random porous media. The input random field of hydraulic conductivity is parameterized by the K-L expansion and the output random field of hydraulic head is represented by a PCE. Our objective is to accurately and efficiently quantify uncertainty for the hydraulic head with the
sparse PCE. A non-intrusive stochastic approach is adopted and the PCE coefficients are derived by regression. In our problem, high random dimensionality may be associated with the K-L expansion [44], so the total number of basis functions (P) will be much larger than the number of simulations (N) one can afford, i.e., N << P. By constructing a sparse PCE with the help of feature selection, the problem is converted to the ordinary N > P type. In this way, higher accuracy and efficiency can be reached with a limited number of simulations. Specifically, the least absolute shrinkage and selection operator (LASSO) modified least angle regression algorithm (termed LASSO-LAR) [34] is utilized to select the major stochastic features; it is less greedy than stepwise regression and is highly efficient for high-dimensional problems. In our numerical examples, the proposed method is compared with MC for both accuracy and efficiency.

The remainder of this paper is organized as follows. In Section 2, the governing equations of flow and stochastic representations of the random fields are discussed. In Section 3, we introduce the feature selection method, specifically the LASSO-LAR algorithm, and its validity assessments; its implementation with inherited samples is also outlined. Following an illustrative analytical example, numerical examples for both steady-state and transient flow are given in Section 4. The conclusions are given in Section 5.
2 Stochastic Representations

2.1 Governing equations

We consider transient flow in saturated porous media satisfying the following continuity equation and Darcy's law:

S_s \frac{\partial h(\mathbf{x},t)}{\partial t} + \nabla \cdot \mathbf{q}(\mathbf{x},t) = g(\mathbf{x},t),    (1)

\mathbf{q}(\mathbf{x},t) = -K_s(\mathbf{x}) \nabla h(\mathbf{x},t),    (2)

subject to initial and boundary conditions:

h(\mathbf{x},0) = H_0(\mathbf{x}),    \mathbf{x} \in D,    (3)

h(\mathbf{x},t) = H(\mathbf{x},t),    \mathbf{x} \in \Gamma_D,    (4)

\mathbf{q}(\mathbf{x},t) \cdot \mathbf{n}(\mathbf{x}) = -Q(\mathbf{x},t),    \mathbf{x} \in \Gamma_N,    (5)

where h(\mathbf{x},t) is the hydraulic head, S_s is the specific storage, \mathbf{q}(\mathbf{x},t) is the specific discharge, g(\mathbf{x},t) is the source (or sink) term, and K_s(\mathbf{x}) is the hydraulic conductivity. H_0(\mathbf{x}) is the initial head in the domain D \subset \mathbb{R}^d, H(\mathbf{x},t) is the prescribed head on the Dirichlet boundary segments \Gamma_D, Q(\mathbf{x},t) is the prescribed flux across the Neumann boundary segments \Gamma_N, and \mathbf{n}(\mathbf{x}) = (n_1, \ldots, n_d)^T is an outward unit vector normal to the boundary \Gamma_D \cup \Gamma_N.

In this study, K_s(\mathbf{x}) is assumed to be a log-normal random field; thus Eqs. (1)-(5) become stochastic partial differential equations, whose solutions are no longer deterministic values but probability distributions or related statistical moments. We aim to efficiently and accurately estimate the statistical properties of the hydraulic head in terms of the statistical moments of the log-transformed K_s(\mathbf{x}).
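For concreteness, the deterministic simulator invoked at each parameter sample can be sketched in a few lines. The snippet below is our own illustration, not the authors' code: it solves the steady-state limit of Eqs. (1)-(2) (no storage term, no source) on a uniform 1-D grid with Dirichlet heads at both ends, using harmonic-mean interface conductivities; the function name and grid setup are assumptions.

```python
import numpy as np

def solve_steady_1d(K, h_left, h_right):
    """Steady-state limit of Eqs. (1)-(2) in 1-D: d/dx( K(x) dh/dx ) = 0.
    K holds one conductivity value per cell; heads are fixed at both ends."""
    n = len(K)
    # harmonic-mean conductivities at the interfaces between adjacent cells
    Kf = 2.0 * K[:-1] * K[1:] / (K[:-1] + K[1:])
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i in range(1, n - 1):          # interior flux balance
        A[i, i - 1] = Kf[i - 1]
        A[i, i] = -(Kf[i - 1] + Kf[i])
        A[i, i + 1] = Kf[i]
    A[0, 0] = A[-1, -1] = 1.0          # Dirichlet conditions at the two ends
    b[0], b[-1] = h_left, h_right
    return np.linalg.solve(A, b)
```

For a homogeneous conductivity the computed head profile is linear between the two boundary values, which is a convenient sanity check.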
2.2 Representation of hydraulic conductivity random field

Let Y(\mathbf{x},\omega) = \ln[K_s(\mathbf{x},\omega)] be a random field, where \mathbf{x} \in D and \omega \in \Omega (\Omega is the event space of a complete probability space (\Omega, \mathcal{F}, \mathcal{P}), \mathcal{F} is a \sigma-algebra of \Omega, and \mathcal{P} is a probability measure). One may write Y(\mathbf{x},\omega) = \langle Y(\mathbf{x}) \rangle + Y'(\mathbf{x},\omega), where \langle Y(\mathbf{x}) \rangle is the mean and Y'(\mathbf{x},\omega) is the fluctuation. The spatial structure of the random field is described by the covariance C_Y(\mathbf{x},\mathbf{x}') = \langle Y'(\mathbf{x},\omega) Y'(\mathbf{x}',\omega) \rangle. Since the covariance is bounded, symmetric and positive-definite, it may be decomposed as [16]:

C_Y(\mathbf{x},\mathbf{x}') = \sum_{i=1}^{\infty} \lambda_i f_i(\mathbf{x}) f_i(\mathbf{x}'),    (6)

where \lambda_i and f_i(\mathbf{x}) are eigenvalues and eigenfunctions, respectively. They can be solved from the following Fredholm equation of the second kind:

\int_D C_Y(\mathbf{x},\mathbf{x}') f(\mathbf{x}') \, d\mathbf{x}' = \lambda f(\mathbf{x}).    (7)

Then the random field Y(\mathbf{x},\omega) is expressed by the Karhunen-Loève (K-L) expansion as:

Y(\mathbf{x},\omega) = \langle Y(\mathbf{x}) \rangle + \sum_{i=1}^{\infty} \sqrt{\lambda_i} f_i(\mathbf{x}) \xi_i(\omega).    (8)

If Y(\mathbf{x},\omega) is a Gaussian random field, the \xi_i(\omega) are orthogonal Gaussian random variables with zero mean and unit variance, i.e., \langle \xi_i(\omega) \rangle = 0 and \langle \xi_i(\omega) \xi_j(\omega) \rangle = \delta_{ij}.

From Eq. 6 or 8, we have \sigma_Y^2 = \sum_{i=1}^{\infty} \lambda_i f_i^2(\mathbf{x}), and integrating over the domain yields \|D\| \sigma_Y^2 = \sum_{i=1}^{\infty} \lambda_i, where \|D\| is a measure of the physical domain size (length, area or volume for 1D, 2D or 3D domains, respectively). In practical applications, Eq. 8 has to be truncated to a finite number of terms for calculation. The number of retained terms is determined by the convergence rate of the eigenvalue summation. Typically, n terms are retained if \sum_{i=1}^{n} \lambda_i is relatively close to the total variance \|D\| \sigma_Y^2. As a result, the random field is governed by an n-dimensional random vector \boldsymbol{\xi} = (\xi_1(\omega), \ldots, \xi_n(\omega))^T. Note that the random dimensionality of the problem is equivalent to the dimensionality of the random vector. A slow convergence rate is often observed in a multi-dimensional physical domain; as a result, the uncertainty analysis has to be carried out in a high-dimensional random space, which usually leads to large computational cost [44].
2.3 Representation of hydraulic head random field

Since the input field K_s(\mathbf{x},\omega) is random and characterized by \boldsymbol{\xi}(\omega) = (\xi_1(\omega), \ldots, \xi_n(\omega))^T, the output hydraulic head field h(\mathbf{x},t,\omega) = M(\mathbf{x},t,\boldsymbol{\xi}(\omega)) is also random, where M stands for the physical model governing the flow. The analysis made hereafter is for fixed time and space, so the physical coordinates (\mathbf{x},t) are omitted for simplicity. The random structure of the output field cannot be represented by the K-L expansion, as its covariance is yet to be found. As an alternative, the polynomial chaos expansion (PCE) technique is applied here to construct a response surface approximating the output. The output hydraulic head field is decomposed into a summation of a series of polynomials in terms of the independent random variables [40]:

h(\boldsymbol{\xi}) = \sum_{\boldsymbol{\alpha} \in \mathbb{N}^n} a_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}(\omega)),    (9)

where a_{\boldsymbol{\alpha}} are unknown PCE coefficients and \Psi_{\boldsymbol{\alpha}} are multivariate polynomial basis functions, which are orthonormal with respect to the joint probability density function (PDF) p(\boldsymbol{\xi}):

\langle \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}), \Psi_{\boldsymbol{\beta}}(\boldsymbol{\xi}) \rangle = \int \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) \Psi_{\boldsymbol{\beta}}(\boldsymbol{\xi}) \, p(\boldsymbol{\xi}) \, d\boldsymbol{\xi}.    (10)

The full PDF is required to obtain the basis functions. In some practical applications, only low-order statistical moments or discrete PDFs are available. With such limited information, one has to make reasonable assumptions about the full distribution of the random variables.

To construct the multivariate basis functions \Psi_{\boldsymbol{\alpha}}, one may start from the univariate basis functions with respect to the marginal PDF p_i(\xi_i). Consider a family of orthonormal polynomials \{\psi_j^{(i)}, j \in \mathbb{N}\} with respect to p_i(\xi_i) in one dimension:

\langle \psi_j^{(i)}(\xi_i), \psi_k^{(i)}(\xi_i) \rangle = \delta_{jk},    (11)

where j, k \in \mathbb{N} stand for the degree of the polynomials. Basis functions in one-dimensional space are much more convenient to derive, and specific optimal polynomial families have been found for some common distributions, e.g., Hermite polynomials for Gaussian random variables, Legendre polynomials for uniform random variables, etc. For arbitrary input distributions, the PCE can be constructed numerically according to the specific distributions [25, 31, 39].

In addition, we have p(\boldsymbol{\xi}) = \prod_{i=1}^{n} p_i(\xi_i), since the elements in \boldsymbol{\xi} are independent of each other. According to Eq. 10, the multivariate basis functions can be constructed by tensor products of the univariate basis functions:

\Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}) = \prod_{i=1}^{n} \psi_{\alpha_i}^{(i)}(\xi_i),    (12)

where \boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n) \in \mathbb{N}^n is a multi-index with norm |\boldsymbol{\alpha}| = \sum_{i=1}^{n} \alpha_i, whose elements \alpha_1, \ldots, \alpha_n indicate the degree of the polynomial in each dimension.

Typically, the basis functions are selected from truncated polynomial spaces, so that a finite number of terms is retained. In this paper, a total-degree polynomial space is considered, where the summation of the n-dimensional polynomial degrees is no more than an integer d. The truncated PCE up to degree d is given as:

h_d(\boldsymbol{\xi}) = \sum_{\boldsymbol{\alpha} \in \mathcal{A}} a_{\boldsymbol{\alpha}} \Psi_{\boldsymbol{\alpha}}(\boldsymbol{\xi}),    (13)

where \mathcal{A} is a multi-index set defined by \mathcal{A} = \{\boldsymbol{\alpha} \in \mathbb{N}^n : |\boldsymbol{\alpha}| \le d\}. The total number of basis functions P is determined by the random dimensionality n and the polynomial degree d:

P = \binom{n+d}{d} = \frac{(n+d)!}{n! \, d!}.    (14)

Once the PCE coefficients are evaluated, in consideration of the orthonormality of the basis functions, statistical moments of the output field such as the mean and variance can be derived directly from Eq. 13:

\mu_h = a_{\boldsymbol{\alpha}=0}, \qquad \sigma_h^2 = \sum_{\boldsymbol{\alpha} \in \mathcal{A} \setminus \{0\}} a_{\boldsymbol{\alpha}}^2.    (15)
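The size of the total-degree multi-index set in Eq. 14 and the moment formulas in Eq. 15 are straightforward to verify numerically. The sketch below (the coefficient values are our own illustrative assumptions) enumerates the set A by brute force; note that the enumeration scans (d+1)^n tuples and is therefore only suitable for small n.

```python
from itertools import product
from math import comb

def total_degree_multi_indices(n, d):
    """All multi-indices alpha in N^n with |alpha| <= d (the set A of Eq. 13)."""
    return [a for a in product(range(d + 1), repeat=n) if sum(a) <= d]

# Eq. 14: the number of basis functions P = (n + d)! / (n! d!)
n, d = 4, 3
A = total_degree_multi_indices(n, d)
assert len(A) == comb(n + d, d)            # 35 for n = 4, d = 3

# Eq. 15: with orthonormal bases, the mean is the zero-index coefficient
# and the variance is the sum of squares of the remaining coefficients
coeffs = {a: 0.0 for a in A}
coeffs[(0, 0, 0, 0)] = 2.0                 # illustrative values
coeffs[(1, 0, 0, 0)] = 0.5
mean = coeffs[(0,) * n]
variance = sum(c ** 2 for a, c in coeffs.items() if a != (0,) * n)
```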
Alternatively, Monte Carlo simulation can be performed on the PCE (Eq. 13) as a post-processing step to obtain the statistical moments and the PDF, which is computationally efficient for the polynomial surrogate model.
2.4 Evaluation of PCE coefficients

To evaluate the PCE coefficients, one first samples a set of collocation points \{\boldsymbol{\xi}^{(1)}, \ldots, \boldsymbol{\xi}^{(N)}\}, where N is the number of samples. The sampling method will be discussed in Section 3.4. At each collocation point \boldsymbol{\xi}^{(i)} = (\xi_1^{(i)}, \ldots, \xi_n^{(i)})^T, a deterministic flow problem is solved; as a result, N realizations of the hydraulic head \{h(\boldsymbol{\xi}^{(1)}), \ldots, h(\boldsymbol{\xi}^{(N)})\} are generated. On the other hand, the output field is approximated by the PCE as shown in Eq. 13; thus an equation set for the PCE coefficients is obtained:

\sum_{j=1}^{P} a_j \Psi_j(\boldsymbol{\xi}^{(i)}) = h(\boldsymbol{\xi}^{(i)}),    i = 1, \ldots, N.    (16)

Note that every subscript j in Eq. 16 corresponds to a specific multi-index \boldsymbol{\alpha} in Eq. 13. We rewrite the equation set in matrix form:

\boldsymbol{\Psi} \mathbf{a} = \mathbf{h},    (17)

where \mathbf{a} = (a_1, \ldots, a_P)^T is the vector of coefficients, \mathbf{h} = (h(\boldsymbol{\xi}^{(1)}), \ldots, h(\boldsymbol{\xi}^{(N)}))^T is the vector of model realizations, and \boldsymbol{\Psi} \in \mathbb{R}^{N \times P} is a Vandermonde-like matrix of basis functions whose elements are \Psi_{ij} = \Psi_j(\boldsymbol{\xi}^{(i)}), i = 1, \ldots, N, j = 1, \ldots, P.

We consider solving the equation set with a regression method, where the simulation samples are approximately matched on unstructured points. The PCE coefficients estimated by ordinary least-squares (OLS) regression, which minimizes the residual sum of squares, are given by:
\mathbf{a}_{OLS} = \arg\min_{\mathbf{a}} (\mathbf{h} - \boldsymbol{\Psi}\mathbf{a})^T (\mathbf{h} - \boldsymbol{\Psi}\mathbf{a}) = (\boldsymbol{\Psi}^T \boldsymbol{\Psi})^{-1} \boldsymbol{\Psi}^T \mathbf{h}.    (18)

A well-conditioned matrix \boldsymbol{\Psi}^T \boldsymbol{\Psi} is required; otherwise numerical oscillations may occur in the calculation and the estimates may be unreliable. To ensure numerical stability, the number of realizations N should be greater than the number of basis functions P, and N = kP with k \in [2, 3] is usually recommended [6]. As indicated in Eq. 14, the number of basis functions is extremely large when the random dimensionality or the polynomial degree is high, while in practical applications only a limited number of realizations is affordable, i.e., \boldsymbol{\Psi} is a matrix of size N \ll P, which leads to a strongly ill-conditioned equation set and causes great trouble for coefficient evaluation. Therefore we aim to reduce the number of basis functions and construct a sparse PCE response surface for the UQ problem. The feature selection method is utilized to tackle this problem, as detailed in the next section.
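As a minimal one-dimensional illustration of the regression step (Eq. 18), the sketch below builds a design matrix of orthonormal probabilists' Hermite polynomials, generates noise-free "simulations" from known coefficients, and recovers them with a least-squares solve; the sample size and coefficient values are our own assumptions.

```python
import numpy as np
from math import factorial

def hermite_design(xi, degree):
    """Vandermonde-like matrix of orthonormal probabilists' Hermite
    polynomials He_k evaluated at the samples xi (one column per degree);
    the recurrence is He_k = xi*He_{k-1} - (k-1)*He_{k-2}, <He_k^2> = k!."""
    He = [np.ones_like(xi), xi]
    for k in range(2, degree + 1):
        He.append(xi * He[k - 1] - (k - 1) * He[k - 2])
    return np.column_stack([He[k] / np.sqrt(factorial(k))
                            for k in range(degree + 1)])

rng = np.random.default_rng(0)
xi = rng.standard_normal(50)                       # N = 50 samples
Psi = hermite_design(xi, 3)                        # P = 4 basis functions, N > P
a_true = np.array([2.0, 3.0, 0.5, 0.0])
h = Psi @ a_true                                   # noise-free "simulations"
a_ols, *_ = np.linalg.lstsq(Psi, h, rcond=None)    # Eq. 18
```

With N = 50 and P = 4 the matrix is well-conditioned and the coefficients are recovered essentially exactly; the ill-conditioned N << P regime discussed above is what motivates the feature selection of Section 3.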
3 Feature Selection for basis reduction

3.1 Introduction to Feature Selection

Feature selection, also known as variable selection or attribute selection, is a technique to select subsets of relevant features for model construction. It was originally proposed in the statistical learning area and has been applied to many fields in recent years, e.g., bioinformatics, text categorization, signal processing, etc. [19, 32, 42]. In these applications, selection methods are adopted to improve prediction performance and to provide a better understanding of datasets with hundreds or thousands of redundant variables and comparably few training examples.

Fig. 1. Feature selection process with validation [10].

A general process of feature selection is shown in Fig. 1. A candidate feature subset is generated first. Then the model constructed from the subset is examined by an evaluation function to assess its goodness for prediction. If the requirement is not met, new subsets are generated and the previous steps are repeated. Once a stopping criterion is reached, the subset is qualified, and its validity for general cases is appraised at last.

The generation algorithms may be categorized into three types (exhaustive, heuristic and random search), and the evaluation algorithms into filter, wrapper and embedded types. By combining different generation and evaluation methods, numerous algorithms have been proposed in the last two decades, e.g., Fisher score, sequential forward selection, regularization models, etc. [10, 19]. In previous works, stepwise regression and least angle regression (LAR) have been applied to UQ problems [5, 6, 13]. In the following, we introduce another popular algorithm from statistical learning, LASSO-LAR, whose implementation for PCE basis reduction is quite promising.
3.2 Least Absolute Shrinkage and Selection Operator (LASSO)
We still focus on the regression problem for the PCE coefficients in Eqs. 16 and 17. From a statistical point of view, each basis function in the PCE may be viewed as a specific feature of the output random field, weighted by the coefficients \mathbf{a} = (a_1, \ldots, a_P)^T. Correspondingly, the basis matrix \boldsymbol{\Psi} = (\boldsymbol{\Phi}_1, \ldots, \boldsymbol{\Phi}_P) \in \mathbb{R}^{N \times P} represents a collection of N samples with P features, which forms the training dataset. Each column vector represents a specific feature and each row vector represents a realization. Realizations of the output field \mathbf{h} = (h_1, \ldots, h_N)^T are the model targets, and the model predictions are denoted by \hat{\mathbf{h}} = \boldsymbol{\Psi}\mathbf{a} = (\hat{h}_1, \ldots, \hat{h}_N)^T. The relationships between the features and targets are established on the training dataset, as indicated by Eqs. 16 and 17. Such relationships may be extended to general cases for prediction with arbitrary samples.

OLS regression aims to find the weights which minimize the loss function, i.e., the squared L2-norm of the residual between model targets and predictions:

\mathbf{a}_{OLS} = \arg\min_{\mathbf{a}} \|\mathbf{h} - \hat{\mathbf{h}}\|_2^2 = \arg\min_{\mathbf{a}} \sum_{i=1}^{N} \Big( h_i - \sum_{j=1}^{P} a_j \Psi_{ij} \Big)^2.    (19)

Every feature is given a non-zero weight in OLS regression, implying that the output is interpreted by all features. However, there always exist some redundant and irrelevant features, whose weights should be strictly set to zero. The LASSO, proposed by Tibshirani [34], discards these features by setting a constraint on the coefficients:

\mathbf{a}_{LASSO} = \arg\min_{\mathbf{a}} \|\mathbf{h} - \hat{\mathbf{h}}\|_2^2, \quad \text{s.t.} \; \|\mathbf{a}\|_1 \le t,    (20)

where \|\mathbf{a}\|_1 = \sum_{j=1}^{P} |a_j| \le t is the L1-norm penalty term. The coefficients are restricted by both the loss function and the penalty function. An equivalent Lagrangian form of Eq. 20 is:

\mathbf{a}_{LASSO} = \arg\min_{\mathbf{a}} \|\mathbf{h} - \hat{\mathbf{h}}\|_2^2 + \lambda \|\mathbf{a}\|_1,    (21)

where \lambda \ge 0 is the Lagrange multiplier and a one-to-one mapping exists between t and \lambda. By making t sufficiently small or \lambda sufficiently large, some of the coefficients will be exactly zero,
thus a kind of continuous feature selection is performed. When the penalty term becomes the L2-norm (resp. a combined L1-L2 norm), the LASSO turns into the ridge (resp. elastic-net) regression method [14].

Theoretically, the constraint on the non-zero coefficients should be defined by an L0-norm penalty term, \|\mathbf{a}\|_0 = \#\{k : a_k \neq 0\} \le t. The L0-norm penalty is a "hard thresholding" requiring the number of non-zero coefficients to be strictly limited to t. Although it is straightforward, the optimal solution is difficult to find; the problem is NP-hard. As an alternative, the LASSO can do the same selection job with fewer computational difficulties, owing to the fact that the L1-norm is the optimal convex approximation of the L0-norm [34]. Therefore the LASSO algorithm can be used for feature selection, though it is intrinsically a shrinkage method.
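As a concrete, if simplistic, way to see the L1 penalty at work, Eq. 21 can be solved directly by proximal gradient descent (ISTA): each iteration takes a gradient step on the squared loss and then applies soft-thresholding, which zeroes small coefficients exactly. The solver and the synthetic check below are our own illustration, not the algorithm used in this paper.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(Psi, h, lam, n_iter=2000):
    """Proximal-gradient (ISTA) solver for the Lagrangian LASSO (Eq. 21):
    min_a ||h - Psi a||_2^2 + lam * ||a||_1."""
    L = 2.0 * np.linalg.norm(Psi, 2) ** 2    # Lipschitz constant of the gradient
    a = np.zeros(Psi.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Psi.T @ (Psi @ a - h)
        a = soft(a - grad / L, lam / L)
    return a

# synthetic check: only features 2 and 7 truly act on the output
rng = np.random.default_rng(1)
Psi = rng.standard_normal((60, 20))
a_true = np.zeros(20)
a_true[2], a_true[7] = 3.0, -2.0
a_hat = lasso_ista(Psi, Psi @ a_true, lam=1.0)
```

On this noise-free synthetic data only the two truly active features carry non-negligible weight after thresholding, illustrating how the L1 penalty performs selection as well as shrinkage.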
Optimization algorithms can be applied to solve for the LASSO coefficients in Eq. 21, e.g., (block) coordinate descent, proximal methods, etc. [14]. However, they may be computationally expensive when a large number of features exist. An efficient alternative is to perform LASSO with the help of least angle regression (LAR) [12]. LAR is a heuristic search method for feature selection. Although different from LASSO in theory, it can be used to efficiently compute the entire solution path of LASSO. The LASSO-LAR algorithm is quite efficient, especially for high-dimensional problems where N << P. The procedure of LASSO-LAR is given below [14].
LASSO-LAR Algorithm:

1. In preparation, each column vector of features in the training dataset is normalized and the target values are centered, i.e., \frac{1}{N}\sum_{i=1}^{N} \Psi_{ij} = 0 and \frac{1}{N}\sum_{i=1}^{N} \Psi_{ij}^2 = 1 for j = 1, \ldots, P, and \frac{1}{N}\sum_{i=1}^{N} h_i = 0. Initialize with the residual \mathbf{r} = \mathbf{h}, and set a_1, \ldots, a_P = 0.

2. Find the feature vector \boldsymbol{\Phi}_j most correlated with \mathbf{r}.

3. Move a_j from 0 towards its least-squares coefficient \langle \boldsymbol{\Phi}_j, \mathbf{r} \rangle, until some other competitor \boldsymbol{\Phi}_k has as much correlation with the current residual as does \boldsymbol{\Phi}_j.

4. Move \{a_j, a_k\} in the direction defined by their joint least-squares coefficient of the current residual on (\boldsymbol{\Phi}_j, \boldsymbol{\Phi}_k), until some other competitor \boldsymbol{\Phi}_l has as much correlation with the current residual.

5. If a non-zero coefficient hits zero, drop its feature from the active set of features and recompute the current joint least-squares direction.

6. Continue in this way until \min(N-1, P) features have been entered.

At the beginning of the k-th step, let \mathcal{A}_k be the active feature set and \mathbf{a}_k^{LAR} be the coefficient vector. In each loop, the coefficients are updated in the form \mathbf{a}_{k+1}^{LAR} = \mathbf{a}_k^{LAR} + \gamma^{(k)} \mathbf{w}^{(k)}, where \mathbf{w}^{(k)} is the unit equiangular vector of the current active feature set and \gamma^{(k)} is the updating step along direction \mathbf{w}^{(k)}. The coefficients proceed in a direction which makes the smallest and equal angle with each of the active feature vectors, hence the name least angle.
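A faithful LARS implementation with equiangular updates is too long to inline here, but the select-then-refit structure (features entered by correlation with the current residual, coefficients re-estimated by OLS on the active set, as in the hybrid variant described below) can be sketched with a greedy loop. This simplification is closer to orthogonal matching pursuit than to exact LAR, and is our own illustration.

```python
import numpy as np

def greedy_select_ols(Psi, h, n_features):
    """Greedy correlation-based selection with OLS refit on the active set.
    A simplified stand-in for LAR's steps 2-5 (no equiangular moves)."""
    active, r = [], h.copy()
    for _ in range(n_features):
        corr = np.abs(Psi.T @ r)
        corr[active] = -np.inf                    # skip already-active features
        active.append(int(np.argmax(corr)))
        A = Psi[:, active]
        a_active, *_ = np.linalg.lstsq(A, h, rcond=None)  # OLS refit on the subset
        r = h - A @ a_active
    return active, a_active

rng = np.random.default_rng(2)
Psi = rng.standard_normal((50, 15))
h = 2.0 * Psi[:, 3] - 1.5 * Psi[:, 9]             # only two relevant features
active, coef = greedy_select_ols(Psi, h, 2)
```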
The LASSO-LAR algorithm is quite similar to LAR except for the major difference in step 5. In LAR, the selected features are not dropped even if they were improperly recognized in previous steps; in LASSO-LAR, the features are reassessed according to the behavior of the coefficients. Detailed proofs and the algebraic derivation of \gamma and \mathbf{w} are given in [12] and will not be discussed here.

For the convenience of assessment (see Section 3.3), we apply a hybrid LAR algorithm to estimate the PCE coefficients. This method is an adaptation of ordinary LAR, where the selection task is accomplished by LAR and the estimation by OLS. Once the active feature set \mathcal{A}_k is determined, the hybrid LAR coefficients are derived in an OLS way:

\mathbf{a}_{hyb}^{k} = \arg\min_{\mathbf{a}} \|\mathbf{h} - \boldsymbol{\Psi}_{\mathcal{A}_k} \mathbf{a}\|_2^2 = (\boldsymbol{\Psi}_{\mathcal{A}_k}^T \boldsymbol{\Psi}_{\mathcal{A}_k})^{-1} \boldsymbol{\Psi}_{\mathcal{A}_k}^T \mathbf{h}.    (22)

The hybrid LAR gives similar results to ordinary LAR when the major features are involved. The LASSO-LAR algorithm is highly efficient because it does not cope with optimization problems directly, but proceeds along an optimal solution path step by step. In each loop of the algorithm, a new model formed by the current active feature subset is generated. As LASSO-LAR itself gives few assessments of these candidate feature subsets, assessments will be made with the aid of certain criteria, and finally the best subset will be applied for model construction.

3.3 Assessments of feature subset
The performance of a feature subset is related to its prediction capability on independent test datasets. In general, the goodness of fit may be quantified by evaluation functions, among which the most commonly used include the coefficient of determination R^2, Mallows' C_p, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) [21]. In this paper, we employ a cross-validation (CV) method for subset assessment, with which a quantitative error estimate can be obtained.

After k steps of LAR, the current model is constructed from the feature subset \mathcal{A}_k, and the corresponding model prediction is \hat{\mathbf{h}}_{\mathcal{A}_k} = \boldsymbol{\Psi}_{\mathcal{A}_k} \mathbf{a}_k^{LAR} = (\hat{h}_{\mathcal{A}_k,1}, \ldots, \hat{h}_{\mathcal{A}_k,N})^T, where \boldsymbol{\Psi}_{\mathcal{A}_k} is the current training data. The empirical error, also referred to as the training error, is defined as:

Err_{emp} = \frac{1}{N} \sum_{i=1}^{N} (h_i - \hat{h}_{\mathcal{A}_k,i})^2.    (23)
The empirical error is a straightforward description of the loss between predictions and targets over the training samples. Nevertheless, it is not a proper criterion, as it consistently decreases with increased model complexity, typically dropping to zero if the complexity is increased far enough. Although a model with small empirical error implies a perfect prediction on the training dataset, it may perform poorly on other test datasets, which means the model is overfitted and does not generalize.

The CV method sets aside a testing dataset to avoid being misled by overfitted models. In K-fold CV, the training data are randomly split into K roughly equal-sized groups, indexed by a partition map \kappa : \{1, \ldots, N\} \to \{1, \ldots, K\}. When the i-th group is used as the testing dataset, the model is trained on the data of the other K-1 groups, and the trained coefficients are denoted by \mathbf{a}^{-\kappa_i}. Predictions are made on the testing dataset, i.e., \hat{\mathbf{h}}^{\kappa_i} = \boldsymbol{\Psi}^{\kappa_i} \mathbf{a}^{-\kappa_i}, and compared to the group targets \mathbf{h}^i. Looping through every group, the K-fold CV error is defined as:

Err_{CV} = \frac{1}{N} \sum_{i=1}^{K} \|\mathbf{h}^i - \hat{\mathbf{h}}^{\kappa_i}\|_2^2.    (24)
A typical choice of K is 5 or 10. With K increasing, the K-fold CV error has lower bias but higher variance; furthermore, considerable computational effort is required. When K = N, i.e., \kappa_i = i, the CV error is termed the leave-one-out error, and it is a reasonable estimate of the expected error.

For linear fitting problems, if the prediction is a linear transformation of the target, i.e., \hat{\mathbf{h}}_{\mathcal{A}_k} = \mathbf{S}\mathbf{h}, an efficient approximation of the leave-one-out error can be given, the so-called generalized cross-validation (GCV) error [18]. As mentioned before, in the hybrid LAR algorithm the model prediction is:

\hat{\mathbf{h}}_{\mathcal{A}_k} = \boldsymbol{\Psi}_{\mathcal{A}_k} \mathbf{a}_{hyb}^{k} = \boldsymbol{\Psi}_{\mathcal{A}_k} (\boldsymbol{\Psi}_{\mathcal{A}_k}^T \boldsymbol{\Psi}_{\mathcal{A}_k})^{-1} \boldsymbol{\Psi}_{\mathcal{A}_k}^T \mathbf{h} = \mathbf{S}\mathbf{h}.    (25)

It can be proved that:

\frac{1}{N} \sum_{i=1}^{N} (h^i - \hat{h}^{\kappa_i})^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{h_i - \hat{h}_{\mathcal{A}_k,i}}{1 - S_{ii}} \right)^2,    (26)

where S_{ii} is the i-th diagonal element of \mathbf{S}. Hence the GCV approximation is defined by:

Err_{GCV} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{h_i - \hat{h}_{\mathcal{A}_k,i}}{1 - \mathrm{trace}(\mathbf{S})/N} \right)^2,    (27)

where \mathrm{trace}(\mathbf{S}) is the sum of the S_{ii}. The GCV does not require solving extra fitting problems and is thus highly efficient.
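For an OLS smoother, the leave-one-out identity of Eq. 26 and the GCV approximation of Eq. 27 can be checked numerically in a few lines. The random design below is our own verification sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 30, 5
Psi = rng.standard_normal((N, P))
h = rng.standard_normal(N)

S = Psi @ np.linalg.inv(Psi.T @ Psi) @ Psi.T       # hat matrix of Eq. 25
h_hat = S @ h

# brute-force leave-one-out: refit without sample i, predict sample i
loo = np.empty(N)
for i in range(N):
    keep = np.arange(N) != i
    a_i, *_ = np.linalg.lstsq(Psi[keep], h[keep], rcond=None)
    loo[i] = h[i] - Psi[i] @ a_i

# closed-form identity (Eq. 26): loo residual = residual / (1 - S_ii)
shortcut = (h - h_hat) / (1.0 - np.diag(S))

err_loo = np.mean(loo ** 2)
err_gcv = np.mean(((h - h_hat) / (1.0 - np.trace(S) / N)) ** 2)  # Eq. 27
```

The closed-form residuals match the brute-force refits exactly, which is why the leave-one-out error can be evaluated without solving any extra fitting problems.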
The errors discussed above are associated only with the statistical resolution. In practical applications, however, one may also consider the errors associated with the physical resolution involved in numerically solving the governing flow equations. The interaction of these errors determines the allocation of computational resources [24, 29]. In this study, we focus only on the statistical errors associated with the PCE.
AC
3.4 Implementation with sequential samples In the LASSO-LAR, the performance of prediction is strongly dependent on the quality of the
training dataset. To prevent the generation of poorly distributed samples, appropriate sampling method for training dataset is indispensable. In the SCM and PCM, collocation points are generated by constructing tensor product with selected interpolation points in each dimension, or directly using Gaussian integral points for specific degree of polynomials [7, 26]. In the sparse
ACCEPTED MANUSCRIPT
PCE algorithm, excessive candidate basis and limited samples are given at first, which makes it impracticable to apply those methods whose sampling principle is closely correlated to the basis function. To address the problem, the Latin Hypercube sampling (LHS) method is applied in our problem, as it is able to generate well distributed samples with arbitrary size [8].
In the LHS method, the range of each dimension is first divided into several equiprobable intervals, and an observation is randomly selected within each interval. Then, by randomly combining the observations in the different dimensions, an LHS sample set whose size equals the number of equiprobable intervals is obtained.
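The two steps above can be sketched in a few lines. The following is a minimal numpy illustration of basic LHS on the unit hypercube (for production use, library routines such as `scipy.stats.qmc.LatinHypercube` provide the same construction):

```python
import numpy as np

def latin_hypercube(n_samples, n_dim, rng=None):
    """Basic LHS on [0, 1)^d: one random point per equiprobable
    interval in each dimension, intervals combined by random permutation."""
    rng = np.random.default_rng(rng)
    # one point inside each of the n equiprobable intervals, per dimension
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dim))) / n_samples
    # shuffle the interval order independently in every dimension
    for j in range(n_dim):
        u[:, j] = u[rng.permutation(n_samples), j]
    return u  # map through an inverse CDF for non-uniform marginals

x = latin_hypercube(5, 2, rng=0)
# each column has exactly one point in every interval [k/5, (k+1)/5)
assert all(sorted(np.floor(x[:, j] * 5).astype(int)) == [0, 1, 2, 3, 4]
           for j in range(2))
```

For the standard Gaussian variables used later in this paper, the uniform coordinates would be pushed through the inverse normal CDF.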
However, the required number of samples is difficult to determine a priori, so a conservative design with relatively small computational cost is usually generated first. If the current samples perform poorly for prediction, new samples are added to the original design sequentially until a good result is obtained, e.g., until the CV error is sufficiently small. In other words, we hope the construction of the response surface to be self-adaptive. On the basis of the LHS method, a sequential sampling method named the inherited LHS method [35] is implemented in this paper.
Fig. 2. Inherited LHS in a two-dimensional domain: (a) original LHS design; (b) new design space; (c) unsampled space. Three new samples (red dots) are generated on the basis of three original LHS samples (black dots).
To illustrate the inherited LHS method, consider the two-dimensional uniformly distributed example in Fig. 2. Three samples are generated by the LHS method as the original design (Fig. 2a). To generate three new samples without losing the uniformity of the LHS, the space is redivided into six equiprobable intervals in each dimension. In the new design space, the intervals occupied by the original LHS samples are shaded in Fig. 2b. The unshaded areas are extracted to form the unsampled space shown in Fig. 2c. If two or more samples fall into the same interval, the unsampled space has to be artificially expanded, as indicated in [35]. The new samples are then generated to fill the unsampled space in an LHS manner, denoted by the red dots in Fig. 2c. Afterwards, the samples in the unsampled space are mapped back to the new design space. Since the original design remains part of the new design, extra simulations are performed only on the new samples; the method is therefore inherited and highly efficient.

Let us briefly summarize the implementation of feature selection for basis reduction.
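A minimal sketch of the design-doubling case of this procedure is given below. It assumes the refined intervals are collision-free; the artificial expansion of the unsampled space described in [35] is not handled:

```python
import numpy as np

def inherit_lhs(x_old, rng=None):
    """Double an existing LHS design on [0, 1)^d (sketch of the inherited-LHS
    idea; the 'artificial expansion' for colliding samples is not handled)."""
    rng = np.random.default_rng(rng)
    n, d = x_old.shape
    m = 2 * n                        # refined number of equiprobable intervals
    x_new = np.empty((n, d))
    for j in range(d):
        occupied = np.floor(x_old[:, j] * m).astype(int)
        free = np.setdiff1d(np.arange(m), occupied)    # unsampled intervals
        rng.shuffle(free)                              # random combination
        x_new[:, j] = (free[:n] + rng.random(n)) / m   # one point per interval
    return np.vstack([x_old, x_new])  # old runs are reused; only n new ones

x0 = (np.arange(3)[:, None] + np.random.default_rng(1).random((3, 2))) / 3
x1 = inherit_lhs(x0, rng=2)
# the combined set is again an LHS with 6 intervals per dimension
assert all(sorted(np.floor(x1[:, j] * 6).astype(int)) == list(range(6))
           for j in range(2))
```

The key property is visible in the final assertion: the enriched design is itself a valid LHS on the refined grid, so the original simulation runs never have to be repeated.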
First, a conservative design is generated by the LHS method as the original sample set, and the model simulation is performed for each sample. Then, polynomial basis functions are derived by PCE to form the original feature set; the polynomial degree may be set at a conservatively low level at first. The LASSO-LAR algorithm is adopted to select features, and each feature subset is assessed by the approximated GCV criterion. If the stopping criterion cannot be reached, one may first increase the polynomial degree and make use of higher-order basis functions for estimation. If, with the maximum acceptable degree, the GCV error is still larger than expected, the current sample set is enriched with the inherited LHS method and the previous steps are repeated. Once a specific feature subset is qualified, the response surface can be constructed. Finally, post-processing and validations for general cases are executed for further analysis. The flowchart is shown in Fig. 3.
Fig. 3 Flowchart of basis reduction for sparse PCE with feature selection method.
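The adaptive procedure just summarized can be sketched as follows. This is a schematic numpy-only illustration, not the authors' implementation: plain Gaussian Monte Carlo sampling and an ordinary least-squares fit with the GCV criterion of Eq. 27 stand in for the inherited LHS and LASSO-LAR steps, and `simulator` is a hypothetical stand-in for the flow model:

```python
import numpy as np
from itertools import combinations_with_replacement

def basis(X, degree=3):
    """All monomial features of total degree 1..degree (constant absorbed)."""
    return np.column_stack([np.prod(X[:, idx], axis=1)
                            for deg in range(1, degree + 1)
                            for idx in combinations_with_replacement(
                                range(X.shape[1]), deg)])

def simulator(X):                  # hypothetical stand-in for the flow model
    return np.exp(X[:, 0]) + X[:, 1]**2 * X[:, 2]

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))   # conservative original design (plain MC here)
while True:
    n = len(X)
    Psi, h = basis(X), simulator(X)
    S = Psi @ np.linalg.solve(Psi.T @ Psi, Psi.T)             # hat matrix, Eq. 25
    gcv = np.mean(((h - S @ h) / (1 - np.trace(S) / n))**2)   # GCV error, Eq. 27
    if gcv < 0.5 or n >= 250:      # stopping criterion reached, or budget spent
        break
    X = np.vstack([X, rng.standard_normal((20, 3))])          # enrich sequentially
```

In the actual method the enrichment step would use inherited LHS samples and the fit would be the LASSO-LAR feature selection; the loop structure, however, is the same as in Fig. 3.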
4 Case studies
In this section, one analytical example and two groundwater flow examples (steady state and
transient) are presented to illustrate the sparse PCE method with feature selection. In the analytical example, a detailed validation of the feature selection is discussed. For the flow examples, the sparse PCE is used for UQ of groundwater flow, and the results are compared with
MC simulations.
4.1 Illustrative analytical example
4.1.1 Problem statement
Let us consider a random variable $h(\boldsymbol{\xi}) = M(\boldsymbol{\xi})$ governed by three uncertain variables $\xi_1, \xi_2, \xi_3$, which are assumed to be independent standard Gaussian variables. The model is analytical:

$$M(\boldsymbol{\xi}) = e^{\xi_1} + \xi_2^2 \xi_3. \qquad (28)$$
In fact, the model is a black box to the analyst studying its uncertainty characteristics. For illustration, the response surface is constructed from simple polynomial terms of degree no more than three, denoted by:

$$\hat{M}(\boldsymbol{\xi}) = \sum_{i=1}^{P} a_i\, \xi_1^{k_1} \xi_2^{k_2} \xi_3^{k_3}, \qquad k_1 + k_2 + k_3 \le 3, \quad k_1, k_2, k_3 \in \mathbb{N}, \qquad (29)$$
where 20 polynomials are adopted as the original feature set. In this case, we utilize a fixed LHS design with 40 samples to evaluate the PCE coefficients.
After running the model 40 times for target evaluation and assembling the 20 candidate basis
functions into the training dataset, the LASSO-LAR algorithm is applied to select the useful features. Note that the training data are normalized and the targets are centered in the preprocessing step, so there is no intercept; i.e., the constant term of the response surface is absorbed and need not be considered. In addition, the $L_1$-norm of the coefficients increases monotonically along the solution path due to the nature of LASSO-LAR, and finally an OLS solution $\mathbf{a}^{OLS}$ is obtained when all features are activated. Hence, to quantify the progress of the $k$-th step on the solution path, we define a shrinkage factor

$$s = \frac{\|\mathbf{a}_k^{LAR}\|_1}{\|\mathbf{a}^{OLS}\|_1},$$

the ratio of the $L_1$-norms of the current LAR coefficients and the full OLS coefficients.
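For readers who wish to reproduce this kind of experiment, a similar solution path can be traced with an off-the-shelf LAR implementation. The sketch below (with plain Gaussian sampling standing in for the LHS design) uses `sklearn.linear_model.lars_path` on the analytical model of Eq. 28:

```python
import numpy as np
from itertools import combinations_with_replacement
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
xi = rng.standard_normal((40, 3))                  # 40 samples (plain MC here)
y = np.exp(xi[:, 0]) + xi[:, 1]**2 * xi[:, 2]      # the analytical model, Eq. 28

# the 19 non-constant monomials of degree <= 3 (constant absorbed by centering)
idx_sets = [idx for deg in (1, 2, 3)
            for idx in combinations_with_replacement(range(3), deg)]
Psi = np.column_stack([np.prod(xi[:, idx], axis=1) for idx in idx_sets])
Psi = (Psi - Psi.mean(0)) / Psi.std(0)             # normalize the features
y = y - y.mean()                                   # center the target

alphas, active, coefs = lars_path(Psi, y, method='lasso')
active_features = [idx_sets[j] for j in active]    # active set at end of path
```

The `coefs` array holds one coefficient vector per step of the path, so quantities such as the shrinkage factor $s$ can be computed as the ratio of $L_1$-norms between each column and the final one.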
4.1.2 Application of feature selection
Theoretically, the analytical model can be rewritten as a Taylor series expansion with Peano remainder:

$$M(\boldsymbol{\xi}) = 1 + \xi_1 + \frac{\xi_1^2}{2} + \frac{\xi_1^3}{6} + \xi_2^2\xi_3 + o(\xi_1^3). \qquad (30)$$

Therefore, among all features, we hope that the terms $\xi_1$, $\xi_1^2$, $\xi_1^3$ and $\xi_2^2\xi_3$ will be retained eventually and equipped with coefficients similar to those in Eq. 30. The first ten steps of LASSO-LAR are demonstrated in Table 1.
Table 1. First ten steps along the LASSO-LAR solution path

| Step | Added | Dropped | Size of active set | Shrinkage factor |
|------|-------|---------|--------------------|------------------|
| 1 | $\xi_2^2\xi_3$ | | 1 | 0.104 |
| 2 | $\xi_1\xi_2^2$ | | 2 | 0.165 |
| 3 | $\xi_1^3$ | | 3 | 0.272 |
| 4 | $\xi_1$ | | 4 | 0.353 |
| 5 | $\xi_1^2$ | | 5 | 0.672 |
| 6 | | $\xi_1\xi_2^2$ | 4 | 0.732 |
| 7 | $\xi_2\xi_3$ | | 5 | 0.736 |
| 8 | $\xi_2$ | | 6 | 0.743 |
| 9 | | $\xi_2\xi_3$ | 5 | 0.748 |
| 10 | $\xi_2\xi_3$ | | 6 | 0.750 |
In every step, a specific feature is added to or dropped from the active feature set. Within the first six steps, all of the features we hope to retain are selected, although an uncorrelated term $\xi_1\xi_2^2$ is included in the second step and finally discarded. That is to say, the major stochastic features are captured by the LASSO-LAR algorithm both accurately and quickly in this case. Moreover, the shrinkage factor increases slowly after six steps, indicating that the features added afterwards are given very small weights and may make only limited contributions to the response surface. Therefore, we may guess that the best feature subset is probably selected around the sixth step.
Fig. 4 Leave-one-out error and empirical error along the solution path.
Assessments are made for each subset, and both the leave-one-out error and the empirical error are shown in Fig. 4. The leave-one-out error fluctuates along the solution path according to the performance of the subsets. There is a slight increase at step four with $s = 0.353$, where $\xi_1$ is added; this addition is supposed to lower the leave-one-out error but fails, because the uncorrelated term $\xi_1\xi_2^2$ has interfered with the impact of $\xi_1$ and results in a slightly poorer fit. The minimum error is reached at step nine with $s = 0.748$, where $\mathrm{Err}_{CV} = 0.0753$, marked by the vertical line in Fig. 4. After that, the prediction error starts to rise, suggesting that the features added afterwards may be redundant and useless for fitting. Therefore, the best feature subset consists of the five activated terms at step nine, i.e., $\xi_1$, $\xi_1^2$, $\xi_1^3$, $\xi_2^2\xi_3$ and $\xi_2$, which agrees well with our expectation. The empirical error, however, decreases all the way down. Even though a bunch of redundant features are included after $s = 0.748$, their participation seems to continuously lower the error and improve the prediction accuracy, but they are actually overfitting the training data. If judged by the empirical error, the original excessive feature set would be identified as optimal, which is obviously against common sense.
Fig. 5 (a) LAR coefficients and (b) hybrid LAR coefficients of each feature along the solution path.
Let us check the coefficients of each feature in Fig. 5. Every line in Fig. 5a represents the change of the LAR coefficient of a specific feature along the solution path, and a vertical line is drawn at $s = 0.748$ to illustrate where the best subset lies. Once a feature is activated, its coefficient moves from zero towards its OLS solution at $s = 1$, together with the other activated features. The coefficient of $\xi_1\xi_2^2$, however, climbs up but declines to zero at last. In other words, it moves in the opposite direction after step three, gradually erasing its impact on the fitting until step five. Thus, a slight increase of the leave-one-out error is detected at step four, right in this transition period. After the best subset is selected, features are activated intensively, as shown in the lower right corner of Fig. 5a; however, they carry only limited weights.
Fig. 5b describes the hybrid LAR coefficients of each feature. Unlike the ordinary LAR coefficients, which move gradually and jointly, the hybrid LAR coefficients aim at maximum accuracy in every step. As a result, once a feature is activated, its hybrid coefficient stretches as much as possible to obtain the best fit; hence the sudden rise of every line in Fig. 5b. When several features are added and mutually constrained, the hybrid coefficients gradually converge to the ordinary LAR coefficients. The optimal response surfaces constructed with the ordinary and hybrid LAR coefficients are:

$$\hat{M}^{LAR}(\boldsymbol{\xi}) = 1.1038 + 0.9471\xi_1 + 0.5545\xi_1^2 + 0.2250\xi_1^3 + 0.9667\xi_2^2\xi_3 + 0.0329\xi_2, \qquad (31)$$

$$\hat{M}^{hyb}(\boldsymbol{\xi}) = 1.0919 + 0.9613\xi_1 + 0.5448\xi_1^2 + 0.2263\xi_1^3 + 0.9648\xi_2^2\xi_3 + 0.0527\xi_2. \qquad (32)$$

These two estimations are both quite similar to the expansion of the model in Eq. 30. Though $\xi_2$ is included, it is given a very small weight, and its presence may account for the effect of the omitted remainder. In general, LASSO-LAR has done an excellent job of feature selection in this simple analytical case. In the following, we inspect its performance on groundwater flow problems.
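The quality of such a surrogate can be checked by direct evaluation. The snippet below compares the response surface of Eq. 31 against the analytical model of Eq. 28 on a grid within $\pm 1.5$ standard deviations of the mean:

```python
import numpy as np

def M(x1, x2, x3):
    """The analytical model, Eq. 28."""
    return np.exp(x1) + x2**2 * x3

def M_lar(x1, x2, x3):
    """The sparse PCE response surface of Eq. 31."""
    return (1.1038 + 0.9471*x1 + 0.5545*x1**2 + 0.2250*x1**3
            + 0.9667*x2**2*x3 + 0.0329*x2)

g = np.linspace(-1.5, 1.5, 7)
X1, X2, X3 = np.meshgrid(g, g, g)
err = np.abs(M(X1, X2, X3) - M_lar(X1, X2, X3))
assert err.max() < 0.5   # close throughout the high-probability region
```

Larger discrepancies appear further out in the tails, where the cubic truncation of $e^{\xi_1}$ in Eq. 30 no longer holds; this is consistent with the Peano remainder being omitted from the surrogate.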
4.2 Examples of steady state groundwater flow
4.2.1 Problem statement
In this section, we consider the steady state groundwater flow problem in a two-dimensional domain of saturated heterogeneous medium: a square of size $L_x = L_y = 10.0$ [L] (where [L] denotes any consistent length unit), uniformly discretized into $40 \times 40$ square elements. No-flow conditions are prescribed at the two lateral boundaries. The hydraulic head is prescribed at the left and right boundaries as 7.0 [L] and 5.0 [L], respectively. No source term is considered. The log hydraulic conductivity field is considered as a Gaussian random field, whose mean is
$\langle Y \rangle = \langle \ln K_s \rangle = 0.0$ and whose variance is $\sigma_Y^2 = 1.0$. Assume that the covariance function between $\mathbf{x} = (x_1, y_1)$ and $\mathbf{x}' = (x_2, y_2)$ is $C_Y(\mathbf{x}, \mathbf{x}') = \sigma_Y^2 \exp(-|x_1 - x_2|/\eta_x - |y_1 - y_2|/\eta_y)$, where $\eta_x = \eta_y = 4.0$ is the correlation length of the random field. The log hydraulic conductivity field is
represented by the K-L expansion. Since the covariance function is separable, the eigenvalues and eigenfunctions of $C_Y$ can be derived analytically in each dimension. For example, in the $x$ direction we have [44]:

$$\lambda_j^{(x)} = \frac{2\eta_x \sigma_Y^2}{\eta_x^2 (\omega_j^{(x)})^2 + 1}, \qquad f_j^{(x)}(x) = \frac{\eta_x \omega_j^{(x)} \cos(\omega_j^{(x)} x) + \sin(\omega_j^{(x)} x)}{\sqrt{[(\eta_x \omega_j^{(x)})^2 + 1] L_x / 2 + \eta_x}}, \qquad (33)$$

where the $\omega_j^{(x)}$ are the positive roots of the characteristic equation:

$$(\eta_x^2 \omega^2 - 1)\sin(\omega L_x) = 2\eta_x \omega \cos(\omega L_x). \qquad (34)$$
The $\lambda_k^{(y)}$ and $f_k^{(y)}(y)$ in the $y$ direction are derived in the same way. The two-dimensional eigenvalues (and eigenfunctions) are obtained by combining the eigenvalues (and eigenfunctions) in each dimension:

$$\lambda_i = \frac{1}{\sigma_Y^2}\, \lambda_j^{(x)} \lambda_k^{(y)}, \qquad f_i(\mathbf{x}) = f_i(x, y) = f_j^{(x)}(x)\, f_k^{(y)}(y). \qquad (35)$$
Fig. 6 Series of eigenvalues and their finite sums.
We rearrange the eigenvalues in a nonincreasing sequence and calculate their summation. The series of eigenvalues and their finite sums as a function of the number of terms are shown in Fig. 6. The infinite series in the K-L expansion is truncated at a certain level of energy, say $\sum_{i=1}^{n} \lambda_i / (|D|\, \sigma_Y^2) = 80\%$, $90\%$ or $95\%$, for which 20, 60 or 164 terms are retained, respectively. Theoretically, a higher accuracy is reached if more terms are preserved. Although a conductivity field with more energy carries more details, these smaller-scale fluctuations make little difference to the simulated hydraulic head field. Thus, we preserve 80% of the energy in the K-L expansion for simplicity, and the random field is 20-dimensional. According to the results in Section 4.2.2, a random dimensionality of 20 is sufficient to characterize the random structure of the hydraulic head field in this problem. Since there is no analytical reference, the direct-sampling Monte Carlo (MC) method is applied for comparison and its solution is regarded as the benchmark. For the MC, we preserve 164 terms in the K-L expansion and run the simulator 10,000 times, which ensures statistical convergence and accurate results.
The PCE response surface is constructed on the basis of the 20-dimensional random vector $\boldsymbol{\xi}(\theta) = (\xi_1(\theta), \ldots, \xi_{20}(\theta))^T$. Hermite polynomials are adopted as basis functions, since $\xi_1(\theta), \ldots, \xi_{20}(\theta)$ are standard Gaussian random variables. In fact, we do not know beforehand which basis functions are useful or how many collocation points are sufficient. Therefore, to avoid losing important information, the original feature set consists of the polynomials of degree 1, and it is gradually expanded to degree 3, where at most $P = \binom{20+3}{3} = 1{,}771$ basis functions are included. In consideration of the computational cost, a conservative design of 50 samples is generated at first; feature selection and assessment are implemented afterwards. The original sample set is repeatedly enriched with 20 new samples until a satisfying result is achieved. Besides, the setup of the stopping criterion is quite tricky, as the sizes of the sample set and the feature set keep changing. To find a proper criterion, we may conduct several steps of selection first and observe the trend of the error for reference. In this case, the stopping criterion is set as the simulated head variance over the number of samples.
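The count $P = 1{,}771$ follows from the number of multi-indices of total degree at most 3 in 20 dimensions; a quick check, together with the probabilists' Hermite polynomials available in numpy as one-dimensional factors:

```python
from math import comb
from numpy.polynomial.hermite_e import hermeval

d, p = 20, 3
# number of multi-indices with total degree <= p in d dimensions (constant included)
n_terms = sum(comb(d + k - 1, k) for k in range(p + 1))
assert n_terms == comb(d + p, p) == 1771       # P in the text

# probabilists' Hermite polynomials He_k, e.g. He_2(x) = x^2 - 1
assert hermeval(2.0, [0, 0, 1]) == 2.0**2 - 1  # coefficient vector selects He_2
```

Each multivariate basis function is a product of such one-dimensional $He_k$ factors, one per active dimension of the multi-index.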
4.2.2 Analysis of statistical properties
The mean and variance of the hydraulic head field are obtained by post-processing. There is little difference between the estimated and the reference mean fields owing to the particular boundary conditions in our examples; hence, we mainly focus on the variance for comparison. Figs. 7a, 7b and 7c exhibit the head variance on the three cross sections $y = 2.5$ [L], $5.0$ [L] and $7.5$ [L], respectively. When the sample size is 50, the estimated variance is strongly nonsmooth and far from the benchmark, which means that the original conservative design is too limited to capture the characteristics of the random field. To improve the estimation, 20 samples are added to the current sample set; in fact, 27 samples are generated, because the unsampled space has to be artificially expanded to keep the new samples LHS-type. The results with 77 samples are significantly improved: they are much smoother and closer to the benchmark. If inherited LHS samples are added continually, the estimations gradually approach the benchmark. With just 137 samples, the estimation is fairly close to the MC reference (with 10,000 samples).
Fig. 7 Estimated head variance $\sigma_h^2$ along the cross sections (a) $y = 2.5$ [L], (b) $y = 5.0$ [L], (c) $y = 7.5$ [L] for $\sigma_Y^2 = 1.0$, and (d) corresponding estimated errors compared with MC.
The convergence rate of the sparse PCE is much faster than that of the MC method. To give a quantitative illustration, we define an estimation error $\epsilon_y$ for the head variance on the cross section $y$:

$$\epsilon_y = \frac{1}{L_x}\int \left(\sigma_h^2(x, y) - \hat{\sigma}_h^2(x, y)\right)^2 \mathrm{d}x \approx \frac{1}{N_x}\sum_{i=1}^{N_x} \left(\sigma_h^2(x_i, y) - \hat{\sigma}_h^2(x_i, y)\right)^2, \qquad (36)$$

where $N_x$ is the total number of physical nodes on the cross section, and $\sigma_h^2$ and $\hat{\sigma}_h^2$ are the reference and approximated variances, respectively. This estimation error indicates the total difference on the cross section and gradually converges to zero as the sample size increases. Comparisons of the error convergence between the sparse PCE and MC on the three cross sections are shown in Fig. 7d. Note that the MC results may vary considerably, especially when a small sample is utilized; thus, the error estimates are obtained by running a 1,000-time bootstrap. When the sample size is small, the sparse PCE performs as poorly as an MC simulation of the same size. As the sample size increases, the results of the sparse PCE method converge rapidly. An accuracy of $10^{-5}$~$10^{-6}$ is reached with one or two thousand MC simulations; with the sparse PCE method, however, one or two hundred simulations are fairly enough.
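The discrete error of Eq. 36 and the bootstrap assessment of the MC variability can be sketched as follows, with synthetic head samples at a single node standing in for the simulator output:

```python
import numpy as np

def cross_section_error(var_ref, var_hat):
    """Discrete estimation error of Eq. (36) on one cross section:
    mean squared difference of the variances over the N_x nodes."""
    return np.mean((np.asarray(var_ref) - np.asarray(var_hat))**2)

# bootstrap spread of an MC variance estimate (1,000 resamples, as in the text)
rng = np.random.default_rng(0)
heads = rng.normal(6.0, 0.3, size=500)         # toy head samples at one node
boot = [np.var(rng.choice(heads, heads.size, replace=True))
        for _ in range(1000)]
spread = np.std(boot)                          # sampling variability of sigma_h^2
assert spread < np.var(heads)                  # small relative to the estimate
```

Repeating the resampling for each MC sample size traces out the spread of the red curves discussed above; the same `cross_section_error` is applied to both the sparse PCE and the resampled MC variances.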
Fig. 8 Hydraulic head PDF at (5.0 [L], 5.0 [L]), estimated by response surfaces constructed with different sample sizes.
Once the hydraulic head is approximated in a polynomial form, the probability density function (PDF) can be obtained by MC sampling. By running 100,000 MC simulations on the response surface, which involves only the evaluation of polynomials and is highly efficient, the PDF of the hydraulic head is obtained. The estimated PDF at a specific location, say (5.0 [L], 5.0 [L]), is shown in Fig. 8. With more than 77 samples, the estimations are quite consistent with the reference PDF, while the result with 50 samples has a large bias. One may wonder why the statistical moments and the PDF vary so much with different sample sizes. Let us therefore examine how the response surface is constructed and how much weight is given to each basis function.

The response surface of the sparse PCE at location (5.0 [L], 5.0 [L]) is studied here. The coefficients of all the basis functions are shown in Fig. 9, where blue bars represent the coefficients of the polynomials of degree 1 and red bars those of degree 2. Though polynomials up to degree 3 are provided as candidates, only the first two degrees are activated. More specifically, since the polynomials of degree 1 are exactly the elements of the governing vector $(\xi_1(\theta), \ldots, \xi_{20}(\theta))$ and the conductivity field is intrinsically a linear combination of them, they may have a large impact on the response surface, especially the first several elements, which are the dominant uncertainty factors. Thus, most of the weight is given to polynomials of degree 1, as shown in Figs. 9b, 9c and 9d, where 35, 37 and 41 features with non-zero coefficients are presented, respectively. The selected basis functions and their coefficients are stable with more than 77 samples, indicating that the algorithm has probably converged and that the major stochastic features are just these roughly 40 basis functions. However, the coefficients trained with 50 samples behave strangely, as shown in Fig. 9a. The basis matrix is 50 by 42, nearly square, and badly scaled, with a condition number around 3,000. The coefficients trained under such conditions are unconvincing, and the statistical properties are therefore highly biased. When 77 or more samples are utilized, the basis matrix is rectangular, with a condition number of no more than 100. Convincing regression results are usually obtained under such ideal conditions, with $N \approx kP$, $k \in [2, 3]$. As a result, a significant improvement is achieved with just 27 more samples, and samples added afterwards make only limited further improvement because the algorithm converges fast.
Fig. 9 The coefficients of basis functions for the constructed response surface at location (5.0 [L], 5.0 [L]), trained by (a) 50, (b) 77, (c) 109 and (d) 137 samples.

4.2.3 Effect of large spatial variability
As illustrated in the previous section, the sparse PCE method is accurate and efficient for moderate spatial variability of the hydraulic conductivity field. In this section, we examine cases with larger variability to inspect the validity of the method.
Fig. 10 Estimated head variance $\sigma_h^2$ along the cross section $y = 5.0$ [L] with (a) $\sigma_Y^2 = 2.0$, (b) $\sigma_Y^2 = 4.0$, and (c) corresponding estimated errors compared with MC.
We first increase the spatial variability to $\sigma_Y^2 = 2.0$ (corresponding to a coefficient of variation of the hydraulic conductivity $Cv_{K_s} = \sigma_{K_s}/\langle K_s \rangle = 253\%$), while keeping the other conditions unchanged. The MC benchmark and basis functions are generated in the same way as before. When constructing the sparse PCE response surface, the stopping criterion is set as the simulated head variance over half the number of samples. The original sample size is still conservatively selected as 50, with 20 samples enriched every time. The estimated variances at $y = 5.0$ [L] are shown in Fig. 10a. The estimated error as a function of sample size is denoted by the red line in Fig. 10c. We can see that even though the hydraulic conductivity field is of larger variability, the sparse PCE is still able to capture the random structure accurately with limited samples and maintains a rapid convergence speed compared to the MC simulation. The sparse PCE is still highly efficient. However, compared with the case of $\sigma_Y^2 = 1.0$, the estimated error is larger and more samples are needed to obtain satisfying results.

Let us make the spatial variability even larger, say $\sigma_Y^2 = 4.0$ (corresponding to a coefficient of variation of the hydraulic conductivity $Cv_{K_s} = 732\%$). Since the random field is strongly fluctuating and complex, the construction of the response surface now starts with 100 samples, and 40 new samples are added every time. The stopping criterion is set much looser: the simulated head variance over 1/6 of the sample size. Under such circumstances, the estimated variances and errors are shown in Figs. 10b and 10c, respectively. More than 200 samples are required to obtain good approximations, and the estimated errors are even larger. Though the efficiency of the sparse PCE is affected by the large variability, it remains robust compared to other algorithms. Besides MC, in PCM and SCM, polynomials of degree 3 or 4 usually have to be considered for such high spatial variability of the hydraulic conductivity field, and the sample size may range from $10^3$ to $10^4$ [7, 26]. Since the sparse PCE attempts to extract the most valuable information from limited samples and make the best use of them, decent results can be obtained even with such high variability.
Fig. 11 The coefficients of basis functions for the constructed response surface at location (5.0 [L], 5.0 [L]) with (a) $\sigma_Y^2 = 2.0$ and (b) $\sigma_Y^2 = 4.0$.
The effect of variability is also reflected in the coefficients of the response surface. Fig. 11 shows the coefficients of all the basis functions for spatial variabilities $\sigma_Y^2 = 2.0$ and $\sigma_Y^2 = 4.0$. Comparing Figs. 9 and 11, we find that more features are activated as the variability increases; however, the polynomials of degree 1 remain the dominant terms. Specifically, for $\sigma_Y^2 = 4.0$, some polynomials of degree 3 are also included, denoted by the green bars in Fig. 11b. The sparse PCE tends to select polynomials of higher degree to characterize the details of a more complex field, while the major stochastic features are still retained. Such behavior of the response surface is within our expectation.
4.3 Example of transient groundwater flow
4.3.1 Problem statement
In this section, we apply the sparse PCE algorithm to the transient groundwater flow problem. We consider the same flow domain as in the steady state problem. A pumping well is located at (5.0 [L], 5.0 [L]) with a constant pumping rate of 3.0 [L³/T] ([T] is any consistent time unit). The specific storage is $S_s = 0.0001$ [1/L]. The prescribed heads at the left and right boundaries are 27.0 [L] and 25.0 [L], respectively. The hydraulic head is steady before pumping. The simulation time is 5.0 [T] with 20 time steps. The log hydraulic conductivity field has mean $\langle Y \rangle = \langle \ln K_s \rangle = 0.0$, variance $\sigma_Y^2 = 1.0$ and covariance correlation length $\eta_x = \eta_y = 4.0$ [L]. MC simulations (with 100,000 samples) are performed to give an accurate solution as the benchmark. The mean head and the denary logarithm of the head variance at $t = 0.5$ [T], 1.25 [T] and 2.5 [T], obtained by the MC simulations, are shown in Fig. 12.
Fig. 12 Mean hydraulic head (upper row) and denary logarithm of head variance (lower row) at (a)(d) $t = 0.5$ [T], (b)(e) $t = 1.25$ [T] and (c)(f) $t = 2.5$ [T].
4.3.2 Analysis of statistical properties
Due to the presence of the pumping well, both the mean and the variance of the hydraulic head field become nonsmooth, especially at the well, compared to the previous examples without a pumping well. In the application of the sparse PCE, we still start with 50 conservative samples, with 20 samples added at each step, and the polynomial degree is no more than 3. Along the cross section $y = 5.0$ [L], the estimated mean and variance of the hydraulic head with the sparse PCE trained by 264 samples are compared with MC (with 100,000 samples), as shown in Fig. 13.
Fig. 13 Estimated mean (upper row) and variance (lower row) of the hydraulic head on the cross section $y = 5.0$ [L] at (a)(d) $t = 0.5$ [T], (b)(e) $t = 1.25$ [T] and (c)(f) $t = 2.5$ [T].
It is obvious that the mean hydraulic head from the sparse PCE matches the MC results exactly at the different times. The variance also agrees well with the MC solutions, while slight deviations are observed at the location of the well. Both the computational efficiency and the accuracy of the sparse PCE method, observed in the steady state problems, carry over to the transient flow problem. The behavior of the PDFs of the hydraulic head can be checked, as shown in Fig. 14.
Fig. 14 PDFs of hydraulic head at locations (a) (4.5 [L], 5.0 [L]), (b) (5.0 [L], 5.0 [L]) at the pumping well, and (c) (6.0 [L], 5.0 [L]), at $t = 1.25$ [T], obtained from sparse PCE and MC.
The PDFs at the pumping well and at two nearby locations, obtained by running 100,000 MC simulations on the response surface, are shown in Fig. 14. The estimations agree well with the references, indicating that the random structure of the hydraulic head is captured by the sparse PCE. However, the estimated PDF at the well is slightly biased. Since a drawdown funnel exists (Fig. 12) to ensure a constant pumping rate, the hydraulic head at the well is physically constrained to be lower than 26.0 [L], as shown in the reference PDF. The behavior of the well head is nonlinear in the probability space and is difficult to characterize accurately with polynomials. Therefore, in the estimated PDF we may find a few unphysical realizations with heads larger than 26.0 [L]. Even so, the estimations are only slightly affected and still acceptable, because the nonlinearity is not that pronounced in this single-phase case. For multiphase flow problems, where nonlinearity is quite common, some modifications will have to be made to the sparse PCE method to describe the nonlinearity accurately. This problem will be studied in our further research.
5 Conclusions
In this study, we explore an efficient method to construct a sparse polynomial chaos expansion (PCE) response surface with the aid of a feature selection algorithm. The method is utilized for uncertainty quantification (UQ) of flow in random porous media. The input random field, i.e., the hydraulic conductivity field, is represented by the Karhunen-Loève (K-L) expansion, and the dependent random field, i.e., the hydraulic head field, is estimated by the PCE response surface. The feature selection algorithm is applied to select the major stochastic features among a large number of basis functions, so that the polynomial chaos coefficients can be calculated with limited realizations. The cross validation (CV) error is adopted to evaluate each feature set. Besides, the response surface is improved by sequentially enriching the training samples with the inherited Latin hypercube sampling (LHS) method. We first validate the algorithm with an analytical example to illustrate the feature selection process. The sparse PCE method is then applied to two-dimensional steady state flow problems with different spatial variabilities. The method is also implemented for a transient flow problem with a pumping well, and the results are compared to the Monte Carlo (MC) method. This study leads to the following conclusions:
1. With relatively small computational effort, the sparse PCE method is feasible for accurately quantifying uncertainty for flow in random porous media. The method is nonintrusive and highly parallelizable, and can be easily implemented with existing simulators. The feature selection method is utilized for basis reduction, so the required number of simulations is reduced. No matter how time-consuming the flow simulator is, the computational effort of feature selection is comparatively small, since the selection process does not involve simulation and is intrinsically a linear fitting problem. Thus, the total computational cost is reduced and the method is highly efficient.
2. The convergence rate of the sparse PCE method is faster than that of the MC method when the random field has a reasonable number of effective features. The sparse PCE method constructs response surfaces without the irrelevant and redundant features. Therefore, good approximations can be obtained with a limited number of samples, even if the spatial variability is large and a high dimension is considered. However, for extremely complicated processes with a large number of effective features, the advantage of sparse PCE may diminish.
3. The sparse PCE method is sequential and self-adaptive. The response surface is gradually improved by training more samples sequentially. Since the new samples are generated hereditarily, the results from the original samples remain useful for the new response surface construction. In addition, samples generated by the inherited LHS method have good statistical properties, and the response surface trained by these well-distributed samples is convincing.
4. The feature selection and assessment modules in the sparse PCE method are replaceable. Though in this paper the LASSO-LAR algorithm and the leave-one-out cross validation (CV) assessment are introduced, many outstanding algorithms and tools from the statistical learning area can be applied for better performance. The sparse PCE method is quite flexible and promising for further improvement.
5. It should be noted that the sparse PCE method is not restricted to the flow type and configuration considered in this study. The approach may be applied to complex flow configurations and multiphase flow problems, which is our ongoing study. However, for multiphase problems where high nonlinearity (or irregularity) exists, the fast convergence of sparse PCE may not hold, since polynomials may be poor at describing highly nonlinear phenomena. This requires further research in the future.
References

[1] Arnst M, R Ghanem, E Phipps, J Red-Horse. Reduced chaos expansions with random coefficients in reduced-dimensional stochastic modeling of coupled problems. International Journal for Numerical Methods in Engineering. 97 (2014) 352-76.
[2] Arnst M, R Ghanem, C Soize. Identification of Bayesian posteriors for coefficients of chaos expansions. Journal of Computational Physics. 229 (2010) 3134-54.
[3] Babuška I, F Nobile, R Tempone. A stochastic collocation method for elliptic partial differential equations with random input data. SIAM Journal on Numerical Analysis. 45 (2007) 1005-34.
[4] Ballio F, A Guadagnini. Convergence assessment of numerical Monte Carlo simulations in groundwater hydrology. Water Resources Research. 40 (2004).
[5] Blatman G, B Sudret. An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis. Probabilistic Engineering Mechanics. 25 (2010) 183-97.
[6] Blatman G, B Sudret. Adaptive sparse polynomial chaos expansion based on least angle regression. Journal of Computational Physics. 230 (2011) 2345-67.
[7] Chang H, D Zhang. A comparative study of stochastic collocation methods for flow in spatially correlated random fields. Communications in Computational Physics. 6 (2009) 509.
[8] Choi S-K, RV Grandhi, RA Canfield, CL Pettit. Polynomial chaos expansion with latin hypercube sampling for estimating response variability. AIAA Journal. 42 (2004) 1191-8.
[9] Dai C, H Li, D Zhang. Efficient and accurate global sensitivity analysis for reservoir simulations by use of probabilistic collocation method. SPE Journal. 19 (2014) 621-35.
[10] Dash M, H Liu. Feature selection for classification. Intelligent Data Analysis. 1 (1997) 131-56.
[11] Doostan A, H Owhadi. A non-adapted sparse approximation of PDEs with stochastic inputs. Journal of Computational Physics. 230 (2011) 3015-34.
[12] Efron B, T Hastie, I Johnstone, R Tibshirani. Least angle regression. The Annals of Statistics. 32 (2004) 407-99.
[13] Elsheikh AH, I Hoteit, MF Wheeler. Efficient Bayesian inference of subsurface flow models using nested sampling and sparse polynomial chaos surrogates. Computer Methods in Applied Mechanics and Engineering. 269 (2014) 515-37.
[14] Friedman J, T Hastie, R Tibshirani. The elements of statistical learning. Springer Series in Statistics, Springer, Berlin, 2001.
[15] Ghanem R. Scales of fluctuation and the propagation of uncertainty in random porous media. Water Resources Research. 34 (1998) 2123-36.
[16] Ghanem RG, PD Spanos. Stochastic finite elements: a spectral approach. Courier Corporation, 2003.
[17] Ghiocel DM, RG Ghanem. Stochastic finite-element analysis of seismic soil-structure interaction. Journal of Engineering Mechanics. 128 (2002) 66-77.
[18] Golub GH, M Heath, G Wahba. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics. 21 (1979) 215-23.
[19] Guyon I, A Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research. 3 (2003) 1157-82.
[20] Hampton J, A Doostan. Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies. Journal of Computational Physics. 280 (2015) 363-86.
[21] James G, D Witten, T Hastie, R Tibshirani. An introduction to statistical learning. Springer, 2013.
[22] Le Maître O, OM Knio. Spectral methods for uncertainty quantification: with applications to computational fluid dynamics. Springer Science & Business Media, 2010.
[23] Le Maître OP, MT Reagan, HN Najm, RG Ghanem, OM Knio. A stochastic projection method for fluid flow: II. Random process. Journal of Computational Physics. 181 (2002) 9-44.
[24] Leube PC, FP De Barros, W Nowak, R Rajagopal. Towards optimal allocation of computer resources: Trade-offs between uncertainty quantification, discretization and model reduction. Environmental Modelling & Software. 50 (2013) 97-107.
[25] Li H, P Sarma, D Zhang. A comparative study of the probabilistic-collocation and experimental-design methods for petroleum-reservoir uncertainty quantification. SPE Journal. 16 (2011) 429-39.
[26] Li H, D Zhang. Probabilistic collocation method for flow in porous media: Comparisons with other stochastic methods. Water Resources Research. 43 (2007).
[27] Li H, D Zhang. Stochastic representation and dimension reduction for non-Gaussian random fields: review and reflection. Stochastic Environmental Research and Risk Assessment. 27 (2013) 1621-35.
[28] Liao Q, D Zhang. Constrained probabilistic collocation method for uncertainty quantification of geophysical models. Computational Geosciences. 19 (2015) 311-26.
[29] Moslehi M, R Rajagopal, FP de Barros. Optimal allocation of computational resources in hydrogeological models under uncertainty. Advances in Water Resources. 83 (2015) 299-309.
[30] Oladyshkin S, F de Barros, W Nowak. Global sensitivity analysis: a flexible and efficient framework with an example from stochastic hydrogeology. Advances in Water Resources. 37 (2012) 10-22.
[31] Oladyshkin S, W Nowak. Data-driven uncertainty quantification using the arbitrary polynomial chaos expansion. Reliability Engineering & System Safety. 106 (2012) 179-90.
[32] Saeys Y, I Inza, P Larrañaga. A review of feature selection techniques in bioinformatics. Bioinformatics. 23 (2007) 2507-17.
[33] Tatang MA, W Pan, RG Prinn, GJ McRae. An efficient method for parametric uncertainty analysis of numerical geophysical models. Journal of Geophysical Research: Atmospheres. 102 (1997) 21925-32.
[34] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological). (1996) 267-88.
[35] Wang GG. Adaptive response surface method using inherited latin hypercube design points. Journal of Mechanical Design. 125 (2003) 210-20.
[36] Webster MD, MA Tatang, GJ McRae. Application of the probabilistic collocation method for an uncertainty analysis of a simple ocean model. Technical report, MIT Joint Program on the Science and Policy of Global Change, Report Series No. 4, Massachusetts Institute of Technology. (1996).
[37] Wiener N. The homogeneous chaos. American Journal of Mathematics. 60 (1938) 897-936.
[38] Xiu D, JS Hesthaven. High-order collocation methods for differential equations with random inputs. SIAM Journal on Scientific Computing. 27 (2005) 1118-39.
[39] Xiu D, GE Karniadakis. Modeling uncertainty in steady state diffusion problems via generalized polynomial chaos. Computer Methods in Applied Mechanics and Engineering. 191 (2002) 4927-48.
[40] Xiu D, GE Karniadakis. The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM Journal on Scientific Computing. 24 (2002) 619-44.
[41] Yan L, L Guo, D Xiu. Stochastic collocation algorithms using L1-minimization. International Journal for Uncertainty Quantification. 2 (2012).
[42] Yang Y, JO Pedersen. A comparative study on feature selection in text categorization. ICML 1997. pp. 412-20.
[43] Zhang D. Stochastic methods for flow in porous media: coping with uncertainties. Academic Press, 2001.
[44] Zhang D, Z Lu. An efficient, high-order perturbation approach for flow in random porous media via Karhunen-Loeve and polynomial expansions. Journal of Computational Physics. 194 (2004) 773-94.