Surrogate accelerated sampling of reservoir models with complex structures using sparse polynomial chaos expansion


Accepted Manuscript

Surrogate Accelerated Sampling of Reservoir Models With Complex Structures Using Sparse Polynomial Chaos Expansion
Hamid Bazargan, Mike Christie, Ahmed H. Elsheikh, Mohammad Ahmadi

PII: S0309-1708(15)00216-X
DOI: 10.1016/j.advwatres.2015.09.009
Reference: ADWR 2461

To appear in: Advances in Water Resources

Received date: 31 October 2014
Revised date: 25 August 2015
Accepted date: 13 September 2015

Please cite this article as: Hamid Bazargan, Mike Christie, Ahmed H. Elsheikh, Mohammad Ahmadi, Surrogate Accelerated Sampling of Reservoir Models With Complex Structures Using Sparse Polynomial Chaos Expansion, Advances in Water Resources (2015), doi: 10.1016/j.advwatres.2015.09.009

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Highlights

• A novel reduced-terms polynomial chaos proxy model is presented to reduce the computational cost.
• We combine the PCE with Markov Chain Monte Carlo for uncertainty quantification and history matching.
• An analytical approximation for the posterior distribution of the input parameters is given.
• The relative importance of the PCE terms is compared using an upper bound on the coefficients.
• The application of the proposed PCE proxy is explored in history matching of a synthetic fluvial channel reservoir.


Surrogate Accelerated Sampling of Reservoir Models With Complex Structures Using Sparse Polynomial Chaos Expansion

Hamid Bazargan∗, Mike Christie, Ahmed H. Elsheikh, Mohammad Ahmadi

Institute of Petroleum Engineering, Heriot–Watt University, Edinburgh, United Kingdom.

Abstract


Markov Chain Monte Carlo (MCMC) methods are often used to probe the posterior probability distribution in inverse problems. This allows for computation of estimates of uncertain system responses conditioned on given observational data by means of approximate integration. However, MCMC methods suffer from computational complexity when the models are expensive to evaluate, as in the case of subsurface flow models. Hence, it is of great interest to develop alternative efficient methods utilizing emulators, which are cheap to evaluate, in order to replace the full physics simulator. In the current work, we develop a technique based on sparse response surfaces to represent the flow response within a subsurface reservoir and thus enable efficient exploration of the posterior probability density function and the conditional expectations given the data. Polynomial Chaos Expansion (PCE) is a powerful tool to quantify uncertainty in dynamical systems when there is probabilistic uncertainty in the system parameters. In the context of subsurface flow models, it has been shown to be more accurate and efficient than traditional experimental design (ED). PCEs have a significant advantage over other response surfaces because convergence to the true probability distribution as the order of the PCE increases can be proved for random variables with finite variances. However, the major drawback of PCE is related to the curse of dimensionality, as the number of terms to be estimated grows drastically with the number of input random variables. This renders the computational cost of classical PCE schemes unaffordable for reservoir simulation purposes when the deterministic finite element model is expensive to evaluate. To address this issue, we propose the reduced-terms polynomial chaos representation, which uses an impact factor to retain only the most relevant terms of the PCE decomposition.
Accordingly, the reduced-terms polynomial chaos proxy can be used as the pseudo-simulator for efficient sampling of the probability density function of the uncertain variables. The reduced-terms PCE is evaluated on a two-dimensional subsurface flow model with

∗Corresponding author. Institute of Petroleum Engineering, Heriot–Watt University, Edinburgh, EH14 1AS, Scotland, UK. Email address: [email protected] (Hamid Bazargan)

Preprint submitted to Elsevier, October 19, 2015


fluvial channels to demonstrate that, with a few hundred trial runs of the actual reservoir simulator, it is feasible to construct a polynomial chaos proxy which accurately approximates the posterior distribution of the high permeability zones in an analytical form. We show that the proxy precision improves with increasing PCE order and a corresponding increase in the number of initial runs used to estimate the PCE coefficients.


Keywords: Bayesian parameter estimation, subsurface flow model, polynomial chaos, MCMC

1. Introduction


Traditional calibration methods for subsurface reservoirs (also known as history matching) usually obtain only a single set of model parameters, with uncertainty assessment provided by sensitivity calculations around the matched model. However, in oil field management for example, modern techniques have focused on predicting the likely range of field recoveries and consequently providing economic evaluations of different field development strategies. This approach takes into account observational errors in the observed history of the reservoir, retrieves a set of model parameters whose simulation results lie within the vicinity of the observed data, and uses them to estimate ranges in likely recovery factors.

Generally, two main approaches for subsurface model calibration exist in the literature, one based on optimization methods and the other on Bayesian inference. The optimization methods adjust the unknown parameter values through an automated process to obtain reservoir models within the allowed range of a misfit function. Various optimization techniques have been developed in the literature, including Genetic Algorithms [1], Particle Swarm Optimization [2], the Neighborhood Algorithm [4], Estimation of Distribution [5], Levenberg-Marquardt [40] and LBFGS [8]. Existing optimization methods can be roughly classified into two general categories: stochastic algorithms and gradient-based methods. Gradient-based algorithms have several inherent limitations, including the need to compute the gradients at each step of the optimization process. A definite advantage of stochastic algorithms is that they are able to easily honor complex geological constraints by preserving multipoint statistics present in the prior geological model; the main drawback of these approaches is their inefficiency, as they require a large number of simulations for convergence [10, 11].
However, most of these optimization-based algorithms do not provide any statistically valid estimate of the parameter uncertainty without additional calculations. For example, Genetic Algorithms [1] and Particle Swarm Optimization [2] do not correspond to a valid sampling mechanism, because the distribution of parameter values is mainly controlled by the algorithm settings [12]. This needs to be corrected by running a second code to compute probabilities associated with each set of parameters [4]. A review of recent research activities on subsurface flow model calibration can be found in [9, 66]. Approaches based on Bayesian inference, on the other hand, aim at estimating the



posterior probability for the reservoir properties [13]. Existing Bayesian inference methods broadly entail algorithms based on particle filters such as the Ensemble Kalman Filter (EnKF) [14, 15], the sequential Monte Carlo methods [7] and the Markov Chain Monte Carlo (MCMC) approaches [16, 17]. MCMC methods are often used to probe the posterior probability distribution in Bayesian inverse problems. Many MCMC methods move toward the target distribution in relatively small steps, with no tendency for the steps to proceed in the same direction [18]. Among these methods are the Gibbs sampling method, the Metropolis-Hastings algorithm and the slice sampling algorithm [18]. These methods are easy to implement and analyze, but unfortunately it can take a long time for the random walker to explore all of the space. The walker will often double back and cover ground already covered [18]. The difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error. A good chain will have rapid mixing, in which the stationary distribution is reached quickly starting from an arbitrary position. Variants of MCMC techniques have been developed in the literature to increase the convergence rate to the target distribution, but they are usually hard to implement [18]. Among these methods are Langevin MCMC [19], Hamiltonian Monte Carlo [20] and combinations of evolutionary algorithms with MCMC [21, 22]. Oliver et al. [16] utilized MCMC in the context of reservoir simulation, where MCMC methods were used for conditioning a permeability field to pressure data. Efendiev et al. [73] proposed a two-stage MCMC approach for conditioning permeability fields. More recently, Emerick and Reynolds [23] proposed to use MCMC to improve the sampling obtained by the EnKF method. However, typical uses of MCMC methods need more than $10^5$ steps to sample from the target distribution with a reasonable error [24].
For subsurface reservoir studies, the large number of simulations needed for convergence is practically infeasible; hence, the main disadvantage of these approaches is their inefficiency. Sampling can be extremely time-consuming if high resolution models are used. This is particularly of concern in closed-loop reservoir management, which requires continuous real-time use of model calibration and uncertainty quantification algorithms [25, 26]. Thus, there is a significant need for an efficient proxy (or surrogate) model that can predict simulation results with a reasonable accuracy. The choice of polynomial chaos expansions as a proxy model was pioneered by Ghanem and Spanos [27] and has been applied to various engineering problems [28, 29, 30, 31, 57, 33, 77]. The polynomial chaos proxy has an advantage over other surrogate models in that it systematically guarantees convergence in probability and in distribution to the output random variable of interest with finite variance, e.g. cumulative oil production. However, for high-dimensional problems, the number of polynomial chaos terms increases drastically as the order of the polynomial chaos expansion increases, and a large number of full reservoir simulation runs may be required to compute high-order polynomial chaos expansions. Hence, for the efficient use of the polynomial chaos proxy, the size of the problem



has to be effectively reduced. Dimensionality reduction techniques, which have been applied in many application areas including reservoir simulation, represent promising means for constructing efficient surrogate models. Many of these techniques entail the projection of the high resolution description of the reservoir into a low-dimensional subspace, which significantly reduces the number of unknowns. The Karhunen-Loeve representation was first introduced by Chen et al. [34] within a reservoir engineering context to reduce the dimension of geological parameters. The basic approach was later used by, among others, Oliver [35], Reynolds et al. [36], and Sarma et al. [26] to approximate the high resolution geological model with a much lower dimensional space. Marzouk et al. [56] applied the polynomial chaos expansion along with the MCMC algorithm in the Bayesian framework for a low-dimensional problem and proved the efficiency of the algorithm. However, for the practical implementation of the polynomial chaos proxy in the Bayesian framework, the number of terms still needs to be further reduced. More recently, efforts have been made to construct solution-adaptive uncertainty propagation techniques that exploit any structure in the solution to decrease the computational cost. Among them are the multi-scale model reduction of [57] and the sparse decomposition of [60, 59, 58] for the stochastic Galerkin technique, anisotropic and adaptive sparse grids of [51, 62, 61] for the stochastic collocation scheme, and low-rank solution approximations of [63, 64, 65, 78]. More recently in [55], the correlation between samples was used to justify the use of sparsity-promoting regularization (i.e. constraining the ℓ1 norm) to generate sparse PCE representations of PDEs. Similarly, [68] studied a high-dimensional stochastic collocation method where the polynomial coefficients were obtained by solving a constrained minimization problem with an ℓ1 regularization term.
Sparsity-promoting regularization problems can be solved iteratively using the orthogonal matching pursuit (OMP) [74, 75], as used in [55], or using the least angle regression (LARS) algorithm [70], as used in [71]. In this work, we propose a heuristic method for the sparse representation of the polynomial chaos expansion and its application as a proxy substitute for the full reservoir simulator, applied with the MCMC method to efficiently sample from the posterior probability density function of the reservoir random parameters. We use the Karhunen-Loeve expansion to decompose the geological parameters into a lower dimension. The polynomial chaos proxy is then trained with the reduced-order parameters of the reservoir model. Bayesian inference provides a mathematical formulation for the posterior distribution of reservoir parameters. Instead of running the full reservoir simulation, we use the polynomial chaos proxy for the Bayesian inference. Then, we apply the MCMC method to sample from the posterior distribution.

The organization of this paper is as follows: Section 2 presents the framework of the sparse polynomial chaos proxy. It includes a review of the Karhunen-Loeve expansion as a dimensionality reduction technique and the derivation of a sparse formula for the polynomial chaos expansion, followed by a review of the Bayesian inverse framework and the MCMC method to sample from the analytical approximation of the posterior distribution. In Section 3, the proposed algorithm is applied for calibration of an analytical one-dimensional elliptic stochastic PDE and a history matching problem of a two-dimensional example of fluvial channels. Finally, the conclusions of our work are drawn in Section 4.

2. Sparse Polynomial Chaos Proxy Framework For Uncertainty Quantification

2.1. Dimensionality reduction

The Karhunen-Loeve expansion is a powerful tool for representing stationary and nonstationary random fields (e.g. permeability) which preserves the two-point statistics of the random field. It allows representing any random field or process with finite variance as a series expansion involving a complete set of deterministic functions with corresponding random coefficients, which can be used as a differentiable parameterization of the random field. The Karhunen-Loeve expansion is basically a linear map which decorrelates the random field. The mathematical formulation of the p-term truncated Karhunen-Loeve expansion for a second-order random field K(x) is as follows:

$$K(x) = \overline{K}(x) + \sum_{i=1}^{p} \alpha_i(x)\,\xi_i \quad (1)$$

where $\overline{K}(x)$ denotes the mean of the random field $K(x)$, the $\xi_i$ are uncorrelated random variables with zero mean and unit standard deviation, and the $\alpha_i(x)$ are deterministic functions which are calculated by computing the eigenvalues and eigenfunctions of the covariance function of the random field:

$$\alpha_i(x) = \sqrt{\lambda_i}\, f_i(x) \quad (2)$$

where the $\lambda_i$ and $f_i(x)$ are the eigenvalues and eigenfunctions of the covariance function. The random variables $\xi_i$ are independent and Gaussian only if the random field is Gaussian. In any other case, the variables $\xi_i$ will be non-Gaussian. When the explicit expression for the correlation function is not known, we use the ensemble covariance matrix to perform the Karhunen-Loeve transformation. Accordingly, the deterministic coefficients of the Karhunen-Loeve expansion are calculated by computing the eigenvalues and eigenvectors of the ensemble covariance matrix. To reduce the dimension using the Karhunen-Loeve expansion, the smallest eigenvalues are discarded. The rate of decay of the eigenvalues depends on the correlation strength of the random field: the more correlated a random field is, the faster its eigenvalues decay. Discarding the smallest eigenvalues implies that we are discarding the shortest correlation lengths (high-frequency changes). It can be shown that among all parameterizations, the Karhunen-Loeve expansion minimizes the least-square approximation error of representing the realizations used to create the covariance matrix C [26]. In the current work, K(x) denotes the log-permeability field under study, and x denotes a vector of spatial coordinates.

If permeability data are available on a discrete grid, it is convenient to apply the discrete KL expansion, especially if the same grid is used for the numerical simulation. However, when the permeability is learned in a Bayesian updating context, typically data are not available a priori. Moreover, evaluation of a positive definite sample covariance matrix for a set of grid points would require a very large number of data. Therefore, in such cases an assumption on the form of the prior correlation function is usually made and the continuous KL expansion can be readily applied. Alternatively, the discrete KL expansion can also be applied with the covariance matrix derived from the assumed correlation structure.
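As a concrete illustration of the discrete KL truncation described above, the following sketch builds the ensemble covariance matrix, eigendecomposes it, and keeps only the leading modes. The toy ensemble, the variable names, and the number of retained modes are all ours, not from the paper:

```python
import numpy as np

# Sketch of a discrete Karhunen-Loeve truncation from an ensemble of
# realizations (rows of `ensemble`).  The cumulative-sum trick below just
# manufactures a correlated toy field for illustration.
rng = np.random.default_rng(0)
n_real, n_grid = 200, 64
ensemble = rng.standard_normal((n_real, n_grid)).cumsum(axis=1)

K_mean = ensemble.mean(axis=0)
C = np.cov(ensemble, rowvar=False)        # ensemble covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)      # symmetric eigendecomposition
order = np.argsort(eigvals)[::-1]         # sort eigenvalues, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_keep = 10                               # retain only the p largest modes
# alpha_i = sqrt(lambda_i) * f_i, as in Eq. (2)
alphas = eigvecs[:, :n_keep] * np.sqrt(np.maximum(eigvals[:n_keep], 0.0))

# One realization from the truncated expansion of Eq. (1):
xi = rng.standard_normal(n_keep)
K_real = K_mean + alphas @ xi
```

For strongly correlated fields the sorted eigenvalue spectrum decays quickly, which is what makes the small `n_keep` truncation viable.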


2.2. Sparse Polynomial Chaos Expansion

The polynomial chaos expansion was first introduced by Wiener under the name of "the homogeneous chaos theory". In his work, he proposed a spectral expansion for representing well-behaved (i.e. second-order) random variables. The use of the homogeneous chaos theory for uncertainty quantification was pioneered by Ghanem and Spanos [27], under the name of polynomial chaos expansions (PCE), and has been successfully applied to various engineering problems. To illustrate the applicability of PCEs for uncertainty quantification, consider a random variable y with finite variance, which is a function of some random variables ξ, such that:

$$y = G(\xi). \quad (3)$$


Here, the random variables ξ are the uncorrelated random variables in the Karhunen-Loeve decomposition of the permeability field under study. In our problem, G(ξ) represents the simulation model. Since it is very expensive to directly evaluate G(ξ), we are interested in creating approximations to this model that can be evaluated efficiently. Wiener [37] represented a general second-order random variable in the following form:

$$y = a_0 H_0 + \sum_{i_1=1}^{p} a_{i_1} H_1(\xi_{i_1}(\theta)) + \sum_{i_1=1}^{p}\sum_{i_2=1}^{i_1} a_{i_1,i_2} H_2(\xi_{i_1}(\theta), \xi_{i_2}(\theta)) + \sum_{i_1=1}^{p}\sum_{i_2=1}^{i_1}\sum_{i_3=1}^{i_2} a_{i_1,i_2,i_3} H_3(\xi_{i_1}(\theta), \xi_{i_2}(\theta), \xi_{i_3}(\theta)) + \ldots \quad (4)$$

where $H_n(\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_p})$ denotes the orthogonal Hermite polynomial of order n and $(\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_p})$ are multi-dimensional independent Gaussian random variables with zero mean and unit variance. θ represents the sample point of the underlying probability space. The random variables ξ are functions of θ because random variables are set functions that map from a sample space to the real line. Equation (4) is the discrete version of the original Wiener polynomial chaos expansion introduced in his 1938 paper [37], where the continuous integrals are replaced by summations. The Hermite polynomials of order n, $H_n(\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_p})$, can be derived by the following formula [37]:

$$H_n(\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_p}) = (-1)^n\, e^{\frac{1}{2}\xi^T\xi}\, \frac{\partial^n}{\partial\xi_{i_1}\,\partial\xi_{i_2}\cdots\partial\xi_{i_p}}\, e^{-\frac{1}{2}\xi^T\xi}. \quad (5)$$
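In one dimension these are the probabilists' Hermite polynomials, which NumPy exposes in its `hermite_e` module; the helper below (our naming) evaluates $He_n$ and numerically checks the orthogonality relation used in Eq. (6):

```python
import numpy as np
from numpy.polynomial import hermite_e as He

# Probabilists' Hermite polynomials He_n: He_0 = 1, He_1 = x, and
# He_{n+1}(x) = x*He_n(x) - n*He_{n-1}(x).  NumPy's hermite_e module
# implements exactly this family.

def hermite_values(n, x):
    """Evaluate He_n at points x via the coefficient vector [0, ..., 0, 1]."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return He.hermeval(x, coeffs)

# Monte Carlo check of E[He_i * He_j] = i! * delta_ij under N(0, 1)
rng = np.random.default_rng(1)
xi = rng.standard_normal(200_000)
e23 = np.mean(hermite_values(2, xi) * hermite_values(3, xi))  # should be near 0
e33 = np.mean(hermite_values(3, xi) ** 2)                     # should be near 3! = 6
```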

Here, ξ denotes the vector of p Gaussian random variables. Under the Gaussian probability measure ($w(\xi): \mathcal{N}(0,1)$), Hermite polynomials are orthogonal to each other and form a complete basis of the Hilbert space. The orthogonality of Hermite polynomials under the Gaussian measure implies:

$$\langle H_i(\xi), H_j(\xi)\rangle = \int_{-\infty}^{\infty} H_i(\xi)\, H_j(\xi)\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\xi^2}{2}}\, d\xi = i!\,\delta_{ij} \quad \forall i \geq 1, \quad (6)$$

where $\delta_{ij}$ is the Kronecker delta. Since the polynomial bases are all orthogonal to each other, the deterministic coefficients of Equation (4) can be computed using an $L_2$ projection scheme:

$$a_{i_1,i_2,\ldots,i_n} = \frac{\langle H_n, y\rangle}{\langle H_n, H_n\rangle} = \frac{\displaystyle\int_{-\infty}^{\infty} H_n(\xi)\, G(\xi)\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\xi^2}{2}}\, d\xi}{\displaystyle\int_{-\infty}^{\infty} H_n^2(\xi)\, \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\xi^2}{2}}\, d\xi} = \frac{E[H_n(\xi)\, y(\xi)]}{n!} \simeq \frac{1}{N_s\, n!}\sum_{i=1}^{N_s} H_n(\xi_i)\, y(\xi_i). \quad (7)$$

However, for computing the coefficients correctly, the convergence of the right-hand side of the equation has to be studied. For higher-order Hermite polynomials, a large number of samples may be required to reach convergence. For a given number of simulation runs, the coefficients associated with the higher-order PCE terms tend to have larger estimation errors compared with the lower-order terms; therefore, there is a trade-off between the number of training runs and the maximum order of PCE that can be reliably used. To resolve this issue, various methods have been proposed in the literature. The Probabilistic Collocation Method (PCM) was proposed by Tatang et al. [38] to give a reasonable estimate of the coefficients. In the traditional PCM, the number of simulations required is equal to the number of coefficients in the PCE expansion, which grows exponentially with the number of random variables and the order of the PCE. Isukapalli [39] used a modified form of PCM, which employs a least-square technique to determine the PCE coefficients instead of solving a linear system. As a regression problem, the number of training runs need not be equal to the number of terms in the PCE. However, if the number of simulations is too small compared with the number of terms, the regression problem is severely underdetermined and non-unique. To alleviate the curse of dimensionality, Li et al. [40] used only the pure terms from the PCE and discarded the cross terms. This reduces the number of coefficients drastically, and they showed it produces a reasonable estimate for slightly nonlinear problems.
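A minimal sketch of the regression (least-squares) route to the 1-D PCE coefficients follows. The target model here is a toy polynomial chosen by us precisely so that the exact coefficients are known; it is not the paper's reservoir simulator:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

# Least-squares estimation of 1-D PCE coefficients from "simulator" samples.
# Toy truth: G(xi) = 2*He_0(xi) + 3*He_1(xi) + 0.5*He_2(xi), He_2(x) = x^2 - 1.

def design_matrix(xi, order):
    """Columns are He_0(xi), ..., He_order(xi)."""
    cols = []
    for n in range(order + 1):
        c = np.zeros(n + 1)
        c[n] = 1.0
        cols.append(He.hermeval(xi, c))
    return np.column_stack(cols)

rng = np.random.default_rng(2)
xi_train = rng.standard_normal(50)                          # 50 trial runs
y_train = 2.0 + 3.0 * xi_train + 0.5 * (xi_train**2 - 1.0)  # toy responses

A = design_matrix(xi_train, order=4)                        # deliberately over-complete
coeffs, *_ = np.linalg.lstsq(A, y_train, rcond=None)
# coeffs[:3] recovers (2, 3, 0.5); the spurious higher-order terms vanish.
```

With more training runs than basis terms, the fit is well posed; in the underdetermined regime described above, extra structure (e.g. sparsity) is needed.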


Moreover, under the assumption that the factor space under study is far from a stationary point, the "sparsity of effects" principle [41] implies that most models are mainly governed by the main effects and only the lower-order interactions between the input random variables. Hence, for practical applications where the polynomial chaos expansions are considerably sparse [42], many terms can be neglected from the expansions. Accordingly, the number of simulation runs required to achieve a high-order polynomial chaos approximation of a forward model does not increase excessively if only the relevant terms are preserved in the reduced-terms polynomial chaos expansion. Unfortunately, the sparsity-of-effects assumption does not hold for a range of practical applications in the literature [42]. Therefore, several methods have recently been proposed to detect the sparsity pattern and achieve an accurate sparse polynomial chaos expansion, or to adaptively construct the polynomial chaos basis so as to obtain a highly sparse polynomial chaos representation with only a few non-zero terms [55, 68]. In our proposed polynomial chaos proxy, which integrates the polynomial chaos expansion with the Karhunen-Loeve decomposition method, we demonstrate that for fast-decaying eigenvalues of the Karhunen-Loeve expansion, the polynomial chaos expansion will be sparse. We also determine the sparsity pattern of the polynomial chaos expansion using the relative importance of the Karhunen-Loeve modes, in the sense that among the cross-terms associated with the high-frequency Karhunen-Loeve modes (the modes related to the small eigenvalues) only a few terms are relevant; the rest become adequately negligible and can be discarded. Hence, for high-dimensional problems, the high-order reduced-terms polynomial chaos proxy can be efficiently constructed with a reasonable number of simulation runs.
The approximation we propose here is the reduced-terms polynomial chaos proxy, which uses the impact factor to discard the interaction terms that are not relevant. We use the fact that the random variables $\xi_i$ in the polynomial chaos expansion, when applied along with the Karhunen-Loeve expansion, are not of the same importance to the uncertainty quantification problem, as they are weighted by $\sqrt{\lambda_i}$ in the reduced-dimension representation of the random field of Equation (2). The idea of the reduced-terms polynomial chaos representation is basically the generalization of the "scree graph" [53] to high-order terms: just as in the scree plot we ignore the random variables (terms) weighted by the small eigenvalues, in higher orders we discard the terms associated with powers of the small eigenvalues. As a result, the curse of dimensionality is mitigated to a great extent in the reduced-terms polynomial chaos expansion when applied along with the Karhunen-Loeve expansion. To illustrate the idea, let $g(K(\xi))$ be a functional of a random variable $K(\xi)$, where $K(\xi) = \bar{K} + \sqrt{\lambda_1}\,\xi$ and ξ is a Gaussian random variable. The n-term truncated polynomial chaos expansion can be written as:

$$g(K(\xi)) = \bar{g} + \sum_{i=1}^{n} a_i H_i(\xi). \quad (8)$$

When the PCM method is used with $N_T$ trial runs $\{\xi^1, \ldots, \xi^{N_T}\}$, the polynomial chaos coefficients can be computed by solving a linear system of equations. By expanding the Hermite polynomials, Equation (8) can be equivalently written as the polynomial representation

$$g(K(\xi)) = \bar{g}_p + \sum_{i=1}^{n} b_i\, \xi^i. \quad (9)$$

In our work, we use the latter representation to address the sparsity pattern by computing an upper bound on $b_i$. Taking the m-th derivative of both sides of Equation (9), we have:

$$b_m = \frac{1}{m!}\left.\frac{d^m g}{d\xi^m}\right|_{\xi=0}. \quad (10)$$

Using the chain rule and Faà di Bruno's formula [52], together with the fact that $\frac{\partial^m K}{\partial\xi^m} = 0$ for $m > 1$, we can write $b_m$ as:

$$b_m = \frac{1}{m!}\left.\frac{d^m g}{dK^m}\left(\frac{dK}{d\xi}\right)^m\right|_{\xi=0} = \frac{1}{m!}\,\frac{d^m g}{d\bar{K}^m}\,(\sqrt{\lambda_1})^m, \quad (11)$$

where $\frac{d^m g}{d\bar{K}^m}$ is the m-th derivative of g with respect to $\bar{K}$. Denoting the magnitude of the m-th derivative by $T_m$, the size of the coefficient of the one-dimensional polynomial expansion can be written as:

$$|b_m| = \frac{1}{m!}\, T_m\, (\sqrt{\lambda_1})^m. \quad (12)$$

The multi-dimensional case is more involved. The first-order polynomial coefficients of $g(K)$, where $K = \bar{K} + \sum_{i=1}^{p} \sqrt{\lambda_i}\, f_i\, \xi_i$ and $f_i$ is the eigenvector associated with the eigenvalue $\lambda_i$, can be obtained as

$$b_{1_i} = \left.\frac{\partial g(K(\xi))}{\partial\xi_i}\right|_{\xi_i=0} = \left(\frac{\partial g(K)}{\partial K}\right)^{T} \frac{\partial K}{\partial\xi_i} = \left(\frac{\partial g(K)}{\partial\bar{K}}\right)^{T} f_i\, (\sqrt{\lambda_i}), \quad (13)$$

where $|f_i| = 1$. Let $T_m$ denote the magnitude of the m-th order total variation of g with respect to $\bar{K}$:

$$T_m = \left|\frac{\partial^m g(K)}{\partial\bar{K}^m}\right| = \sqrt{\sum_{i_1+i_2+\ldots+i_n=m} \left(\frac{\partial^m g}{\partial k_1^{i_1}\,\partial k_2^{i_2}\cdots\partial k_n^{i_n}}\right)^2}. \quad (14)$$

Then we can find an upper bound on the first-order coefficients $b_{1_i}$ by using the Cauchy-Schwarz inequality:

$$|b_{1_i}| \leq \left|\frac{\partial g(K)}{\partial\bar{K}}\right| \times |f_i| \times (\sqrt{\lambda_i}) = (\sqrt{\lambda_i})\, L_1. \quad (15)$$

Accordingly, for the second-order polynomial coefficients $b_{2_{i,i}}$ we have:

$$b_{2_{i,i}} = \frac{1}{2!}\left.\frac{\partial^2 g(K(\xi))}{\partial\xi_i^2}\right|_{\xi=0} = \frac{1}{2!}\left.\frac{\partial}{\partial\xi_i}\left(\frac{\partial g}{\partial\xi_i}\right)\right|_{\xi=0} = \frac{1}{2!}\left.\frac{\partial}{\partial\xi_i}\left(\left(\frac{\partial g}{\partial\bar{K}}\right)^{T}\frac{\partial K}{\partial\xi_i}\right)\right|_{\xi=0}. \quad (16)$$

Using the chain rule and the fact that $\frac{\partial}{\partial\xi_i}\left(\frac{\partial K}{\partial\xi_i}\right) = 0$, we have:

$$b_{2_{i,i}} = \frac{1}{2!}\left(\frac{\partial K}{\partial\xi_i}\right)^{T}\left.\frac{\partial^2 g}{\partial K\,\partial\xi_i}\right|_{\xi=0} = \frac{1}{2!}\left(\frac{\partial K}{\partial\xi_i}\right)^{T}\left(\frac{\partial^2 g}{\partial K^2}\right)\left(\frac{\partial K}{\partial\xi_i}\right) = \frac{1}{2!}\,(\sqrt{\lambda_i})^2\,(f_i)^{T}\,\frac{\partial^2 g}{\partial\bar{K}^2}\,(f_i), \quad (17)$$

where $\frac{\partial^2 g}{\partial\bar{K}^2}$ is the Hessian or second-order tensor derivative. Hence,

$$|b_{2_{i,i}}| \leq \frac{1}{2!}\,(\sqrt{\lambda_i})^2\,|f_i|^2\,\left|\frac{\partial^2 g}{\partial\bar{K}^2}\right| = \frac{1}{2!}\,(\sqrt{\lambda_i})^2\, L_2. \quad (18)$$

Similarly, we have

$$b_{2_{i,j}} = \frac{1}{1!\,1!}\,(\sqrt{\lambda_i})(\sqrt{\lambda_j})\,(f_i)^{T}\,\frac{\partial^2 g}{\partial\bar{K}^2}\,(f_j), \quad (19)$$

and

$$|b_{2_{i,j}}| \leq \frac{1}{1!\,1!}\,(\sqrt{\lambda_i})(\sqrt{\lambda_j})\, L_2. \quad (20)$$

For the higher orders the same procedure can be applied using the m-th order tensor $\frac{\partial^m g}{\partial\bar{K}^m}$, and we have

$$|b_{i_1,i_2,\ldots,i_p}| \leq \frac{(\sqrt{\lambda_1})^{i_1}(\sqrt{\lambda_2})^{i_2}\cdots(\sqrt{\lambda_p})^{i_p}}{(i_1!)(i_2!)\cdots(i_p!)}\; T_m. \quad (21)$$

The sparse representation of the polynomial chaos is feasible when we have fast-decaying eigenvalues. To achieve the reduced-order representation of a sparse polynomial chaos expansion, we use the impact factor based on the upper bound of the polynomial coefficients. For the m-th order polynomial chaos, we retain all the pure terms associated with the largest eigenvalue ($\sqrt{\lambda_1}$). To retain only the relevant terms in the polynomial of order k, we use the following impact factor:

$$I_k(\xi_1^{j_1}\xi_2^{j_2}\cdots\xi_p^{j_p}) = \frac{\dfrac{(\sqrt{\lambda_1})^{j_1}(\sqrt{\lambda_2})^{j_2}\cdots(\sqrt{\lambda_p})^{j_p}}{(j_1!)(j_2!)\cdots(j_p!)}\, T_m}{\dfrac{(\sqrt{\lambda_1})^{k}}{k!}\, T_m} = k!\;\frac{(\sqrt{\lambda_1})^{j_1}(\sqrt{\lambda_2})^{j_2}\cdots(\sqrt{\lambda_p})^{j_p}}{(j_1!)(j_2!)\cdots(j_p!)\,(\sqrt{\lambda_1})^{k}}, \quad (22)$$

where $\lambda_1 > \lambda_2 > \ldots > \lambda_p$ and $j_1 + j_2 + \ldots + j_p = k$. The impact factor compares the relative importance of terms at order k, and not the importance of terms at different orders, as they are normalized by different denominators. In the reduced-terms polynomial chaos representation, the first-order monomials are always retained; for the higher orders, if the impact factor of the term $\xi_1^{j_1}\xi_2^{j_2}\cdots\xi_p^{j_p}$ is smaller than a predetermined cut-off value $\epsilon$, i.e. $I_k(\xi_1^{j_1}\xi_2^{j_2}\cdots\xi_p^{j_p}) < \epsilon$, the term is discarded; otherwise it is preserved in the polynomial chaos expansion. For example, if $\lambda_1 = 4\lambda_2 \gg \lambda_3 > \ldots > \lambda_p$, the impact factor of the 5th-order polynomial $\xi_2^5$ is:

$$I_5(\xi_2^5) = \frac{(\sqrt{\lambda_2})^5}{(\sqrt{\lambda_1})^5} = \frac{1}{32}. \quad (23)$$

Hence, it makes sense to drop $\xi_2^5$ in favor of $\xi_1^5$ when $\epsilon = 0.1$.
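The impact-factor screening of Eq. (22) is straightforward to implement; the sketch below (our function names, illustrative eigenvalues matching the Eq. (23) example) computes $I_k$ for a multi-index and filters all terms up to a maximum order against a cutoff:

```python
import math
from itertools import product

# Impact factor of Eq. (22) for a multi-index j = (j_1, ..., j_p) of order
# k = sum(j):  I_k = k! * prod(sqrt(lam_i)^j_i / j_i!) / sqrt(lam_1)^k.

def impact_factor(j, lams):
    k = sum(j)
    val = math.factorial(k)
    for ji, lam in zip(j, lams):
        val *= math.sqrt(lam) ** ji / math.factorial(ji)
    return val / math.sqrt(lams[0]) ** k

def retained_terms(lams, max_order, eps):
    """Keep every multi-index of order 1..max_order with impact factor >= eps."""
    p = len(lams)
    kept = []
    for j in product(range(max_order + 1), repeat=p):
        if 0 < sum(j) <= max_order and impact_factor(j, lams) >= eps:
            kept.append(j)
    return kept

lams = [4.0, 1.0]                  # lambda_1 = 4 * lambda_2, as in Eq. (23)
i5 = impact_factor((0, 5), lams)   # reproduces I_5(xi_2^5) = 1/32
terms = retained_terms(lams, max_order=5, eps=0.1)
```

With the cutoff ε = 0.1, the pure term $\xi_1^5$ survives while $\xi_2^5$ is dropped, exactly as the worked example concludes.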

Accordingly, the idea of the reduced-terms polynomial chaos approximation is basically a heuristic approach to reduce the number of terms in the polynomial chaos expansion based on the relative importance of the Karhunen-Loeve terms, when the PCM (or the regression-based PCM) is used to compute the coefficients.

2.3. Bayesian Inference

We are interested in the inverse problem of characterizing the posterior distribution of the unknown parameters ξ resulting from the Karhunen-Loeve expansion of the log-permeability field K(x), given finite-dimensional observational data denoted by d. The aim is to merge uncertainties, both in prior knowledge and observational noise, with the mathematical model that describes the reservoir flow. We approach this inverse problem by means of the Bayesian framework presented in [54]. The prior uncertainty in geologic properties is incorporated in terms of a prior probability distribution ρ(ξ). The data d and the geologic properties ξ are related via the forward operator G by

$$d = G(\xi) + \eta, \quad (24)$$

where η is a vector of random noise whose covariance Γ is known to us. For simplicity, we assume that $\eta \sim \mathcal{N}(0, \Gamma)$. The likelihood of the measured data d given a particular instance of ξ is denoted by $\mathbb{P}(d|\xi)$. In the Bayesian framework, the uncertainty of the inverse estimates of ξ given d is quantified with the conditional posterior probability $\sigma_\xi(\xi)$, which, from Bayes' rule, is given by

$$\sigma_\xi(\xi) \propto \rho(\xi)\,\mathbb{P}(d|\xi). \quad (25)$$


In the current work, we assume the model uncertainties to be negligible compared to the observational/data uncertainties, and we also assume that the observations are independent and identically distributed. Hence, the posterior probability of the parameters ξ can be expressed as

σ_ξ(ξ) = k ρ_ξ(ξ) exp(−S(ξ, d))    (26)


where k is the proportionality constant and S(ξ, d) is the negative log-likelihood, also called the misfit surface:

S(ξ, d) = (1/2) ||d − G(ξ)||²_Γ,    (27)

where Γ is the covariance of the observational noise. Since G is nonlinear, the posterior is non-Gaussian even when the prior ρ(ξ) is Gaussian. Accordingly, the posterior distribution of the geological parameters must be characterized by means of sampling. Unfortunately, this approach often requires millions of forward model evaluations, which is only feasible for small problems where a relatively coarse grid is used to discretize the geologic properties.
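The Γ-weighted norm in Equation (27) can be sketched directly; the forward-model output `g_xi` below is a hypothetical placeholder for G(ξ), and the toy numbers are only for checking the diagonal-covariance special case.

```python
import numpy as np

def misfit(d, g_xi, gamma):
    """Negative log-likelihood S(xi, d) = 0.5 * ||d - G(xi)||_Gamma^2,
    where ||v||_Gamma^2 = v^T Gamma^{-1} v and Gamma is the
    observation-noise covariance."""
    r = d - g_xi
    return 0.5 * r @ np.linalg.solve(gamma, r)

# Toy check with Gamma = sigma^2 * I: S reduces to 0.5 * ||r||^2 / sigma^2.
d = np.array([1.0, 2.0, 3.0])
g = np.array([0.5, 2.5, 3.0])   # stand-in for a forward-model evaluation
sigma2 = 0.25
print(misfit(d, g, sigma2 * np.eye(3)))  # 0.5 * 0.5 / 0.25 = 1.0
```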


In this work, we address this problem by running a surrogate model rather than the expensive forward model. Accordingly, we obtain an analytical approximation for the posterior distribution of the reduced-dimension parameters ξ. Substituting the polynomial chaos proxy for the full reservoir simulation model G in Equation (27), we obtain the following expression for the misfit surface:

S(ξ) = S_0 + Σ_{i=1}^{p} S_{i,1} ξ_i + Σ_{i=1}^{p} S_{ii,2} (ξ_i² − 1) + Σ_{i=1}^{p−1} Σ_{j>i} S_{ij,2} ξ_i ξ_j + …    (28)

Hence, we can achieve an analytical approximation for the posterior distribution of the reduced-dimension parameters as

σ_ξ(ξ) = k ρ_ξ(ξ) exp{−(S_0 + Σ_{i=1}^{p} S_{i,1} ξ_i + Σ_{i=1}^{p} S_{ii,2} (ξ_i² − 1) + Σ_{i=1}^{p−1} Σ_{j>i} S_{ij,2} ξ_i ξ_j + …)}.    (29)

We can readily use the analytical formulation of Equation (29) for post-processing (posterior mean, covariance, P10, P90, etc.). Since the general expression for the posterior distribution is nonlinear, we need to apply MCMC methods to sample from the posterior distribution. In the next section we briefly introduce the fundamental steps of the Metropolis-Hastings MCMC sampling method.

2.4. Metropolis-Hastings MCMC Method


The Metropolis (or Metropolis-Hastings) algorithm was developed by Metropolis and Ulam [43], Metropolis et al. (1953) [44], and Hastings (1970) [45]. It belongs to the family of MCMC methods: it is random (Monte Carlo) and memoryless (Markov chain), in the sense that each step depends only on the previous step. The basic idea of the algorithm is to sample states from a Markov chain whose stationary distribution equals the target distribution. For this purpose, an irreducible and aperiodic Markov chain is constructed such that, at equilibrium, it converges to the target distribution σ(ξ). The only requirement the Metropolis-Hastings algorithm imposes is that a function proportional to σ(ξ) be calculable. In Bayesian applications, the normalization constant is often very difficult to compute; therefore, the ability to draw samples from an unnormalized distribution function is an important feature of the Metropolis-Hastings algorithm. Assume we want to sample from σ(ξ) = r(ξ)/K, where K is the normalizing constant and is either not known or extremely difficult to compute. The Metropolis algorithm [43, 44] draws samples from σ(ξ) with the following procedure:

1. Start with any initial value ξ^0 with r(ξ^0) > 0.
2. Use a proposal distribution q(ξ^1, ξ^2) to generate a candidate sample ξ*. The proposal distribution is conditional on the current state. The only restriction on the proposal distribution in the Metropolis algorithm is that it must be symmetric, i.e., q(ξ^1, ξ^2) = q(ξ^2, ξ^1).


3. Calculate the ratio of the probability of the proposed sample to the probability of the current state:

α = σ(ξ*) / σ(ξ^{t−1}) = r(ξ*) / r(ξ^{t−1})    (30)


Note that the normalization constant K cancels out in this ratio. If the proposed sample is more probable than the current sample (α > 1), accept the move; otherwise, accept the proposed sample with probability α. If the proposed sample is rejected, the chain remains in the current state, i.e., the new sample equals the previous sample.
4. Repeat the procedure until the equilibrium state is achieved.

The above procedure can be summarized as first computing

α = min( r(ξ*) / r(ξ^{t−1}), 1 )    (31)


and then accepting the proposed move with probability α. It can be shown that after a sufficient number of "burn-in" steps, this Markov chain approaches its stationary distribution, which is equal to σ(ξ). Hastings (1970) generalized the Metropolis algorithm by allowing the proposal distribution to be an arbitrary probability density function q(ξ^1, ξ^2) and by setting the acceptance probability of a proposed sample as:

α = min( r(ξ*) q(ξ*, ξ^{t−1}) / ( r(ξ^{t−1}) q(ξ^{t−1}, ξ*) ), 1 )    (32)


This is called the Metropolis-Hastings algorithm. If the proposal distribution is symmetric, i.e., q(ξ^1, ξ^2) = q(ξ^2, ξ^1), the original Metropolis algorithm is recovered. It can be shown that the Metropolis-Hastings Markov chain converges to σ(ξ) at equilibrium [43, 44]. A key issue in the performance of the Metropolis-Hastings algorithm is the number of steps required for the Markov chain to converge to its stationary distribution (the burn-in period). Typically the first 1000 to 5000 samples are discarded, and convergence tests are used to examine whether the chain has converged [46]. A poor choice of starting value or proposal distribution can considerably increase the burn-in time; the optimal choice of both remains an area of active research [46]. Since running the polynomial chaos proxy is cheap, the choice of proposal distribution does not affect the performance of our proposed method, and we can use the popular choice of a multi-dimensional Gaussian as the proposal distribution. The scale parameter of the multivariate normal distribution effectively controls the acceptance probability and is chosen based on the number of random variables, to account for the low acceptance rate in high dimensions.
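The steps above can be sketched as a short random-walk Metropolis sampler. The toy target below (an unnormalized standard normal) is purely illustrative, not the reservoir posterior; note that only log r is needed, so the unknown constant K never appears.

```python
import numpy as np

def metropolis(log_r, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: the Gaussian proposal is symmetric, so the
    Hastings correction cancels and alpha = min(1, r(x*)/r(x^{t-1}))."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lr = log_r(x)
    chain = np.empty((n_steps, x.size))
    for t in range(n_steps):
        x_star = x + step * rng.standard_normal(x.size)  # symmetric proposal
        lr_star = log_r(x_star)
        if np.log(rng.uniform()) < lr_star - lr:         # accept w.p. min(1, r*/r)
            x, lr = x_star, lr_star
        chain[t] = x  # on rejection the chain stays at the current state
    return chain

# Toy target: unnormalized standard normal, r(x) = exp(-x^2 / 2).
log_r = lambda x: -0.5 * float(np.sum(x**2))
chain = metropolis(log_r, x0=[3.0], n_steps=20000, step=1.0, seed=42)
samples = chain[5000:, 0]              # discard burn-in
print(samples.mean(), samples.var())   # close to 0 and 1
```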


3. Numerical Experiments

In this section, we explore the efficiency of the proposed sparse polynomial chaos proxy through its application to

• a one-dimensional (in space) elliptic stochastic PDE with high-dimensional random diffusion coefficients, similar to the experiment studied by Doostan et al. [55],


• a two-dimensional example of fluvial channels, similar to the case studied by Brouwer et al. [72], to demonstrate that the sparse polynomial chaos proxy combined with the MCMC method can efficiently sample from the posterior probability density function of the system parameters.

3.1. I. One-Dimensional Elliptic Stochastic Differential Equation


We consider the solution of a one-dimensional stochastic differential equation,

−d/dx ( a(x, θ) du(x, θ)/dx ) = 1,   x ∈ D = (0, 1),
u(0, θ) = u(1, θ) = 0,    (33)

where the stochastic diffusion coefficient is given by the expansion

a(x, θ) = ā + σ_a Σ_{i=1}^{d} √λ_i φ_i(x) ξ_i(θ).    (34)


Here, λ_i and φ_i(x) are the d largest eigenvalues and the corresponding eigenfunctions of the Gaussian covariance kernel

C(x_1, x_2) = exp( −(x_1 − x_2)² / l_c² ),    (35)


where l_c is the correlation length of a(x, θ), and the random variables ξ_i(θ) are assumed to be independent Gaussian random variables with zero mean and unit variance. We solve the eigenvalue problem numerically with the kernel eigenvalue method presented by Sarma et al. [26]. The coefficient σ_a controls the variability of a(x, θ). We compare the accuracy of the proposed sparse approximation scheme with the Orthogonal Matching Pursuit (OMP) method implemented by Doostan et al. [55] to approximate the coefficients of the PCE. The OMP algorithm [74, 75] is a greedy method for solving the sparse approximation problem and has been applied to sparse signal recovery in many studies [75]. The method is straightforward: the approximation is generated iteratively, and during each iteration the column vectors that most closely resemble the required vectors are chosen and used to build the solution. The algorithm tries to solve the problem

min_a ||a||_0   subject to:   ||ŷ − Ha||_2 ≤ η,    (36)
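A minimal greedy sketch of problem (36), in the spirit of OMP, is given below; it assumes unit-norm columns of H, and the recovery check is a hypothetical toy problem, not the paper's experiment.

```python
import numpy as np

def omp(H, y, eta, max_terms=None):
    """Greedy OMP sketch for min ||a||_0 s.t. ||y - H a||_2 <= eta
    (assumes the columns of H are normalized to unit length)."""
    n, m = H.shape
    max_terms = max_terms or m
    active, coef = [], np.zeros(0)
    a = np.zeros(m)
    r = y.copy()
    while np.linalg.norm(r) > eta and len(active) < max_terms:
        j = int(np.argmax(np.abs(H.T @ r)))  # column closest to the residual
        if j in active:
            break                            # no further progress possible
        active.append(j)
        # Refit all active coefficients by least squares, then update residual.
        coef, *_ = np.linalg.lstsq(H[:, active], y, rcond=None)
        r = y - H[:, active] @ coef
    a[active] = coef
    return a

# Hypothetical check: recover a 2-sparse coefficient vector.
rng = np.random.default_rng(1)
H = rng.standard_normal((50, 20))
H /= np.linalg.norm(H, axis=0)          # unit-norm columns
a_true = np.zeros(20)
a_true[[3, 7]] = [2.0, -1.5]
a_hat = omp(H, H @ a_true, eta=1e-8)
print(np.linalg.norm(H @ a_hat - H @ a_true) < 1e-6)  # True
```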



Figure 1: Comparison of relative error in the mean of the solution at x = 0.5 for the Monte Carlo simulation, the OMP sparse scheme, and the reduced-terms polynomial chaos approximation for ε = 0.1 and ε = 0.2.


Figure 2: Comparison of relative error in the standard deviation of the solution at x = 0.5 for the Monte Carlo simulation, the OMP sparse scheme, and the reduced-terms polynomial chaos approximation for ε = 0.1 and ε = 0.2.


where ||a||_0 is the ℓ_0 norm, which counts the number of non-zero components of the vector a. The reconstructed field ŷ is approximated iteratively by a linear combination of the PCE basis functions H. This is done by finding the column vector in H that most closely resembles a residual vector. The residual vector starts out equal to the vector to be approximated (the vector of trial runs) and is adjusted at each iteration to account for the vectors previously chosen. The iterations continue until the error truncation tolerance η is achieved. The hope is that this sequence of locally optimal solutions leads to the globally optimal solution; in general this is not the case, although there are conditions under which the result is optimal. The accuracy of the OMP method strongly depends on the choice of the truncation tolerance η and the sample size N; in fact, the stability depends on the choice of the truncation error, as different threshold values can lead to different results. In this work, we use the heuristic cross-validation algorithm proposed by Doostan et al. [55] to estimate the truncation error for the OMP algorithm. Following their model, we assume (l_c, d) = (0.2, 14), ā = 0.1 and σ_a = 0.03. These choices ensure the stability of the OMP method. d = 14 is chosen so that the


eigenvalues retain 80% of the energy of the signal. The solution statistics are also compared against conventional Monte Carlo simulation. We consider an increasing number N = {120, 200, 280, 360, 421, 600} of random solution samples to evaluate the solution u at x = 0.5 and, consequently, to compute the PC coefficients. To enhance the accuracy of the approximation, we have to reduce ||ŷ − Ha||_2 by increasing the order of the PCE. However, for a fixed number of training samples N, an increase in the order of the PCE may make the problem severely underdetermined and degrade the accuracy. Hence, we start with a lower-order PCE and increase the order until the error ||ŷ − Ha||_2 starts to increase. To make a meaningful comparison, for each N the samples used to compute the solution statistics by conventional Monte Carlo, OMP, and the proposed reduced-terms polynomial chaos expansion are identical.

Figure 3: The synthetic water flooding model with two injectors on the left and one producer on the right. It also shows the reference permeability field used to generate the observed data of the oil flow rate.

Figures 1 and 2 show the results of the proposed sparse approximation of the solution u at x = 0.5 for the cut-off values ε = 0.1 and ε = 0.2, compared to the Monte Carlo method and the OMP method. The number of components is d = 14, and the order of the sparse polynomial chaos representation for each method is given in Table 1. The reduced-terms polynomial chaos approximation gives a better approximation for ε = 0.2 than for ε = 0.1 when only a small number of realizations is available (N = 120). As the number of training runs increases, the reduced-terms polynomial chaos expansion with ε = 0.1 provides a more accurate approximation than ε = 0.2. Hence, there is a trade-off between the cut-off value and the number of training runs.

To illustrate the proposed reduced-terms polynomial chaos expansion, consider the 5th and 6th eigenvalues and the random variables ξ_5 and ξ_6 associated with them. We have (√λ_5/√λ_1)² = 0.1822, (√λ_5/√λ_1)³ = 0.0778 and (√λ_6/√λ_1)² = 0.0852. If we choose the cut-off value ε = 0.1, following Equation (22), the maximum order of ξ_5 and ξ_6 in the polynomial chaos expansion is 2 and 1, respectively. Consequently, only the first order of all other random variables associated with the eigenvalues smaller than λ_6 is retained in the reduced-terms polynomial chaos approximation of order n. As a result, for a given number of realizations N, a higher order of the polynomial chaos expansion can be retrieved. As the results clearly suggest, the proposed method is as accurate as the OMP method in specifying the sparsity pattern of the polynomial chaos expansion when applied along with the Karhunen-Loeve expansion. This example is a relatively low-dimensional problem with fast-decaying eigenvalues, for which the OMP method can be implemented successfully because the truncation error can be heuristically tuned. However, as the number of dimensions increases, a major drawback of the OMP method lies in determining an appropriate value for the threshold, as different threshold values can lead to different results [55]. By contrast, the impact factor proposed in this paper is quite general and gives an upper bound on the coefficients of the PCE when applied along with the Karhunen-Loeve expansion, and it can be effectively applied to higher-dimensional problems when enough samples are available to compute the coefficients of the sparse PCE (i.e., the problem is not severely underdetermined).
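The Karhunen-Loeve setup of this experiment, Equations (34)-(35), can be sketched numerically with a simple Nyström-style discretization; the grid size and random seed below are arbitrary choices, not from the paper, and the printed ratio is only an illustration of the kind of eigenvalue ratios discussed above.

```python
import numpy as np

# Nystrom-style approximation of the KL eigenpairs of the Gaussian kernel (35)
# on a uniform grid of D = (0, 1), with the values used in this example:
# (lc, d) = (0.2, 14), abar = 0.1, sigma_a = 0.03.
n, lc, d, abar, sigma_a = 200, 0.2, 14, 0.1, 0.03
x = np.linspace(0.0, 1.0, n)
C = np.exp(-((x[:, None] - x[None, :]) ** 2) / lc**2)

dx = x[1] - x[0]
lam, phi = np.linalg.eigh(C * dx)          # dx weight approximates the integral operator
order = np.argsort(lam)[::-1]              # eigh returns ascending order
lam = lam[order][:d]
phi = phi[:, order][:, :d] / np.sqrt(dx)   # approximately L2-normalized modes

# One realization of the diffusion coefficient a(x, theta) from Equation (34).
rng = np.random.default_rng(0)
a = abar + sigma_a * (phi * np.sqrt(lam)) @ rng.standard_normal(d)

# lam[4] / lam[0] corresponds to (sqrt(lambda_5)/sqrt(lambda_1))^2; the exact
# value depends on the discretization.
print(float(lam[4] / lam[0]))
```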

  N    OMP   ε = 0.1   ε = 0.2
 120    3      3         4
 200    4      4         5
 280    4      5         5
 360    4      5         5
 421    4      5         5
 600    4      6         6

Table 1: The order of the sparse polynomial chaos expansions used by the proposed impact-factor algorithm with ε = 0.1 and ε = 0.2, compared to the OMP method.


3.2. II. Synthetic Fluvial Channel Sand Model


The simulation model represents a 2D horizontal square reservoir with two water injectors placed inside the field on the left and one producer on the right, as illustrated in Figure 3. The reservoir covers an area of 500 × 500 m² and has a thickness of 10 m. It is discretized into a 50 × 50 horizontal 2D grid. The model is an incompressible two-phase unit-mobility oil-water system with zero connate water saturation and zero residual oil saturation. The porosity is assumed to be 0.3 over the whole field. The model is run for 1000 days with the injectors under rate control and the producer under bottom-hole pressure (BHP) control, with predefined rates and BHP. The injection rate is 500 m³/day for each injector. The observed data are the oil production rate for 1000 days. The prior geological knowledge indicates that the permeability field is a fluvial channelized reservoir with a fine sand texture of permeability 100 mD, while the permeability of the background matrix is 10 mD. The contrast between the channel and the matrix is K_high/K_low = 10. We also assume the variance of the permeability field is as small as 1 mD. The relative sinuosity of the fluvial channel has a


Figure 4: The water flooding process of the fluvial channel case II for 1000 days.


mean of 0.2 and a variance of 0.1, the thickness has a mean of 125 m and a variance of 50 m, the amplitude has a mean of 150 m and a variance of 50 m, and the wavelength has a mean of 225 m and a variance of 25 m. No other information about the orientation or the location of the channel is available. The objective is to obtain samples from the posterior distribution of the permeability field based on the observed oil flow rate for 1000 days. Figure 4 illustrates the saturation map of the water flooding process of the fluvial channel for 1000 days, and Figure 5 shows the noisily observed history of the reservoir. The true permeability model used as the reference case to generate the production data is a fluvial channel extending from one injector to the producer, as shown in Figure 3. Obviously, the identical channel from the other injector to the producer gives the same oil flow rates as the reference case. Hence, at least two permeability models give a zero misfit value, and consequently the posterior distribution of ξ in Equation (29) will be considerably nonlinear. To assess the uncertainty associated with the permeability field using the polynomial chaos proxy, we follow the same procedure as in the previous case study, while the higher-order terms of the polynomial chaos proxy become prominent in the Bayesian formulation (Equation (29)).


• Reduced-order parameterization of the log-permeability field. Using the prior information about the fluvial channel, we generate 1000 realizations using the Petrel software (Schlumberger) and compute the ensemble mean and covariance matrix. In generating the samples, we assume a log-normal distribution for the permeability and Gaussian distributions for all other parameters, i.e., the thickness, amplitude, relative sinuosity and wavelength of the fluvial channel, with the specifications mentioned earlier. Since the orientation and location of the fluvial channel are not specified in the prior geological knowledge, the realizations have a wide range of variance. We make the problem more challenging by assuming that the prior parameterization of the permeability field by the fluvial channel properties is not available.



Figure 5: The noisily observed history of the reservoir

Figure 6: To train the polynomial chaos proxy, N realizations are generated based on the given prior distribution.


Hence, the starting point is to parameterize the permeability field based on the given N = 1000 realizations. We perform the Karhunen-Loeve parameterization using the ensemble covariance matrix and discard the small eigenvalues to reduce the dimension.


Figure 7 shows the scree graph of the eigenvalues. For a good sampling that retains at least 70% of the energy of the field, a minimum of 40 eigenvectors has to be retained to preserve the shape of the channel. Discarding the small eigenvalues smooths the sharp transitions (high-frequency changes) between the two distinct textures.
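The energy-retention criterion can be sketched as follows. The toy ensemble and the helper `modes_for_energy` are illustrative (the ensemble is built from three dominant spatial modes plus weak noise, so only a few modes are needed), not the Petrel realizations used in the paper.

```python
import numpy as np

def modes_for_energy(realizations, fraction):
    """Smallest number p of KL modes with sum(lam[:p]) / sum(lam) >= fraction,
    where lam are the eigenvalues of the ensemble covariance matrix."""
    X = realizations - realizations.mean(axis=0)   # center the ensemble
    # Eigenvalues of the ensemble covariance, via SVD of the centered data.
    s = np.linalg.svd(X, compute_uv=False)
    lam = s**2 / (X.shape[0] - 1)
    energy = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(energy, fraction) + 1)

# Toy ensemble: 500 fields on 100 grid points, driven by 3 dominant modes.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
modes = np.stack([np.sin(np.pi * k * t) for k in (1, 2, 3)])
ens = rng.standard_normal((500, 3)) @ modes + 0.01 * rng.standard_normal((500, 100))
print(modes_for_energy(ens, 0.9))   # 3: three modes capture 90% of the energy
```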


• Construct the polynomial chaos proxy. To train the polynomial chaos proxy with 40 random variables and N = 1000 simulation runs, we obtain the reduced-terms polynomial chaos expansion of order 8 for the oil flow rate:

Q^{op}(ξ, t) = Q_0^{op}(t) + Σ_{i=1}^{p} Q_{i,1}^{op}(t) ξ_i + Σ_{i=1}^{p} Q_{ii,2}^{op}(t)(ξ_i² − 1) + …    (37)

As discussed earlier, the reduced-terms polynomial chaos proxy preserves the relevant terms based on the impact factor. The cut-off value ε = 0.1 is chosen to drop the insignificant terms. To find the coefficients of the reduced-terms polynomial chaos expansion, we use regression-based PCM, which minimizes the sum of squared errors. To evaluate the accuracy and performance of the PCE proxy, one simple common



Figure 7: The scree plot for the ensemble covariance matrix of 1000 realizations.


Figure 8: The blind test for 100 random experiments of ξ, denoted by A_1, A_2, …, A_{100}: the misfit approximated by the PCE proxy versus the actual misfit computed by the reservoir simulator.


experiment is to conduct a blind test by generating 100 random vectors of ξ, denoted by A_1, A_2, …, A_{100}, and comparing the actual misfit calculated by the reservoir simulator with the misfit estimated by the PCE proxy. Figure 8 shows the blind test for the 40-parameter PCE proxy. The data points lying on the 45-degree line demonstrate the accuracy of the PCE proxy. The figure also shows that for samples lying in the tails of the prior Gaussian distribution (large misfit), the PCE proxy becomes less accurate.
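The regression-based PCM fit of the expansion coefficients can be sketched with ordinary least squares on probabilists' Hermite polynomials (He_0 = 1, He_1 = ξ, He_2 = ξ² − 1, matching the terms in Equation (37)). The toy "simulator" response below has a known PC structure so the recovered coefficients can be checked; all names are illustrative.

```python
import numpy as np

def hermite_design(xi):
    """Design matrix of probabilists' Hermite polynomials, truncated to the
    terms kept here: constant, linear, and diagonal quadratic (xi_i^2 - 1)."""
    n, p = xi.shape
    cols = [np.ones(n)]
    cols += [xi[:, i] for i in range(p)]           # He1 terms
    cols += [xi[:, i] ** 2 - 1 for i in range(p)]  # diagonal He2 terms
    return np.column_stack(cols)

# Regression-based PCM: least squares over N training runs.
rng = np.random.default_rng(0)
N, p = 200, 3
xi = rng.standard_normal((N, p))
# Hypothetical response with a known PC structure: 2 + 1.5*xi_1 - 0.5*(xi_2^2 - 1).
y = 2.0 + 1.5 * xi[:, 0] - 0.5 * (xi[:, 1] ** 2 - 1)
H = hermite_design(xi)
coef, *_ = np.linalg.lstsq(H, y, rcond=None)
print(np.round(coef, 6))   # coefficients close to [2, 1.5, 0, 0, 0, -0.5, 0]
```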


• Bayesian inference formulation. Using the misfit formulation of Equation (28) and plugging in the polynomial chaos proxy of Equation (37) for the oil flow rate, we obtain:

S(ξ_1, ξ_2, …, ξ_p) = Σ_{i=1}^{M} (Q_proxy^{op}(t_i) − Q_obs^{op}(t_i))² / σ²(t_i) = S_0 + Σ_{i=1}^{p} S_{i,1} ξ_i + Σ_{i=1}^{p} S_{ii,2}(ξ_i² − 1) + …    (38)

where M = 1000 is the number of measurements. We define σ(t) = σ_D Q_obs^{op}(t), where


σ_D² is constant. We solve the problem for σ_D² = 0.1 and σ_D² = 0.2. We also assume that the minimum noise (σ_min) is 30 m³/day. Equation (29) then gives an analytical approximation for the posterior probability distribution. We also assume the priors for the ξ_i are all standard Gaussian.


• Metropolis-Hastings MCMC. Having an analytical expression for the posterior distribution through Equation (29), and assuming an i.i.d. Gaussian prior for ξ, we apply the Metropolis-Hastings MCMC method, with a multivariate Gaussian as the proposal distribution, to sample from the posterior distribution of the permeability field. To check the convergence of the chain, all moments up to the 8th have to be examined to our desired precision (the 8th order of the polynomial chaos expansion is used).


Figure 9 shows the marginal posterior distributions of the 5 most dominant random variables {ξ_1, ξ_2, …, ξ_5}. The result of the polynomial chaos proxy of order 8 is compared with the expensive MCMC result using the full reservoir simulation. The dimensionless error in the observation is σ_D² = 0.1. The histogram shows the true distribution (MCMC using the reservoir simulator), where the number of bins is 100. Figure 9 shows an excellent match between the polynomial chaos proxy of order 8 and the true posterior distribution. The number of traditional MCMC steps needed to reach the stationary distribution using the full reservoir simulator is n_2 = 260148, while the number of steps required for MCMC using the polynomial chaos proxy to reach the equilibrium state is n_1 = 195218. The convergence criterion for the MCMC method is the convergence of the moments up to the 8th order. Hence, the speed-up of the proposed method compared to the traditional MCMC method is (260148 × 185)/(185 × 1000 + 195218 × 3) = 62.45 times, where 185 seconds is the running time of the full reservoir simulator, as opposed to 3 seconds for the PCE proxy. To study the convergence of the analytical formulation of the posterior distribution using the polynomial chaos proxy to the true posterior distribution obtained with the traditional MCMC method, we construct different orders of the polynomial chaos proxy and compare them against the true distribution. Figure 10 shows the marginal posterior distribution of ξ_2 for different orders of the reduced-terms polynomial chaos proxy versus the true posterior distribution. It demonstrates the convergence of the polynomial chaos proxy-based formulation of the posterior distribution to the true distribution as the order of the expansion increases.
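The quoted speed-up follows from the cost model implied in the text (1000 training runs at simulator cost, plus proxy-based MCMC steps, versus simulator-based MCMC steps) and can be checked directly:

```python
# Speed-up of proxy-based MCMC over full-simulator MCMC:
# full cost = n2 * t_sim; proxy cost = n_train * t_sim + n1 * t_proxy.
n2, n1 = 260148, 195218      # MCMC steps: full simulator vs. PCE proxy
t_sim, t_proxy = 185, 3      # seconds per evaluation
n_train = 1000               # training simulation runs for the proxy
speedup = (n2 * t_sim) / (n_train * t_sim + n1 * t_proxy)
print(round(speedup, 2))     # 62.45
```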
The two maxima in the posterior distribution of ξ_2 correspond to the two different solution sets: the channel extending from the upper injector to the producer (as in the reference case), and alternatively the channel extending from the lower injector to the producer. Accordingly, the high-order terms in the misfit formulation of Equation (28) become prominent in the estimation of the posterior distribution. This verifies one of the great advantages of the polynomial chaos proxy over other proxy models: it guarantees convergence in distribution as the order of the expansion increases. In fact, the rate of convergence to the

Figure 9: Estimation of the marginal posterior distribution of the most dominant random variables {ξ1 , ξ2 , . . . , ξ5 } using the 8th order polynomial chaos proxy, compared to the MCMC result with reservoir simulator.

Figure 10: The polynomial chaos proxy-based estimation for the posterior distribution of ξ2 converges to the true distribution as the order of the expansion increases.

23

ACCEPTED MANUSCRIPT

4. Summary and Conclusion

M

AN US

CR IP T

posterior distribution is exponential if ξ is Gaussian [26]. The first order terms (linear terms) in the polynomial chaos proxy estimates the posterior probability density function with a Gaussian distribution. The higher-order terms captures the non-Gaussian shape of the residual systematically. Hence the polynomial chaos proxy can be thought as a generalization to efficient linear estimators (such as the Ensemble Kalman Filter [14, 15]) that works for the non-Gaussian posterior distribution as well. Since we used the Karhunen-Loeve expansion to linearly parameterize the random field, we can obtain samples from the posterior distribution of the permeability field using the Equation (1). Figure 11 and Figure 12 show different samples from the posterior distribution 2 of the permeability field when the dimensionless error in the observation is σD = 0.1 and 2 σD = 0.2. It demonstrates more variability in the posterior samples of the permeability field when the variance of the observational error increases. This case study verifies the efficiency of the polynomial chaos proxy framework to sample from the posterior distribution when the inverse problem is remarkably nonlinear. We proved that the high-order polynomial chaos proxy can capture the nonlinear effects on the posterior distribution. Figure 13 illustrates the full-simulator run for the obtained samples of Figure 11 from the 2 posterior distribution of permeability field when σD = 0.1. It proves a good match with the observed history of the reservoir.

PT

ED

This paper pursues the application of the polynomial chaos expansions as a proxy model to probe the posterior distribution for parameter estimation (aka. history matching) and uncertainty quantification in subsurface reservoir models. Our findings are summarized as follows:

AC

CE

• For a given number of trial runs, the coefficients associated with the higher order polynomial chaos terms have larger estimation errors compared to the lower order terms, and therefore there is a balance between the number of trial runs and the maximum possible order of the polynomial chaos expansions that can be reliably used. • For high-dimensional problems, the number of the polynomial chaos terms increases drastically as the order of the polynomial chaos expansions increases. Although different non-intrusive methods have been developed in the literature to address this issue, but still a large number of the simulation runs is required to compute high-order polynomial chaos expansions. In our proposed polynomial chaos proxy which integrates polynomial chaos expansion with the Karhunen-Loeve decomposition method, the sparsity pattern of the polynomial chaos can be determined using the relative importance of Karhunen-Loeve modes. In the sense that among the cross-terms associated with 24

CE

PT

ED

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

AC

Figure 11: Samples from the posterior probability distribution of permeability when the dimensionless error 2 in the observation is σD = 0.1 and 40 eigenvectors are retained. Karhunen-Loeve expansion is used for dimensionality reduction and the reduced-terms polynomial chaos expansion of order 8 is used for the reservoir proxy model.

25

M

AN US

CR IP T

ACCEPTED MANUSCRIPT

CE

PT

ED

Figure 12: Samples from the posterior probability distribution of permeability when the dimensionless error 2 = 0.2 and 40 eigenvectors are retained. Karhunen-Loeve expansion is used for in the observation is σD dimensionality reduction and the reduced-terms polynomial chaos expansion of order 8 is used for the reservoir proxy model.

Oil production (mm3/day)

AC

1000 800 600 400 200 0

0

100

200

300

400

500 Days

600

700

800

900

1000

2 Figure 13: The reservoir simulator run for the obtained samples from the posterior distribution when σD = 0.1, (red dash lines), versus the simulator run for the prior samples (green dash lines).

26

ACCEPTED MANUSCRIPT

the high-frequency Karhunen-Loeve modes (the modes related to the small eigenvalues) only few terms are relevant and the rest become adequately negligible. We introduced the impact factor for each term to specify the relevant terms in the polynomial chaos proxy and discard the rest. Accordingly, for high-dimensional problems, with a reasonable number of simulation runs, the high-order reduced-terms polynomial chaos proxy can be efficiently constructed.

AN US

CR IP T

• The use of the polynomial chaos proxy in the Bayesian context, where the polynomial chaos representation replaces the full forward model in the likelihood function, obtains an analytical expression for the reduced-dimension posterior distribution that can be evaluated orders of magnitude faster than the original posterior. Additionally, with the respect to the fact that running the proxy model is cheap, the Markov Chain Monte Carlo methods can be coherently used to efficiently sample from the reduced-dimension approximation of the posterior distribution.

ED

M

• Comparing to the traditional MCMC methods which typically needs more than 10s of thousands of full reservoir simulation runs, the polynomial chaos proxy-based MCMC can probe the posterior distribution much faster, using a few hundreds of the full reservoir simulation runs. For remarkably nonlinear surface responses the high-order polynomial chaos proxy has to be used to preserve higher-order moments of the posterior distribution. The number of the simulation runs required to reasonably capture the nonlinear effects of the posterior distribution using a high order polynomial chaos proxy is problem specific. In general, the accuracy of the polynomial chaos proxy depends on:

PT

1. The quality of the input data (the trial simulation runs), 2. The order of the polynomial chaos expansions, 3. The number of eigenvectors retained.

AC

CE

For an extremely nonlinear response surface, the number of simulation runs required to construct a reliable polynomial chaos proxy can be expected to increase, and the proposed method becomes less efficient.

• When the eigenvalues of the Karhunen-Loève decomposition do not decay sufficiently fast, the number of relevant terms in the reduced-terms polynomial chaos representation increases, and with it the number of full reservoir simulation runs required to compute the polynomial chaos coefficients. Hence, the proposed method becomes less efficient when the random field cannot be effectively expressed using the Karhunen-Loève expansion.

• The prior distribution of the reduced-dimension random variables of the Karhunen-Loève decomposition is assumed to be an uncorrelated multivariate Gaussian. Consequently, the standard polynomial chaos has an exponential convergence rate to the true misfit surface as the order of the expansion increases. The posterior distribution of the reduced-dimension random variables, however, is not necessarily Gaussian, and the variables may be considerably correlated.
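The eigenvalue-decay issue can be made concrete with a short sketch. Assuming a hypothetical one-dimensional exponential covariance on a grid, the Karhunen-Loève decomposition reduces to an eigendecomposition of the covariance matrix, and the number of modes needed for a fixed variance fraction measures how compressible the field is:

```python
import numpy as np

# Hypothetical exponential covariance on a 1-D grid; faster eigenvalue
# decay (longer correlation length) means fewer KL modes are needed.
n, corr_len = 100, 0.2
x = np.linspace(0.0, 1.0, n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)

# Discrete KL decomposition = eigendecomposition of the covariance
lam, phi = np.linalg.eigh(C)
lam, phi = lam[::-1], phi[:, ::-1]          # sort descending

# Retain the fewest modes capturing 95% of the variance (energy)
energy = np.cumsum(lam) / np.sum(lam)
k = int(np.searchsorted(energy, 0.95)) + 1

# A truncated realization: field = sum_i sqrt(lam_i) * xi_i * phi_i
xi = np.random.default_rng(2).standard_normal(k)
field = phi[:, :k] @ (np.sqrt(lam[:k]) * xi)
```

When the decay is slow, `k` grows toward the grid size, the chaos basis in `k` variables explodes combinatorially, and the proxy loses its cost advantage.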

• The PCE proxy is constructed based on the prior distribution of the parameters. The PCE proxy can therefore represent the central tendency of the prior distribution of the model response with sufficient accuracy. However, this may not hold when a high-accuracy measurement deviates considerably from the prior mean response. In such a case, samples obtained from the MCMC algorithm end up in the tail of the prior distribution, which is now the region of high probability mass of the posterior distribution. It is well known that the PCE approximation converges slowly in the higher moments and hence needs a large number of terms to accurately describe the tail of the distribution. For such problems, we would expect the performance of the PCE proxy in approximating the posterior distribution of the model parameters to be considerably enhanced if the proxy were adapted with additional model evaluations as the MCMC approaches the posterior distribution, as in [76].
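The adaptive idea can be illustrated in a few lines. Everything below is a hypothetical stand-in (a cheap analytic "simulator" and a simple quadratic proxy in place of a full PCE), not the scheme of [76] itself: the proxy drives a Metropolis chain and is periodically refitted with fresh full-model evaluations at the chain's current location:

```python
import numpy as np

rng = np.random.default_rng(3)

def full_model_misfit(xi):
    # Stand-in for an expensive simulator-based misfit
    return np.sum(xi**2) + 0.5 * np.sum(np.sin(xi))

def fit_proxy(X, y):
    # Least-squares quadratic surrogate (a stand-in for a PCE refit)
    Phi = np.column_stack([np.ones(len(X)), X, X**2])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return lambda xi: np.concatenate(([1.0], xi, xi**2)) @ coef

dim, n_train = 2, 30
X = rng.standard_normal((n_train, dim))           # prior-based design
y = np.array([full_model_misfit(x) for x in X])
proxy = fit_proxy(X, y)

xi = np.zeros(dim)
for step in range(2000):
    prop = xi + 0.4 * rng.standard_normal(dim)
    # Proxy-based Metropolis acceptance with a Gaussian prior term
    a = (-proxy(prop) - 0.5 * prop @ prop) - (-proxy(xi) - 0.5 * xi @ xi)
    if np.log(rng.random()) < a:
        xi = prop
    if step % 500 == 499:
        # Periodic refinement: one full-model run near the chain,
        # then refit the surrogate including the new training point
        X = np.vstack([X, xi])
        y = np.append(y, full_model_misfit(xi))
        proxy = fit_proxy(X, y)
```

The extra full-model runs concentrate training points where the posterior mass actually lies, which is exactly the tail region the prior-based proxy resolves worst.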

References

[1] Carter, J. and Ballester, P., A real parameter genetic algorithm for cluster identification in history matching, 30 (2004).

[2] Poli, Riccardo and Kennedy, James and Blackwell, Tim, Particle swarm optimization, Swarm Intelligence 1 (2007).

[3] Zhang, Fengjun and Reynolds, Albert C., Optimization algorithms for automatic history matching of production data, 8th European Conference on the Mathematics of Oil Recovery (2002).

[4] Sambridge, Malcolm, Geophysical inversion with a neighbourhood algorithm - II. Appraising the ensemble, Geophysical Journal International 138 (1999).

[5] Petrovska, I. and Carter, J., Estimation of distribution algorithms for history-matching, Proc. of the 10th European Conf. on the Mathematics of Oil Recovery, EAGE, Amsterdam (2006).

[6] Li, Ruijian and Reynolds, A.C. and Oliver, D.S., History Matching of Three-Phase Flow Production Data, SPE Journal 8, (2003).

[7] Doucet, Arnaud and De Freitas, Nando and Gordon, Neil, An introduction to sequential Monte Carlo methods, (Springer, 2001).

[8] Zhang, Fengjun and Reynolds, Albert C, Optimization algorithms for automatic history matching of production data, 8th European Conference on the Mathematics of Oil Recovery, (2002).

[9] Oliver, Dean S. and Chen, Yan, Recent progress on reservoir history matching: a review, Computational Geosciences 15, (2011).

[10] Wu, Z., A Newton-Raphson Iterative Scheme for Integrating Multiphase Production Data into Reservoir Models, SPE Journal 6, (2001).

[11] Liu, N. and Oliver, D., Automatic history matching of geologic facies, SPE Journal 9, (2004).

[12] Erbas, Demet and Christie, Michael, Effect of sampling strategies on prediction uncertainty estimation, Proceedings of the SPE Reservoir Simulation Symposium, SPE 106229, 26-28, (2007).

[13] Jaynes, Edwin T., Probability theory: the logic of science, (Cambridge University Press, 2003).

[14] Evensen, Geir, Data assimilation: the ensemble Kalman filter, (Springer, 2009).

[15] Aanonsen, S.I. and Naevdal, G. and Oliver, D.S. and Reynolds, A.C. and Valles, B., The Ensemble Kalman Filter in Reservoir Engineering - a Review, SPE Journal 14, (2009).

[16] Oliver, Dean S. and Cunha, Luciane B. and Reynolds, Albert C., Markov chain Monte Carlo methods for conditioning a permeability field to pressure data, Mathematical Geology 29, (1997).

[17] Ma, Xianlin and Al-Harbi, Mishal and Datta-Gupta, Akhil and Efendiev, Yalchin, An efficient two-stage sampling method for uncertainty quantification in history matching geological models, SPE Journal 13 (2008).

[18] Tarantola, Albert, Inverse problem theory and methods for model parameter estimation, (SIAM, 2005).

[19] Stramer, O. and Tweedie, R.L., Langevin-Type Models II: Self-Targeting Candidates for MCMC Algorithms, Methodology and Computing in Applied Probability 1, (1999).

[20] Duane, Simon and Kennedy, Anthony D. and Pendleton, Brian J. and Roweth, Duncan, Hybrid Monte Carlo, Physics Letters B 195, (1987).

[21] Holloman, Christopher H and Lee, Herbert K H and Higdon, Dave M, Multiresolution genetic algorithms and Markov chain Monte Carlo, Journal of Computational and Graphical Statistics 15, (2006).

[22] Vrugt, Jasper A and Ter Braak, CJF and Diks, CGH and Robinson, Bruce A and Hyman, James M and Higdon, Dave, Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, International Journal of Nonlinear Sciences and Numerical Simulation 10, (2009).

[23] Emerick, A.A. and Reynolds, A.C., Combining the Ensemble Kalman Filter With Markov-Chain Monte Carlo for Improved History Matching and Uncertainty Characterization, SPE Journal 17, (2012).

[24] Oliver, Dean S. and Reynolds, Albert C. and Liu, Ning, Inverse theory for petroleum reservoir characterization and history matching, (Cambridge University Press, 2008).

[25] Jansen, J.D. and Brouwer, D.R. and Naevdal, G. and Van Kruijsdijk, Closed-loop reservoir management, First Break 23, (2005).

[26] Sarma, P., Efficient closed-loop optimal control of petroleum reservoirs under uncertainty, Stanford University, (2006).

[27] Ghanem, Roger G. and Spanos, Pol D., Stochastic finite elements: a spectral approach, (Courier Dover Publications, 2003).

[28] Xiu, Dongbin and Karniadakis, George Em, The Wiener-Askey Polynomial Chaos for Stochastic Differential Equations, SIAM Journal on Scientific Computing 24, (2002).

[29] Babuska, Ivo and Tempone, Raul and Zouraris, Georgios E., Galerkin finite element approximations of stochastic elliptic partial differential equations, SIAM Journal on Numerical Analysis 42, (2004).

[30] Babuska, Ivo and Tempone, Raul and Zouraris, Georgios E., Solving elliptic boundary value problems with uncertain coefficients by the finite element method: the stochastic formulation, Computer Methods in Applied Mechanics and Engineering 194, (2005).

[31] Matthies, Hermann G. and Bucher, Christian, Finite elements for stochastic media problems, Computer Methods in Applied Mechanics and Engineering 168, (1999).

[32] Todor, Radu Alexandru and Schwab, Christoph, Convergence rates for sparse chaos approximations of elliptic problems with stochastic coefficients, IMA Journal of Numerical Analysis 27, (2007).

[33] Xiu, Dongbin and Hesthaven, Jan S, High-order collocation methods for differential equations with random inputs, SIAM Journal on Scientific Computing 27, (2005).

[34] Chen, W.H., Gavalas, G.R., Seinfeld, J.H., and Wasserman, A New Algorithm for Automatic History Matching, SPE Journal 14, (1974).

[35] Oliver, Dean, Multiple realizations of the permeability field from well test data, SPE Journal 1, (1996).

[36] Reynolds, Albert C and He, Nanqun and Chu, Lifu and Oliver, Dean S, Reparameterization techniques for generating reservoir descriptions conditioned to variograms and well-test pressure data, SPE Journal 1, (1996).

[37] Wiener, Norbert, The homogeneous chaos, American Journal of Mathematics 60, (1938).

[38] Tatang, Menner A., Direct incorporation of uncertainty in chemical and environmental engineering systems, Massachusetts Institute of Technology, (1995).

[39] Isukapalli, Sastry S., Uncertainty analysis of transport-transformation models, (Rutgers, The State University of New Jersey, 1999).

[40] Li, Ruijian and Reynolds, A.C. and Oliver, D.S., History Matching of Three-Phase Flow Production Data, SPE Journal 8, (2003).

[41] Montgomery, Douglas C., Design and analysis of experiments, (Wiley, 2006).

[42] Li, Heng and Sarma, Pallav and Zhang, Dongxiao, A comparative study of the probabilistic-collocation and experimental-design methods for petroleum-reservoir uncertainty quantification, SPE Journal 16, (2011).

[43] Metropolis, Nicholas and Ulam, Stanislaw, The Monte Carlo method, Journal of the American Statistical Association 44, (1949).

[44] Metropolis, Nicholas and Rosenbluth, Arianna W. and Rosenbluth, Marshall N. and Teller, Augusta H. and Teller, Edward, Equation of state calculations by fast computing machines, The Journal of Chemical Physics 21, (1953).

[45] Hastings, W. Keith, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57, (1970).

[46] Roberts, Gareth O. and Rosenthal, Jeffrey S., Optimal scaling for various Metropolis-Hastings algorithms, Statistical Science 16, (2001).

[47] Jolliffe, Ian, Principal component analysis, (Wiley Online Library, 2005).

[48] Blatman, Geraud and Sudret, Bruno, Sparse polynomial chaos expansions and adaptive stochastic finite elements using a regression approach, Comptes Rendus Mecanique 336, (2008).

[49] Blatman, Geraud and Sudret, Bruno, An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis, Probabilistic Engineering Mechanics 25, (2010).

[50] Blatman, Geraud and Sudret, Bruno, Adaptive sparse polynomial chaos expansion based on least angle regression, Journal of Computational Physics 230, (2011).

[51] Lin, Guang and Tartakovsky, Alexandre M. and Tartakovsky, Daniel M., Uncertainty quantification via random domain decomposition and probabilistic collocation on sparse grids, Journal of Computational Physics 229, (2010).

[52] Floater, Michael and Lyche, Tom, Two chain rules for divided differences and Faà di Bruno's formula, Mathematics of Computation 258, (2007).

[53] Jolliffe, Ian, Principal component analysis, (Wiley Online Library, 2005).

[54] Stuart, Andrew M., Inverse problems: a Bayesian perspective, Acta Numerica 19, Cambridge University Press, (2010).

[55] Doostan, Alireza and Owhadi, Houman, A non-adapted sparse approximation of PDEs with stochastic inputs, Journal of Computational Physics, (2011).

[56] Marzouk, Youssef M. and Najm, Habib N., Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems, Journal of Computational Physics, (2009).

[57] R.A. Todor and C. Schwab, Convergence rates for sparse chaos approximations of elliptic problems with stochastic coefficients, IMA Journal of Numerical Analysis, (2007).

[58] A. Doostan, R. Ghanem, and J. Red-Horse, Stochastic model reduction for chaos representations, Computer Methods in Applied Mechanics and Engineering, (2007).

[59] M. Bieri and C. Schwab, Sparse high order FEM for elliptic sPDEs, Computer Methods in Applied Mechanics and Engineering, (2009).

[60] G. Blatman and B. Sudret, An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis, Probabilistic Engineering Mechanics, (2010).

[61] F. Nobile, R. Tempone, and C.G. Webster, An anisotropic sparse grid stochastic collocation method for partial differential equations with random input data, SIAM Journal, (2008).

[62] X. Ma and N. Zabaras. An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. Journal of Computational Physics, (2009).

[63] A. Nouy. A generalized spectral decomposition technique to solve a class of linear stochastic partial differential equations. Computer Methods in Applied Mechanics and Engineering, (2007).

[64] A. Nouy, Generalized spectral decomposition method for solving stochastic finite element equations: Invariant subspace problem and dedicated algorithms, Computer Methods in Applied Mechanics and Engineering, (2008).

[65] A. Doostan and G. Iaccarino, A least-squares approximation of partial differential equations with high-dimensional random inputs, Journal of Computational Physics, (2009).

[66] Haiyan Zhou, J. Jaime Gomez-Hernandez, Liangping Li, Inverse methods in hydrogeology: Evolution and recent trends, Advances in Water Resources 63, (2014), 22-37.

[67] J. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Transactions on Information Theory 50 (2004).

[68] L. Yan, L. Guo, D. Xiu, Stochastic collocation algorithms using ℓ1-minimization, International Journal for Uncertainty Quantification, (2012), 279-293.

[69] J. Tropp, A. Gilbert, Signal recovery from random measurements via Orthogonal Matching Pursuit, IEEE Transactions on Information Theory, (2007).

[70] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Annals of Statistics, (2004).

[71] A.H. Elsheikh, I. Hoteit, M.F. Wheeler, Efficient Bayesian inference of sub-surface flow model using nested sampling and sparse polynomial chaos surrogates, Computer Methods in Applied Mechanics and Engineering 269 (2014).

[72] Brouwer, D.R. and Jansen, J.D., Dynamic optimization of water flooding with smart wells using optimal control theory, European Petroleum Conference (2002).

[73] Efendiev, Y. and Datta-Gupta, A. and Ginting, V. and Ma, X. and Mallick, B., An efficient two-stage Markov chain Monte Carlo method for dynamic data integration, Water Resources Research (2005).

[74] J.A. Tropp, Greed is good: algorithmic results for sparse approximation, IEEE Trans. Inf. Theory, (2004).

[75] J.A. Tropp and A.C. Gilbert, Signal recovery from random measurements via Orthogonal Matching Pursuit, IEEE Trans. Inf. Theory, (2007).

[76] Laloy, Eric and Rogiers, Bart and Vrugt, Jasper A. and Mallants, Efficient posterior exploration of a high-dimensional groundwater model from two-stage Markov chain Monte Carlo simulation and polynomial chaos expansion, Water Resources Research 49, (2013).

[77] Saad, George and Ghanem, Roger, Characterization of reservoir simulation models using a polynomial chaos-based ensemble Kalman filter, Water Resources Research 45, (2009).

[78] Zeng, Lingzao and Zhang, Dongxiao, A stochastic collocation based Kalman filter for data assimilation, Computational Geosciences 14, (2010).
