Subsurface characterization with localized ensemble Kalman filter employing adaptive thresholding

Advances in Water Resources 69 (2014) 181–196


Subsurface characterization with localized ensemble Kalman filter employing adaptive thresholding
Ebrahim Biniaz Delijani, Mahmoud Reza Pishvaie *, Ramin Bozorgmehry Boozarjomehry
Department of Chemical & Petroleum Engineering, Sharif University of Technology, Azadi Ave., P.O. Box 11155-9465, Tehran, Iran

Article info
Article history: Received 30 August 2013; Received in revised form 5 April 2014; Accepted 9 April 2014; Available online 21 April 2014
Keywords: Subsurface characterization; Ensemble Kalman filter; Adaptive thresholding; Thresholding functions; Covariance localization; Kalman gain regularization

Abstract
The ensemble Kalman filter (EnKF), a Monte Carlo sequential data assimilation method, has emerged as a promising tool for subsurface media characterization during the past decade. Because of the high computational cost of a large ensemble, EnKF is limited to small ensemble sizes in practice. This gives rise to spurious correlations in the covariance structure, leading to incorrect updates or possible divergence of the updated realizations. In this paper, a universal/adaptive thresholding method is presented to remove or mitigate the spurious correlation problem in the forecast covariance matrix. The method is then extended to regularize the Kalman gain directly. Four thresholding functions are considered for thresholding the forecast covariance and gain matrices: hard, soft, lasso and Smoothly Clipped Absolute Deviation (SCAD). Three benchmarks are used to evaluate the performance of these methods: a small 1D linear model and two 2D water flooding (petroleum reservoir) cases with different levels of heterogeneity/nonlinearity. Beside adaptive thresholding, standard distance dependent localization and bootstrap Kalman gain are also implemented for comparison. Each setup is assessed with different ensemble sizes to investigate the sensitivity of each method to ensemble size. The results indicate that thresholding of the forecast covariance yields more reliable performance than thresholding of the Kalman gain. Among the thresholding functions, SCAD is the most robust for both covariance and gain estimation. Our analyses emphasize that not all assimilation cycles require thresholding, and it should be performed judiciously during the early assimilation cycles. The proposed adaptive thresholding scheme outperforms the other methods for subsurface characterization of the underlying benchmarks. © 2014 Elsevier Ltd. All rights reserved.

1. Introduction
Providing an accurate numerical geo-model of a subsurface reservoir by employing static and dynamic data of the corresponding medium is an important step towards efficient subsurface reservoir management and maximization of reserve recovery. In general, the (inverse) modeling consists of static and dynamic steps: in the static step, static data are used to build a geo-model, while in the dynamic step (also called model calibration or data assimilation), the model developed in the static step is calibrated and improved based on dynamic measurements [33]. Dynamic data integration into the geo-model is a data assimilation activity (also termed history matching, scanning or tomography and

* Corresponding author. Tel.: +98 (21) 6616 6429, mobile: +98 9123058396; fax: +98 (21) 6602 2853. E-mail addresses: [email protected] (E.B. Delijani), [email protected] (M.R. Pishvaie), [email protected] (R.B. Boozarjomehry). http://dx.doi.org/10.1016/j.advwatres.2014.04.011 0309-1708/© 2014 Elsevier Ltd. All rights reserved.

inverse modeling), whose objective is to find the parameters and states of the model from (historical) measurements [26]. The most frequently used approach for sequential data assimilation is the Kalman filter (KF), an optimal, efficient, recursive (and unbiased) filter. The KF estimates the states of a linear dynamical system from a series of noise-corrupted measurements [27]. The original KF, however, suffers from three major limitations: linearity, applicability only to small-scale (states and parameters) problems, and the Gaussianity/normality assumption. Meanwhile, many realistic problems in subsurface characterization are nonlinear, large-scale and non-Gaussian. The KF was extended early on to work with nonlinear models through the Extended Kalman Filter (EKF). The EKF falls short not only for highly nonlinear models but also becomes infeasible for large-scale systems. During the last decades, several alternatives to the EKF have been developed, such as the Sigma point filter [43] for better handling of nonlinear models and reduced order filters [34] to handle data assimilation in large-scale systems. For non-Gaussian problems, any filter that either assumes a Gaussian probability distribution


or uses only the information from first- and second-order moments in its analysis equation will not produce a correct estimate of the posterior distribution. Although better estimators such as particle filters [4] exist to tackle this problem, there are still few contributions on their application to large-scale systems. This is due to the difficulty of locating particles in regions of large likelihood when the dimension of the model is large [1]. Nevertheless, detailed descriptions of all the mentioned approaches, along with other paradigms (such as representer methods), are beyond the scope of this study; we focus on localized EnKF. In the past decade, owing to some unique features of the ensemble Kalman filter (EnKF), it has emerged as one of the promising methods to condition nonlinear geo-models to dynamic measurements sequentially [1,39,42]. The EnKF, proposed by Evensen [18], is based on the original KF. In its native formulation, EnKF uses Monte Carlo sampling to generate the initial ensemble, model noise and measurement perturbations. Each member of the ensemble is then propagated with a nonlinear dynamic simulator and analyzed via the standard KF analysis scheme [30]. In EnKF, model states and information (as a covariance matrix) are stored, propagated and assimilated via a low-rank representation of the error covariance matrix through a relatively large stochastic ensemble of model realizations [17]. This low-rank approximation and/or sampling of the error covariance yields spurious correlations and/or underestimation of true correlations, which leads to improper characterization of the geo-model [15]. To mitigate the problem, a simple remedy is to increase the number of model realizations, i.e. the ensemble size, in the EnKF initialization step. Increasing the ensemble size, however, results in a high computational cost for propagating model realizations in large geo-models.
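The small-ensemble effect described above can be seen in a minimal numpy experiment (our illustration, not from the paper): for a state whose true cross-correlations are exactly zero, a small ensemble still produces large spurious sample correlations, which shrink as the ensemble grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # state dimension

def max_spurious_corr(ensemble_size):
    # Draw an ensemble whose true cross-correlations are exactly zero,
    # then measure the largest sample correlation between distinct variables.
    X = rng.standard_normal((n, ensemble_size))
    C = np.corrcoef(X)
    off_diag = C[~np.eye(n, dtype=bool)]
    return np.max(np.abs(off_diag))

print(max_spurious_corr(10))    # small ensemble: large spurious correlations
print(max_spurious_corr(5000))  # large ensemble: spurious correlations shrink
```

With only 10 members, some pairs of truly independent variables appear strongly correlated purely by sampling noise; this is exactly the effect localization and thresholding aim to suppress.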
As a result, in practice EnKF is often applied with the smallest possible ensemble size. This in turn leads to spurious correlations in the covariance structure, causing a poor approximation of the Kalman gain [47]. This subsequently results in wrong or even unrealistic changes to the numerical state and model variables. In other words, it leads to underestimation of the posterior state covariance, which provides an inappropriate estimation of geo-model parameters such as porosity and hydraulic conductivity. Repeating the assimilation cycles under such behavior also rapidly diminishes the ensemble variance [31]. Multiplying the forecast state covariance matrix elementwise – the Schur or Hadamard product – with a locally supported correlation matrix is an effective approach to overcome some of the small-ensemble effects. This approach partly maintains the ensemble variance and removes spurious correlations to some extent [11]. The procedure, known as covariance localization, not only relieves long-range spurious correlations to some extent but also increases the "degrees of freedom" for data assimilation [15]. Since the first application of distance dependent covariance localization [23], extensive research has been performed on EnKF localization [7,8,24,25,32,38]. In the following paragraphs, some of these methods are reviewed. Note that we distinguish "covariance localization" from its Kalman gain counterpart, "gain regularization": "localization" is assigned to the covariance and "regularization" to the gain, to clarify the fundamental difference between the gain computed from a localized forecast covariance and the gain obtained by direct regularization of the Kalman gain matrix itself.
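The Schur-product localization described above can be sketched in a few lines of numpy (our illustration; the compactly supported correlation used here is the fifth-order Gaspari and Cohn function that the paper later gives as Eq. (10), and the grid, length scale and ensemble size are arbitrary choices):

```python
import numpy as np

def gaspari_cohn(r):
    """Fifth-order Gaspari-Cohn compactly supported correlation function."""
    r = np.abs(r)
    out = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r < 2.0)
    out[m1] = (-0.25 * r[m1]**5 + 0.5 * r[m1]**4 + 0.625 * r[m1]**3
               - (5.0 / 3.0) * r[m1]**2 + 1.0)
    out[m2] = ((1.0 / 12.0) * r[m2]**5 - 0.5 * r[m2]**4 + 0.625 * r[m2]**3
               + (5.0 / 3.0) * r[m2]**2 - 5.0 * r[m2] + 4.0
               - 2.0 / (3.0 * r[m2]))
    return out

# Schur (elementwise) product localization of a sample covariance on a 1D grid.
n = 50
pos = np.arange(n, dtype=float)
dist = np.abs(pos[:, None] - pos[None, :])
lc = 5.0                                     # length scale (arbitrary here)
rho = gaspari_cohn(dist / (np.sqrt(10.0 / 3.0) * lc))

rng = np.random.default_rng(1)
X = rng.standard_normal((n, 20))             # a small, noisy ensemble
Pf = np.cov(X)
P_loc = rho * Pf   # Hadamard product: long-range spurious terms are tapered to zero
```

The taper leaves the diagonal untouched (rho is 1 at zero distance) while entries beyond twice the scaled length scale are zeroed exactly.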
In distance dependent covariance localization, it is assumed that state variables within a specific region, defined by a critical length, are correlated with measurements in that region, and any correlations beyond the cut-off distance are disregarded [15,23,24]. Although implementation of distance based localization is straightforward, estimation of the cut-off radius is not easy and depends severely on the underlying flow dynamics and structure [11]. This is aside from the

fact that distance based localization is not appropriate for non-local observations [2,19]. Another localization method, used to identify the sensitive region of a certain measurement, utilizes flow streamlines [5,13]. This approach was later modified to include phase information in building the localization matrix [45]. However, not every simulator has an embedded streamline calculator, and application of streamline localization to non-local observations is problematic. A series of adaptive covariance localization or (alternatively) regularized Kalman gain methods have also been reported, the first being [2]. These studies may be categorized into four major families: hierarchical, bootstrapped, raised to a power (RAP) and gain regularization. In the hierarchical EnKF [2], a group of independent ensemble sets is used to estimate confidence factors for each entry of the regression coefficient matrix. These confidence factors are then utilized to estimate the reliability of the regression coefficients. Later, this method was modified in [47] to relax the requirement of an ensemble group for computation of the confidence factors: instead of an ensemble group, the confidence factors are computed from a single ensemble set with bootstrap re-sampling, i.e. the bootstrap EnKF. In [18], the authors employed a "correlation raised to a power" technique to compute an adaptive localization function, as an extension to their previous work [8]. The last family of adaptive localization is threshold regularization of the Kalman gain [44]. Here, every element of the Kalman gain is compared with a threshold value; the element is kept untouched only if its absolute value is higher than the threshold value, and is set to zero otherwise. Finally, in [41] the authors provide methods for local estimation of the covariance in the state space, which led to local ensemble filters.
In this work, we apply adaptive thresholding to both covariance localization and Kalman gain regularization. Like the thresholding approach in [44], which applied hard thresholding to the Kalman gain, we perform thresholding on both the covariance and the Kalman gain in order to mitigate spurious correlations arising from small ensemble size. The proposed method is quite straightforward and does not require estimation of any critical length or major changes to an available EnKF code. In addition, applying the proposed method to characterize the cases studied here displays very small bias in the covariance localization results; there is, however, some bias in the regularization of the Kalman gain. It should also be emphasized that the proposed approach works with both local and non-local observations. Finally, it works with available black-box simulators and does not require any special treatment or calculation of other dynamic flow attributes. In the next sections, the methodology is presented first, and then the proposed method is applied, in Section 3, to various synthetic experimental settings. These cases cover a wide range of applications, from a simple linear toy model to a 2D anisotropic subsurface one. In the discussion section, some notes about this work are further clarified. It is followed by the summary and conclusion section, in which our findings are summarized.
2. Methodology
In this section, EnKF with perturbed measurements [9,23] is first briefly explained. In addition to generalized adaptive thresholding, two other localization approaches used for comparison with adaptive thresholding are also briefly described.
2.1. Ensemble Kalman filter
The ensemble Kalman filter (EnKF) and its theory, numerical properties and implementation algorithm are comprehensively


described in [16,17]. Here, to present a consistent formulation throughout this work, we recall the formulation and notation. EnKF is a Monte Carlo implementation of the original Kalman filter performing data assimilation sequentially. EnKF works with an ensemble of model realizations, and its assimilation cycle consists of two main steps. In the forecast step, the previously updated model realizations are propagated to the current observation time with a numerical simulator, yielding the forecast state vectors x_i^f. In the analysis step, the acquired forecast data are assimilated using a Bayesian update adjusting for the new observation. To keep the equations simple, we drop the time index. Concatenating the forecast state vectors into a single matrix and removing the ensemble mean gives the forecast ensemble perturbation matrix X^f with i-th perturbation vector X_i^f = x_i^f − x̄^f:

X^f = (1/√(N−1)) [x_1^f − x̄^f, …, x_N^f − x̄^f],  i = 1, …, N    (1)

in which x̄^f represents the forecast ensemble mean and N is the number of ensemble members:

x̄^f = (1/N) Σ_{i=1}^{N} x_i^f    (2)

Now it is easy to compute the forecast state error covariance P^f of size n × n, in which n is the size of the state vector:

P^f = (1/(N−1)) Σ_{i=1}^{N} (x_i^f − x̄^f)(x_i^f − x̄^f)^T = X^f X^{fT}    (3)

where T denotes the vector/matrix transpose. During the assimilation step, the forecast states are assimilated with the real measurement d of size d × 1 to obtain realizations adjusted to the most recent observation by employing the Kalman equation. In this step, the assimilated ensemble mean x̄^a, members x_i^a and perturbation matrix X^a are computed as:

x̄^a = x̄^f + K (d − H x̄^f)    (4)

X^a = X^f + K (D − H X^f)    (5)

x_i^a = X_i^a + x̄^a    (6)

Here, K is the Kalman gain of size n × d and D = [e_1, e_2, …, e_N] is an ensemble of measurement error perturbations sampled from the measurement covariance matrix R of size d × d [9,26]. The Kalman gain is obtained as:

K = P^f H^T (H P^f H^T + R)^{−1}    (7)

H, of size d × n, is an observation operator mapping state space to observation space, and in many cases can be constructed at every assimilation cycle as in Eq. (8). Matrices 0 and I are zero and identity matrices of the given sizes. It is worth noting that the observation operator matrix is highly specific to the assimilation problem at hand. The form presented in Eq. (8) is employed for cases 2 and 3 in the following sections:

H = [0_{d×(n−d)}  I_{d×d}]    (8)

2.2. Covariance localization/gain regularization

Theoretically speaking, for any model transformable into a Gaussian-like one, with a huge ensemble size (N sufficiently large to mimic infinity), EnKF never diverges [48]. However, as pointed out previously, operational application of EnKF is limited to small ensemble sizes in order to avoid the high computational cost of propagating the realizations. In this case, EnKF may diverge even for a Gaussian model. Covariance localization or Kalman gain regularization attempts to provide a more robust estimate than the sample covariance matrix, Eq. (3), or the sample Kalman gain matrix, Eq. (7). This is accomplished through elimination of spurious correlations in Eq. (3) or spurious entries in Eq. (7). Mathematically speaking, either the covariance matrix P^f or the Kalman gain matrix K is replaced with its localized version P^loc or regularized version K^reg, respectively. It should be noted that the label "localization of the covariance matrix P^f" is employed to describe the whole localization procedure; it does not mean that localization is actually performed on P^f. In practice the actual localization is done on the sub-matrices of P^f, namely P^f H^T and H P^f H^T.

Note that the product P^f H^T is regarded as the covariance between the variables in the state vector and the predicted data; similarly, the product H P^f H^T is the covariance of the predicted data. These two covariance matrices are inter-related, and localization may be applied to them or, instead, directly to the Kalman gain. It should be recalled that we have discriminated between these two paradigms of localization. Indeed, the results will differ seriously (in some cases) depending on whether we use the localized version of the covariance matrix to compute the Kalman gain matrix or obtain the gain by direct regularization of the gain matrix itself. Hereafter, the former paradigm is called "covariance localization" while the latter is labeled "gain regularization". For more details, refer to [44].
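The EnKF analysis step of Eqs. (1)-(7) can be sketched concisely (our illustration of the stochastic EnKF with perturbed observations; the function and variable names are ours, and a direct matrix inverse is used only for readability):

```python
import numpy as np

def enkf_analysis(E, H, d, R, rng):
    """One stochastic-EnKF analysis step with perturbed observations.

    E : (n, N) ensemble of forecast state vectors (columns are members)
    H : (d, n) linear observation operator
    d : (d,) observation vector
    R : (d, d) observation-error covariance
    """
    _, N = E.shape
    xbar = E.mean(axis=1, keepdims=True)
    Xf = (E - xbar) / np.sqrt(N - 1)                    # Eq. (1)
    Pf = Xf @ Xf.T                                      # Eq. (3)
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)      # Eq. (7)
    # Perturbed observations: one noisy copy of d per ensemble member.
    D = d[:, None] + rng.multivariate_normal(np.zeros(len(d)), R, size=N).T
    return E + K @ (D - H @ E)                          # Eqs. (4)-(6), memberwise
```

Updating each member with its own perturbed observation is equivalent to applying Eqs. (4)-(6) to the mean and perturbations separately.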

2.2.1. Distance based covariance localization
In distance based localization, correlations beyond a predefined range – known as the critical or correlation length L_c – are neglected or weighted much less than those within range. Here, the ensemble set is updated with a localized version of the forecast error covariance matrix P^f [22], i.e.:

P^loc = ρ ∘ P^f    (9)

The matrix ρ is the positive definite localization matrix (also called the correlation matrix) and the operator ∘ indicates elementwise matrix multiplication, known as the Schur or Hadamard product. There are many different compactly supported functions, discussed in [20], which can provide the correlation matrix ρ. Here, the fifth-order Gaspari and Cohn correlation function is employed [20]:

ρ(r) = −r^5/4 + r^4/2 + 5r^3/8 − 5r^2/3 + 1,                 0 ≤ r ≤ 1
ρ(r) = r^5/12 − r^4/2 + 5r^3/8 + 5r^2/3 − 5r + 4 − 2/(3r),   1 ≤ r ≤ 2
ρ(r) = 0,                                                    2 ≤ r    (10)

with r = ‖D_ij‖ / (√(10/3) l_c)

where ‖D_ij‖ is the Euclidean distance between grid point (i, j) and the observation location. In this equation, l_c is defined as a length scale, and the "cut-off" distances 1 and 2 control the localization dimensionless radius r.
2.2.2. Bootstrap Kalman gain regularization
Anderson [2] proposed a hierarchical EnKF in which the reliability of the regression coefficients is estimated from a group of independent ensemble sets by computing a confidence factor for each entry of the regression coefficient matrix. Later, Zhang and Oliver [47] employed bootstrap sampling to draw samples from a single ensemble set to compute the confidence factors. In their work, an alternative bootstrap was also presented to regularize the objective function in order to obtain a smoother estimation of the confidence factors. They applied the following entry dependent confidence factor α_{i,j}, obtained by minimizing the objective function [47]:

α_{i,j} = 1 / (1 + (1 + 1/σ_α²) C_{i,j}²)    (11)

Here, σ_α is a tuning parameter to improve the estimation of α_{i,j}, and C_{i,j} is the coefficient of variation of the (i, j) entry of the Kalman gain computed from bootstrap sampling. It is noteworthy that the algorithm leads to a somewhat biased estimation of the gain matrix for both local [47] and non-local observations [12].
2.3. Adaptive thresholding
For large-scale covariance matrices, the standard estimator of Eq. (3), known as the sample covariance matrix, is usually not well-conditioned when N < n [29]. To obtain a well-conditioned covariance matrix, a wide variety of methods has been presented to cope with the smallness of the sampled data relative to the population [35]. Ledoit and Wolf [29] introduced a shrinkage approach, a linear optimal combination of the sample covariance and an identity matrix of the same size, providing an asymptotically well-conditioned covariance. Later, [40] presented six different shrinkage targets for the covariance matrix and their associated optimal shrinkage intensities. Bickel and Levina [6] discussed "universal" thresholding regularization for a sparse covariance matrix based on the hard thresholding function. "Universal" means that a single threshold value is utilized to threshold all elements of the sample covariance matrix. They provide theoretical justification for the asymptotic convergence rate of the estimated covariance to the population covariance matrix. The same idea is discussed differently in [14]. It was later extended to more general thresholding functions [37] by introducing a class of thresholding functions relying on specific conditions, still labeled universal thresholding. Finally, Cai and Liu [10] presented an adaptive thresholding method applying the thresholding idea to all elements of the covariance matrix with an entry dependent adaptive thresholding formulation.
2.3.1. Thresholding function
In the original hard thresholding method of [6], a universal threshold value λ is employed to threshold the forecast covariance matrix P^f, referred to as "P^f thresholded at level λ", defined as:

T_λ^HD(z) = z · 1(|z| ≥ λ)    (12)

Here T_λ^HD is the hard thresholding function and z is a variable standing for any entry of the covariance matrix. The indicator function 1(·) takes the value zero if |z| < λ and one otherwise; that is, if the absolute value of a covariance entry is less than the threshold value, the element is set to zero in the covariance matrix, while if it is greater than or equal to the threshold value, the element is kept untouched. The authors in [6] also provided the rate of asymptotic convergence for a class of sparse covariance matrices under such a thresholding scheme. Later, Rothman et al. [37] extended this work to a set of more flexible thresholding functions "that combine thresholding with shrinkage" simultaneously and have a "desirable feature not shared by hard thresholding" [37]. A generalized thresholding operator with specific conditions was defined to include thresholding and shrinkage and to limit the amount of shrinkage required. According to [37], different thresholding functions can be obtained as solutions of a quadratic loss problem with different penalty functions [3]. Based on these solutions, four thresholding functions are considered here, among which the hard function of [6] can also be retrieved. Soft thresholding is the solution of the quadratic problem using the lasso (least absolute shrinkage and selection operator) penalty function:

T_λ^ST(z) = sgn(z)(|z| − λ)_+    (13)

where sgn(·) is the well-known sign function and the operator (·)_+ indicates truncation to zero if the value within the parentheses is negative. It is evident from Eqs. (12) and (13) that the soft function dictates a maximum amount of shrinkage whereas the hard one allows no shrinkage at all. The Smoothly Clipped Absolute Deviation (SCAD) penalty can be used as a compromise between the soft and hard functions to control shrinkage:

T_λ^SD(z) = sgn(z)(|z| − λ)_+,               |z| < 2λ
T_λ^SD(z) = ((a − 1)z − sgn(z)aλ)/(a − 2),   2λ ≤ |z| < aλ
T_λ^SD(z) = z,                               |z| ≥ aλ    (14)

The parameter a is the key parameter controlling the amount of shrinkage; it is taken as 3.7 in this work following the suggestion of [3]. Finally, employing the adaptive lasso penalty in the quadratic problem yields the adaptive lasso thresholding function:

T_λ^LO(z) = sgn(z)(|z| − λ^{η+1} |z|^{−η})_+    (15)

where η is a tuning parameter determining the amount of shrinkage. Fig. 1 depicts the behavior of the thresholding functions for the chosen parameters. As illustrated in Fig. 1, the hard and soft thresholding functions act on z in the two extreme ways, while the SCAD and lasso functions are sandwiched between them.
2.3.2. Computation of threshold value
To perform the thresholding operation on the covariance matrix P^f with a thresholding function T_λ, the threshold value λ is required. Minimizing the following objective function with respect to λ yields the threshold value:

λ = arg min_λ E‖T_λ(P^f) − P‖²    (16)

where E(·) denotes the expected value of the corresponding variable and P is the true covariance matrix [44]. Since the true covariance matrix is not available, Eq. (16) is replaced by the average over v-fold cross validation [6]:

λ = arg min_λ (1/N_v) Σ_{v=1}^{N_v} ‖T_λ(P̂_v^A) − P̂_v^B‖_F²    (17)

In this equation, N_v stands for the number of folds in the v-fold cross validation and ‖·‖_F is the Frobenius norm of a matrix. Although there are other forms for reducing Eq. (16) to Eq. (17), Rothman et al. [37] argued that the Frobenius norm "has slightly better performance" relative to the others. The calculations involved in Eq. (17) consist of two main steps: in the first step, the sample data is divided randomly into two

Fig. 1. Thresholding functions for λ = 0.8, a = 2.8 and η = 1, similar to Fig. 1 in [37]. [Figure: T(z) versus |z| for the soft, lasso, SCAD and hard functions.]
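The four thresholding rules of Eqs. (12)-(15) can be written down directly (a numpy sketch; the function names and default parameters are our choices):

```python
import numpy as np

def hard(z, lam):                     # Eq. (12): keep-or-kill, no shrinkage
    return np.where(np.abs(z) >= lam, z, 0.0)

def soft(z, lam):                     # Eq. (13): maximal shrinkage by lam
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def scad(z, lam, a=3.7):              # Eq. (14): compromise, a = 3.7 as in the paper
    az = np.abs(z)
    mid = ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)
    out = np.where(az < 2.0 * lam, soft(z, lam), mid)
    return np.where(az >= a * lam, z, out)

def adaptive_lasso(z, lam, eta=1.0):  # Eq. (15): shrinkage decaying with |z|
    az = np.abs(z)
    with np.errstate(divide="ignore", invalid="ignore"):
        shrink = np.maximum(az - lam ** (eta + 1.0) * az ** (-eta), 0.0)
    return np.where(az == 0.0, 0.0, np.sign(z) * shrink)
```

Note how SCAD matches the soft rule below 2λ and becomes the identity above aλ, which is the sandwiching behavior shown in Fig. 1.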


equal sets. From these sets, two estimated covariance matrices P̂_v^A and P̂_v^B can be obtained. This procedure is then repeated N_v times, comprising the v-fold cross validation. The average in Eq. (17) is computed from the estimated covariance matrices obtained in the cross validation. In the second step, utilizing the estimated covariance matrices, the optimum value of λ is obtained through minimization of Eq. (17). An easy algorithm for this minimization is presented in [10]. After computing the threshold value, the covariance matrix P^f is thresholded with the corresponding thresholding function and value. We refer to this type of thresholding as universal covariance thresholding, UCT, because a single threshold value λ is applied to all entries of the covariance matrix P^f. To provide different threshold values, Cai and Liu [10] presented an adaptive thresholding approach in which every element of the covariance matrix has its own threshold value λ_ij:

λ_ij = δ √( θ_ij log n / N )    (18)

where θ_ij is calculated from the sample data set for every element of the covariance matrix P^f_ij as:

θ_ij = (1/N) Σ_{k=1}^{N} ( X_ki X_kj − P^f_ij )²    (19)

X is defined in Eq. (1). The parameter δ can be obtained through minimization of the following objective function utilizing v-fold cross validation, as presented above:

δ = arg min_δ (1/N_v) Σ_{v=1}^{N_v} ‖T_δ(P̂_v^A) − P̂_v^B‖_F²    (20)
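The split-sample cross validation behind Eqs. (17) and (20) can be sketched as follows (our illustration; the random half-splits, the candidate grid and `hard_threshold` as a stand-in for any thresholding function T are assumptions, not the paper's exact algorithm):

```python
import numpy as np

def uct_threshold(X, T, lam_grid, n_folds=10, rng=None):
    """Universal threshold selection by split-sample cross validation (Eq. (17)).

    X        : (n, N) matrix of ensemble members (columns are samples)
    T        : thresholding function T(P, lam)
    lam_grid : candidate threshold values
    """
    rng = rng or np.random.default_rng()
    n, N = X.shape
    scores = np.zeros(len(lam_grid))
    for _ in range(n_folds):
        perm = rng.permutation(N)
        A, B = perm[: N // 2], perm[N // 2 :]
        PA = np.cov(X[:, A])     # covariance estimated from half A
        PB = np.cov(X[:, B])     # covariance estimated from half B
        for k, lam in enumerate(lam_grid):
            # Frobenius-norm discrepancy between thresholded A and raw B
            scores[k] += np.linalg.norm(T(PA, lam) - PB, "fro") ** 2
    return lam_grid[int(np.argmin(scores / n_folds))]

def hard_threshold(P, lam):
    return np.where(np.abs(P) >= lam, P, 0.0)
```

The same routine yields the adaptive parameter δ of Eq. (20) if T applies the entry dependent thresholds λ_ij of Eq. (18) for a given δ.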

Unlike UCT, here thresholding is applied to every element of the covariance matrix with its corresponding threshold value λ_ij, which we label Adaptive Covariance Thresholding, ACT. Having obtained the threshold value(s) λ (λ_ij's), the localized covariance matrix can be calculated via UCT, P^loc = T_λ(P^f), or ACT, P^loc = T_{λ_ij}(P^f), with the corresponding thresholding function.
2.3.3. Gain regularization with thresholding
As discussed in Section 2.2, in some contributions the so-called localization is applied directly to the Kalman gain of Eq. (7), a process we refer to as (gain) regularization to distinguish it from covariance localization. Zhang and Oliver compared covariance localization and Kalman gain regularization [46] and concluded that assimilation results obtained by employing Kalman gain regularization are subject to less error than those obtained from covariance localization. Based on their conclusion, the authors in [44] justified applying thresholding directly to the Kalman gain. In this work, following [44,46], we extend the previous discussion of adaptive covariance thresholding to gain regularization. To do so, it is only required to replace the covariance matrix estimates in Eq. (17) or Eq. (20) with the corresponding Kalman gains:

λ = arg min_λ (1/N_v) Σ_{v=1}^{N_v} ‖T_λ(K̂_v^A) − K̂_v^B‖_F²    (21)

and

δ = arg min_δ (1/N_v) Σ_{v=1}^{N_v} ‖T_δ(K̂_v^A) − K̂_v^B‖_F²    (22)

We label Eqs. (21) and (22) as universal gain thresholding, UGT, and adaptive gain thresholding, AGT, respectively. In these equations, K̂ is estimated directly from P̂, and the procedure of applying thresholding to the Kalman gain is exactly analogous to that used for covariance thresholding in the section above. To obtain θ_ij for the Kalman gain, it is first required to perform a bootstrap re-sampling from the ensemble data set. The Kalman gain K_i^b can then be estimated for every bootstrap sample, providing statistical samples from which to compute θ_ij:

θ(i, j) = (1/N_b) Σ_{k=1}^{N_b} ( K_k^b(i, j) − K^f(i, j) )²    (23)

Here, K^f is obtained from the forecast covariance matrix P^f, and N_b is the number of bootstrap re-sampling attempts. Again, after computing the threshold value(s), the regularized Kalman gain using UGT, K^reg = T_λ(K^f), or AGT, K^reg = T_{λ_ij}(K^f), can be estimated with the corresponding thresholding function.
2.3.4. Implementation algorithm
In this section, we wrap up the previous sections and present a pseudo implementation algorithm including thresholding localization/regularization for subsurface media characterization. Although the EnKF algorithm for subsurface parameter estimation is well developed (see for example [48]), we state it here concisely to include the adaptive thresholding:
Step 1: Generate N proper realizations of the permeability or porosity field of the subsurface medium.
Step 2: Propagate each realization according to the model dynamics of the underlying flow system to the next assimilation time with proper initial/boundary conditions.
Step 3: When new observation data arrive, form the proper augmented state vectors of the corresponding realizations. Because in most subsurface flow systems the respective state values appear with different orders of magnitude, a rescaling of the state vector should be performed. To do so, we utilize a method presented in [28]: we define two rescaling matrices T_1 and T_2, of size n × n and d × d, which are symmetric and diagonal; their diagonal values are the reciprocals of the corresponding orders of magnitude. With this rescaling, the Kalman gain of Eq. (7) becomes:

    1   f fT T K ¼ T1 T2 T1 T1 HXf XfT HT þ R 1 T1 X X H 2 2 T2

ð24Þ

Defining C_1 = T_1 X^f and C_2 = T_2 H X^f, the Kalman gain reduces to:

K = T_1^{−1} K_R T_2    (25)

in which K_R is:

K_R = C_1 C_2^T (C_2 C_2^T + T_2 R T_2)^{−1}    (26)

Now the thresholding of the covariance or Kalman gain matrix is applied to the rescaled ensemble set.
Step 4: Perform the minimization in Eq. (17) or Eq. (20) for localization of the covariance matrix, or in Eq. (21) or Eq. (22) to regularize the Kalman gain. It is worth pointing out that the augmented state vector consists of different state variables, which forces the forecast covariance or Kalman gain matrix to be multi-structured [15]. By multi-structured, we mean that there is more than one type of model variable (e.g. permeability, saturation and pressure) in the state vector. This leads to a forecast covariance matrix with sub-matrices of different orders of magnitude (see Eq. (7) in [15]). Our simulations showed that to acquire better thresholding performance, every sub-structure must be thresholded separately.

186

E.B. Delijani et al. / Advances in Water Resources 69 (2014) 181–196

Step 5: With the obtained localized covariance or regularized gain, update the ensemble members using Eqs. (5)-(7).

Step 6: If there is a new observation, go to Step 2; otherwise the algorithm is finished.

3. Synthetic examples

In this section, the approaches presented in the previous section are applied to different experimental setups for evaluation purposes. The first example is a 1D linear model (case 1), which can be solved analytically. The other cases are permeability estimation of two 2D reservoirs with an oil–water flooding production scenario and different levels of heterogeneity and anisotropy; case 2 is isotropic and less heterogeneous than case 3. In cases 2 and 3, aside from the thresholding approaches, the results of EnKF without localization, bootstrap EnKF and EnKF with distance-dependent localization are also presented and compared against the results obtained with thresholding.

3.1. Case 1: 1D linear model

The first example is a linear, one-dimensional correlated model (case 1) with a state vector consisting of n = 100 state variables, x = [x_1, x_2, ..., x_100]. The true state vector and ensemble realizations are generated via a multi-Gaussian covariance which is a function of the distance between state variables [12,47]. The covariance structure is composed as:

C(|i − j|) = [ C_1  0 ; 0  C_2 ]    (27)

where 0 is a zero matrix of size n/2 × n/2 and:

C_1(|i − j|) = (c_ij)_{1 ≤ i,j ≤ n/2},   c_ij = exp( −3 (|i − j| / 40)^{1.5} )    (28)

and

C_2(|i − j|) = (c_ij)_{n/2+1 ≤ i,j ≤ n},   c_ij = 4 (1 − |i − j| / 10)_+    (29)

Two non-local observations are made from the averages of the first 10 and the first 50 state values of the true state vector:



d_1 = (1/10) Σ_{i=1..10} x_i,   d_2 = (1/50) Σ_{i=1..50} x_i    (30)
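The covariance of Eqs. (27)-(29) and the non-local observations of Eq. (30) can be sketched as follows (the random seed and ensemble size are illustrative):

```python
import numpy as np

n = 100
idx = np.arange(n // 2)
dist = np.abs(idx[:, None] - idx[None, :])           # |i - j| within one block
C1 = np.exp(-3.0 * (dist / 40.0) ** 1.5)             # Eq. (28)
C2 = 4.0 * np.maximum(1.0 - dist / 10.0, 0.0)        # Eq. (29); (.)_+ truncation
zero = np.zeros((n // 2, n // 2))
C = np.block([[C1, zero], [zero, C2]])               # Eq. (27), block diagonal

rng = np.random.default_rng(1)
x_true = rng.multivariate_normal(np.zeros(n), C)
ensemble = rng.multivariate_normal(np.zeros(n), C, size=100).T   # n x N

# Eq. (30): averages of the first 10 and the first 50 state values
H = np.zeros((2, n))
H[0, :10] = 1.0 / 10.0
H[1, :50] = 1.0 / 50.0
d_obs = H @ x_true + rng.normal(0.0, 0.05, size=2)   # noisy observations
```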

The observations are assumed to have a Gaussian error distribution with zero mean and standard deviation of 0.05. The corresponding observation operator H, of size 2 × 100, is populated with zeros, except for the first 10 entries of the first row and the first 50 entries of the second row, whose values are 1/10 and 1/50, respectively. Because this problem is linear, the true posterior covariance and Kalman gain can be computed analytically. The true values are the reference to which the results of the various adaptive methods are compared. To make the comparison of methods more quantitative, the Frobenius norm of the difference between the estimated Kalman gain K_loc and the true gain K_t is calculated as e = ||K_loc − K_t||_F. This

error is then computed as an average over 100 independent ensemble repetitions. The reason for repeating the experiment is to eliminate statistical sampling error as much as possible. The experiment is conducted as follows: for a specific ensemble size N, the required realizations are generated employing Eq. (27). It is then possible to estimate the forecast covariance P^f or its related Kalman gain. By employing a thresholding approach with a thresholding function, localization or regularization is then performed. The Frobenius norm of the corresponding error

is calculated. Finally, this process is repeated independently 100 times and the average of the errors is computed and included in Table 1. Note that for this 1D case we could not provide results for distance-dependent localization, because the observations are non-local. For bootstrap regularization, results for a somewhat similar case, with a simpler covariance structure, are available in [12] and are not repeated here. A first note about Table 1 concerns the star sign included there. Although this sign means that no answer is available, it does not mean that the corresponding approach/function cannot yield any reliable answer. Here, a similar experimental setup is applied for all approaches/functions to obtain comparable results. Because the covariance structure in Eq. (27) is not of the same order for the two regions C_1 and C_2, a simple way to obtain answers is to rescale all realizations to similar orders of magnitude before applying thresholding. However, we deliberately did not perform any specific pre-treatment, in order to emphasize the importance of the approach/function in localization/regularization. Analysis of Table 1 reveals that for small ensemble sizes, thresholding in general provides much smaller errors relative to the no localization/regularization case (NL), except for universal gain thresholding. Adaptive thresholding of the covariance/Kalman gain yields more accurate results than its universal counterparts. Meanwhile, thresholding of the covariance matrix provides better results than gain regularization, especially for small ensemble sizes. We think this latter conclusion is rather important; it is also verified in Sections 3.2 and 3.3, and we stress it a bit more in the discussion section. Further examination of Table 1 shows that the soft thresholding results are better among the universal approaches, while the lasso scheme provides more accurate results for the adaptive cases.
Nevertheless, the lasso results are similar to the SCAD and soft ones, but still better. To get a feeling for the estimation variability of the various methods, Fig. 2 depicts a box plot of the Frobenius error for the case N = 100 of Table 1. From this figure, it is clear that the results of the adaptive methods are not only more accurate but also have much smaller variation and fewer outliers. It is also found that universal thresholding of the covariance gives better results than the universal gain approach. The estimated first column of the Kalman gain, corresponding to the first observation d_1, is illustrated in Fig. 3 for the different methods/functions. In this figure, the true Kalman gain is depicted along with the mean of 100 independent ensemble sets and the mean plus/minus one standard deviation (STD). An interesting note is that after the 50th state variable the true Kalman gain is zero. Among the applied methods, AGT can estimate this part of Fig. 3 relatively much better than the others, especially with the hard thresholding function. The main point, however, is that Kalman gain regularization produces biased estimates in almost all approaches/functions, which is visible for UGT in Fig. 3. Furthermore, gain regularization eliminates true Kalman gain values while eliminating spurious correlations, as can be viewed in the flat part of Fig. 3 around the 20th state variable; this is visible for UGT and AGT, in which gain values are forced toward zero by the mentioned methods. This latter feature of gain regularization does not exist in covariance thresholding. Some biased estimation exists in universal thresholding of the covariance with the hard function, but it is much better than its gain-thresholding counterpart. The results of universal gain thresholding clearly indicate that the Kalman gain matrix cannot be thresholded with a single threshold value, as Table 1 emphasizes. For this reason we henceforth drop UGT from further analysis.
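The Monte Carlo averaging behind Table 1 can be sketched generically; `draw_gain` and `localize` are hypothetical callables standing in for ensemble-based gain estimation and for any of the thresholding schemes:

```python
import numpy as np

def mean_frobenius_error(draw_gain, localize, K_true, n_repeat=100):
    """Average Frobenius error e = ||K_loc - K_t||_F over independent
    ensemble repetitions, as reported in Table 1.

    draw_gain() : draws a fresh ensemble and returns its estimated Kalman gain.
    localize(K) : applies a thresholding/localization scheme to the gain.
    """
    errors = [np.linalg.norm(localize(draw_gain()) - K_true, ord="fro")
              for _ in range(n_repeat)]
    return float(np.mean(errors))
```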
It is important to point out that we deliberately constructed case 1 to emphasize that the more complex the covariance structure is, the more sophisticated the localization (here, the adaptive version) that is required. In fact, we have performed many simulations and found


Table 1
Average of errors over 100 independent ensemble sets for different methods and thresholding functions. NL: no localization/regularization, HD: hard, ST: soft, LO: lasso, SD: SCAD, U: universal, C: covariance, T: thresholding, A: adaptive, G: gain, *: no answer available.

| N   | NL  | UCT HD | UCT ST | UCT LO | UCT SD | ACT HD | ACT ST | ACT LO | ACT SD | UGT HD | UGT ST | UGT LO | UGT SD | AGT HD | AGT ST | AGT LO | AGT SD |
|-----|-----|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 25  | 6.8 | *      | *      | *      | *      | *      | 3.3    | 3.1    | 2.9    | 8.2    | 6.8    | 7.6    | 7.5    | 4.0    | 4.0    | 3.7    | 3.7    |
| 50  | 4.6 | *      | *      | *      | 4.5    | 2.0    | 2.3    | 1.9    | 1.9    | 4.7    | 4.1    | 4.4    | 4.4    | 2.3    | 2.7    | 2.1    | 2.2    |
| 100 | 3.2 | 2.4    | 1.9    | 2.2    | 2.1    | 1.2    | 1.5    | 1.2    | 1.2    | 3.3    | 2.8    | 2.8    | 2.8    | 1.3    | 1.8    | 1.2    | 1.3    |
| 300 | 1.9 | 1.3    | 1.0    | 1.1    | 1.0    | 0.6    | 0.9    | 0.6    | 0.6    | 1.6    | 1.6    | 1.4    | 1.5    | 0.6    | 1.0    | 0.6    | 0.6    |

[Figure: two box-plot panels, CT (covariance thresholding) and GT (gain thresholding), showing the Frobenius error of each method/function.]
Fig. 2. Box plot comparison for different methods/functions of case 1. No: no localization/regularization, U: universal, A: adaptive, H: hard, S: soft, L: lasso, D: SCAD, G: gain, C: covariance, T: thresholding.

[Figure: 4 × 4 grid of panels (rows UCT, ACT, UGT, AGT; columns HD, ST, LO, SD) showing the estimated gain versus state variable index.]
Fig. 3. Kalman gain estimated by the methods/functions for the first observation of case 1; STD: standard deviation; for other abbreviations see the caption of Table 1.

that for a simpler covariance structure (i.e., when the orders of magnitude of the covariance entries are similar) it is not worth incurring the cost of adaptive thresholding (examine [10]). Analysis of the simulation results indicates that UGT is somewhat inferior to the other methods, especially for the small ensemble sizes which are of most interest. Recall that the aim of any localization/regularization is to mitigate spurious correlations in the covariance matrix for small ensemble sizes. We have thus concluded that the gain matrix cannot be thresholded with a universal approach (i.e., a single threshold value employed to threshold all Kalman gain entries), especially for more complicated cases such as 2 or 3, which are presented subsequently.
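For reference, the four thresholding rules compared above can be sketched entrywise as follows. The paper defines them in Section 2 (outside this excerpt), so the exact forms below, the adaptive-lasso rule with exponent eta and the SCAD rule with the customary constant a = 3.7, are assumptions taken from the generalized-thresholding literature rather than the paper's own expressions:

```python
import numpy as np

def hard(z, lam):
    # keep entries above the threshold unchanged, zero out the rest
    return z * (np.abs(z) > lam)

def soft(z, lam):
    # shrink every surviving entry by lam
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def adaptive_lasso(z, lam, eta=1.0):
    # adaptive-lasso rule: large entries are shrunk progressively less
    with np.errstate(divide="ignore"):
        shrink = lam ** (eta + 1) * np.abs(z) ** (-eta)
    return np.sign(z) * np.maximum(np.abs(z) - shrink, 0.0)

def scad(z, lam, a=3.7):
    # SCAD rule: soft near zero, linear interpolation, identity for |z| > a*lam
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= 2 * lam,
                    np.sign(z) * np.maximum(np.abs(z) - lam, 0.0),
                    np.where(np.abs(z) <= a * lam,
                             ((a - 1) * z - np.sign(z) * a * lam) / (a - 2),
                             z))
```

All four rules agree in killing entries below the threshold; they differ in how much they shrink the survivors: hard not at all, soft by a constant lam, while the lasso and SCAD rules shrink large entries progressively less, acting as a compromise between thresholding and shrinkage.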

3.2. Case 2: 2D oil–water flooding

A two-dimensional, two-phase oil–water flooding reservoir was constructed for case 2. There are four production wells at the four corners of the domain and a water injection well in the center of the reservoir (Fig. 4). The reservoir is divided into 41 × 41 × 1 uniform grid blocks, each being 40 × 40 × 10 ft in size and holding 20% porosity. The true permeability of the model, shown in Fig. 4, and the required ensemble set are generated with an isotropic Gaussian covariance with major and minor ranges of 8 grid blocks [21], employing the sgsim module from SGeMS [36]. The prior log-permeability mean and variance are assumed to be 4 and 1, respectively.


[Figure: log-permeability map with injector Inj at the center and producers P1–P4 near the corners; color scale from 1 to 7.]
Fig. 4. True log-permeability map of Gaussian moderate heterogeneity.

Water is injected via the injection well at a rate of 800 STB/D from the beginning to the end of the production period. All producers operate at a constant bottomhole pressure of 3000 psi. The initial reservoir oil pressure and connate water saturation are 3000 psi and 0.2, respectively. Well patterns and specifications are summarized in Table 2. Production data consist of the bottomhole pressure of the injection well and the oil–water production rates of the producers (Fig. 5); in total, 9 data for each assimilation cycle. Production data are obtained every 2 months for 2 years of reservoir production, providing 13 assimilation cycles. To emulate noisy measurements, production data are perturbed with white Gaussian noise with standard deviations of 5 psi and 4 STB/D for bottomhole pressure and oil production rate, and 2% of the obtained data for water production rate, respectively. The total production period is 6 years, divided into two parts: data of the first 2 years are employed for model characterization, while the rest are kept to evaluate the performance and quality of the localization/regularization results. The ith state vector of model 2 is concatenated as x_i^f = [ln k, p, s, d]^T. Here, ln k, p, s and d represent the natural logarithm of permeabilities, oil pressures, water saturations and predicted observations, respectively. Our study shows that, for this case, EnKF with 100 realizations can capture most features of the true reservoir without any localization. To evaluate the performance of the methods/functions, the root mean square error (RMSE) is calculated from the difference between the estimated parameter x^e and the true value x^t; a convergent assimilation process will yield a diminishing RMSE as assimilation proceeds in time, otherwise the assimilation has diverged.
It should be emphasized that RMSE is not an ''ideal measure'' of assimilation success, but it is a ''practical tool'' and has been utilized in many studies of practical data assimilation (DA). The general idea is to perform DA such that RMSE decreases monotonically while preserving the variance of the initial ensemble set as much as possible.

RMSE = sqrt( Σ_{i=1..N} (x_i^e − x_i^t)^2 / N )    (31)

Table 2
Well pattern used in the model of Section 3.2.

| Well        | Inj       | P1       | P2       | P3       | P4       |
|-------------|-----------|----------|----------|----------|----------|
| Well type   | Injector  | Producer | Producer | Producer | Producer |
| x location  | 21        | 5        | 5        | 36       | 36       |
| y location  | 21        | 5        | 36       | 5        | 36       |
| Constraints | 800 STB/D | 3000 psi | 3000 psi | 3000 psi | 3000 psi |

Here, the parameter x can be any parameter in the state vector. In this study, RMSE is calculated for the log-permeability of the subsurface medium. The assimilation was performed with the EnKF algorithm described in Section 2.3.4 for 50 realizations, employing the universal/adaptive thresholding approaches with the four thresholding functions hard, soft, SCAD and lasso, to investigate the thresholding properties more carefully. It should be pointed out that localization/regularization is carried out for all 13 assimilation cycles. The RMSE of the estimated permeability along the assimilation cycles is calculated for all methods/functions and depicted in Fig. 6. Examination of Fig. 6 shows that the hard thresholding function over-thresholds the correlations in both the covariance and the Kalman gain for this case, while the soft function provides a relatively more reliable estimate in the adaptive schemes compared to the universal one. An interesting point is that the trend of the RMSE for the different functions in Fig. 6 is preserved in almost all cases, except for the soft one in the universal scheme. Finally, the adaptive methods provide better results than the universal ones. Both UCT and ACT show a strong tendency to assimilate conservatively, for the hard and lasso functions in ACT and all functions in UCT. This clearly indicates that thresholding of the covariance preserves the structure of the previously estimated permeability; that is, covariance thresholding updates the permeability field less frequently when thresholding is applied in all assimilation cycles. In AGT, however, updating is carried out in almost all cycles during gain thresholding. Further study revealed the possibility of obtaining a lower RMSE with an ensemble of size 100 where EnKF is employed without any localization (solid black line in Fig. 7). Hence, we concluded that thresholding should not be applied in all assimilation cycles, especially the early ones. To explain this point more clearly, the reader is referred to Fig. 7: applying any localization/regularization in the early assimilation cycles removes more ''correct'' correlations alongside the spurious ones, as evidenced by bootstrap EnKF (BR) and distance-based EnKF (DL) in Fig. 7 for different ensemble sets. In addition, this early removal of correct correlations leads to delayed convergence along with a higher RMSE at the end of assimilation for DL and BR. Nevertheless, since the purpose of each localization method is to mitigate under-sampling issues for small ensemble sizes, all methods carry out this task well enough, as observable for the case N = 25 of Fig. 7. However, the early elimination of correct correlations can be seen by comparing EnKF with no localization (NL) against BR and DL. With this important remark, we then changed the proposed algorithm slightly: thresholding is performed only when it is required, i.e., when the measurements carry enough information that assimilation does not deteriorate the previously updated field. To identify the necessity of localization/regularization during assimilation, we simply carried out the thresholding when the current estimated RMSE is bigger than the latest one. This avoids the removal of correct correlations in the early cycles to some extent and lets the latest assimilated field be updated less often. With this simple modification of the algorithm, case 2 was assimilated by EnKF with thresholding of the covariance and gain, employing three ensemble sets of sizes 25, 50 and 100. For comparison with the thresholding approach, DL, BR and NL were also re-examined with these ensemble sets. The DL approach was employed according to Eqs. (13)-(15) of [15]. The critical correlation length utilized in Eq. (10) is 10 grid blocks. To obtain the correlation matrix, the Euclidean distances between the variables within grid blocks and the well locations were computed. The BR implementation was quite straightforward, as explained in Section 2.2. The number of bootstrap resamplings was 400 and ra was taken as 0.4 in Eq. (11). For a thorough implementation of BR see [47]. After performing the assimilation, the RMSE of the permeability along the cycles was calculated for all methods and is illustrated in Fig. 7 in a unique scale and format.
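A minimal sketch of this modified, selective algorithm follows; `forward`, `kalman_update` and `threshold_update` are hypothetical callables standing in for the flow-simulator step, the plain EnKF update and the thresholded update, and `x_true` is the reference log-permeability (available in these twin experiments) used only to evaluate Eq. (31):

```python
import numpy as np

def assimilate(ensemble, observations, forward, kalman_update,
               threshold_update, x_true):
    """Sketch of the modified algorithm: thresholding is applied only when
    the RMSE of the current estimate has increased relative to the previous
    cycle, i.e., when a plain update risks degrading the estimated field.
    """
    rmse_prev = np.inf
    history = []
    for d_obs in observations:                  # one entry per cycle
        ensemble = forward(ensemble)            # Step 2: propagate realizations
        mean = ensemble.mean(axis=1)
        rmse = np.sqrt(np.mean((mean - x_true) ** 2))   # Eq. (31)
        if rmse > rmse_prev:
            ensemble = threshold_update(ensemble, d_obs)  # localize/regularize
        else:
            ensemble = kalman_update(ensemble, d_obs)     # plain EnKF update
        rmse_prev = rmse
        history.append(rmse)
    return ensemble, history
```

In a field application, where x_true is unknown, the RMSE would have to be replaced by a proxy criterion; the sketch only illustrates the selective switching used in the synthetic cases.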

[Figure: three panels, bottomhole pressure (psi), oil production rate (STB/D) and water production rate (STB/D), versus production year.]
Fig. 5. Solid lines: true production data; circles: noisy observations. The solid vertical line yields two parts: the left part is used for assimilation and the right part for prediction analysis. Different colors show different production well data. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

[Figure: three panels, UCT, ACT and AGT, showing RMSE versus assimilation cycle for HD, ST, LO and SD.]
Fig. 6. RMSE of the methods/functions for model 2, applying thresholding in all cycles; U: universal, A: adaptive, T: thresholding, C: covariance, G: gain, HD: hard, ST: soft, LO: lasso, SD: SCAD.

[Figure: 3 × 3 grid of panels (rows N = 25, 50, 100; columns UCT, ACT, AGT) showing RMSE versus assimilation cycle.]
Fig. 7. RMSE of the thresholding methods/functions for ensemble sizes 25, 50 and 100 of model 2; for comparison, EnKF with no localization (NL), EnKF with bootstrap gain regularization (BR) and EnKF with distance-dependent localization (DL) are also presented. HD: hard, ST: soft, LO: lasso, SD: SCAD; thresholding is not applied in all assimilation cycles.

For the case where N = 25, EnKF without any localization (labeled NL) oscillates after the first assimilation and then diverges after the 7th step. The under/overestimation of permeability by NL can be viewed in the final estimated permeability map (after the 13th assimilation) in the lower right corner of Fig. 8. For this case, all

localization/regularization methods converge; meanwhile, DL has a higher RMSE than the others and the adaptive methods provide much smaller RMSEs. Another interesting point is that, in the modified algorithm, all thresholding functions have approximately similar trends, especially for ensembles of sizes 50 and 100. For this case, the


true permeability map can be characterized mostly by NL with N = 100, as can be seen in Fig. 7. Comparing the results of the other methods with NL for this case shows that the ''elimination of correct correlations'' is more notable for DL and BR. This elimination causes these methods to end up with higher RMSEs than the thresholding approaches in Fig. 7. After the final assimilation, the estimated and true permeability maps for ensembles of sizes 25 and 100 are plotted in Figs. 8 and 9, respectively. Because all thresholding functions have similar behavior, the results of the soft function are presented in these figures. A notable point from these figures is the smoothness of the estimated map for DL, which is more visible in Fig. 9. It should be noted that, to examine the robustness of the thresholding methods, five additional ensemble sets of sizes 25, 50 and 100 were generated. These sets were different from those utilized in the present subsection but followed the same generation procedure as case 2, and were then used for characterization of the ''true'' reservoir. Irrespective of quantitative differences in the results, they followed the same trend as seen in Figs. 7–9. These results and the associated figures are not shown here due to space limitations. In this case we studied the thresholding approaches/functions along with NL, DL and BR. Although the thresholding results were better in general, we could not obtain a general answer to the question of which thresholding scheme would outperform the others. For this reason, we constructed a more complex twin experiment (case 3), which is presented in Section 3.3.

3.3. Case 3: an anisotropic 2D reservoir

In this case, we repeated the experiment of the previous section but with a different geological structure, with higher heterogeneity and a longer assimilation period than those of case 2. The true permeability (Fig.
10) and ensemble sets are generated with an anisotropic spherical covariance [21] with major correlation range of 20 grid blocks oriented 45° from x-axis and minor correlation range of 8 grid blocks again by sgsim of SGeMS [36].

Water is injected at a constant rate of 500 STB/D to have a longer production period. Production data types are similar to case 2 and are obtained every 2 months for 3 years of reservoir production life, providing 19 assimilation cycles; the first cycle uses data from the third day of production. In addition, production data are perturbed with higher noise than that of case 2: here, data are corrupted by white Gaussian noise with standard deviations of 8 psi and 4 STB/D for bottomhole pressure and oil–water production rates, respectively (Fig. 11). Other required data, such as the mean and variance of the log-permeability of the field, are the same as in case 2. EnKF was initialized with three different ensemble sets of 25, 50 and 100 realizations, and assimilation was carried out with the modified algorithm presented in the previous section, employing all thresholding methods/functions along with the BR, DL and NL methods. Here, the BR setting is similar to case 2, but for DL the critical length is increased to 18 grid blocks. The calculated RMSE of the estimated permeability for all assimilation cycles is displayed in Fig. 12. Due to the much higher nonlinearity and complexity embedded in this model, EnKF without localization diverges for almost all ensemble sets, and this is also true for BR and DL. Here, the interesting point is that the thresholding methods did not allow updating of the estimated field in some cycles, to prevent degrading of the previous assimilation results. It is also noteworthy that such a prohibition acts somewhat like a bias, but improves the accuracy of the assimilated results. For the covariance thresholding cases (ACT and UCT), all thresholding functions perform almost similarly, while in AGT the hard and lasso functions show some tendency to diverge at later assimilation times. Among the applied methods, ACT provides the smallest RMSE at the end of the assimilation cycles for all ensemble sets.
Another notable point is that for the 25-member ensemble set, after the initial permeability updating, no further updating was allowed by any thresholding method except ACT. We can relate this phenomenon to the fact that the minimization of Eq. (17)/(20) or Eq. (21)/(22) yields threshold value(s) for which the thresholded covariance or Kalman gain matrix has almost all correlations (either true or false) removed. This is a conservative updating of the permeability field

[Figure: estimated log-permeability maps for UCT, ACT, AGT, BR, DL and NL, with the true map.]
Fig. 8. Estimated mean of log-permeability for the soft function and the other methods, along with the true map of model 2, for N = 25.


[Figure: estimated log-permeability maps for UCT, ACT, AGT, BR, DL and NL, with the true map.]
Fig. 9. Estimated mean of log-permeability for the different methods with the soft function, and the true map of model 2, for N = 100.

[Figure: log-permeability map with injector Inj at the center and producers P1–P4 near the corners; color scale from 1 to 7.]
Fig. 10. True log-permeability of model 3 with locations of the injection/production wells.


from one assimilation step to another, trying to preserve the latest estimation as untouched as possible. Figs. 13 and 14 show the final estimated permeability maps for the different methods employing the 25- and 100-realization sets, respectively. The maps are on the same scale as the true permeability map (the leftmost one in Figs. 13 and 14). The results of the thresholding methods are presented for the soft function. It is evident that there is severe over/underestimation of permeability for BR, DL and NL. On the other hand, the thresholding results are more similar to the true map, with much less over/underestimation even for the smallest case (N = 25). Finally, as for case 2, five additional ensemble sets of sizes 25, 50 and 100 were generated to investigate the robustness of the thresholding methods; the obtained results indicated trends similar to those presented here. After the data assimilation process was finished (case N = 25), the final estimated permeability maps were used to initialize an oil–water flow simulator to evaluate the forecasting ability of the different methods. Simulation was carried out from time zero to the end of the production period (10 years). The obtained production profiles of producer 3 (P3) are plotted in Figs. 15 and 16 for oil and water rates, respectively. We selected P3 to show its production profiles because no water rate data from this well were utilized in the assimilation period, which makes assimilation and prediction for well P3 quite challenging. The matched and predicted bottomhole pressure of the injection well is also depicted in Fig. 17. It is well known that inverse problems with huge degrees of freedom cannot be solved uniquely and exactly. However, according to the time series


Fig. 11. As in Fig. 5 but for model 3.



[Figure: 3 × 3 grid of panels (rows N = 25, 50, 100; columns UCT, ACT, AGT) showing RMSE versus assimilation cycle.]
Fig. 12. RMSE of the thresholding methods/functions for the different ensemble sizes of model 3; see Fig. 7 for abbreviations.

[Figure: estimated log-permeability maps for UCT, ACT, AGT, BR, DL and NL, with the true map.]
Fig. 13. Estimated mean of log-permeability for the different methods, and the true map of model 3, for N = 25.

in Fig. 16, all methods except BR and NL provide acceptable agreement relative to the ''initial forecasting profile'' (leftmost subfigure). This point is also valid for Fig. 17, albeit with some reservations. It should be pointed out that in these figures, the ''mean of ensemble set'' shows the production profile corresponding to the mean of all updated permeability realizations, not the mean of the production profiles. The results of the soft thresholding function are shown for the thresholding cases. Analyzing Figs. 15–17 reveals that the BR and NL ensemble sets collapsed to almost a single realization, as there is practically no variability in their forecast production data, while the thresholding approaches along with DL provide much better variability in the production data.

4. Discussion

The first necessary point to be mentioned is the sparsity hypothesis assumed for the covariance and Kalman gain matrices. This assumption is required by the thresholding theories [6,10,37]. It is believed that the covariance and Kalman gain matrices for subsurface media are not truly sparse. To justify the validity of the sparsity assumption to some extent, consider the fact that, as production proceeds, different areas of the reservoir feel the production stimulus at different times. This point is quite evident from the streamline localization papers [5,13,45]. In these contributions, a localization matrix is computed to localize the covariance matrix by assigning different weights to the entries of the covariance. This


[Figure: estimated log-permeability maps for UCT, ACT, AGT, BR, DL and NL, with the true map.]
Fig. 14. Estimated mean of log-permeability for the different methods, and the true map of model 3, for N = 100.

[Figure: oil rate versus production year for panels Initial, UCT, ACT, AGT, BR, DL and NL.]
Fig. 15. Matched/predicted oil rate of well P3 before/after the vertical line for model 3 with 25 realizations; oil rate for the reference field (red circles), rates of assimilated realizations (light blue solid lines) and mean of realizations (magenta stars); Initial: production profiles of the ensemble set without any assimilation. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

is based on whether the corresponding streamlines or phase streamlines have passed through a certain grid block of the medium. If streamline localization is inspected from this point of view (for example, Figs. 12, 16 and 17 in [13]), it clearly shows that most parts of the reservoir are unaffected by a specific observation of a specific well. Thus, there are grid blocks in the medium for which the corresponding weights in the localization matrix are zero, providing a sparse covariance structure after streamline localization. On the other hand, it is possible to obtain good results from thresholding matrices ''which are not sparse per se but have many coefficients close to zero'' [14]. El Karoui [14] provides two propositions for situations in which ''estimating the (not necessarily sparse) population covariance by thresholding the sample

covariance matrix will lead to good results''. In an ad hoc manner, we can also attribute this to the Kalman gain. Another important issue to be re-emphasized is that ''localization/regularization with thresholding'' has the risk of simultaneous elimination of true and spurious correlations. Any treatment of the covariance and Kalman gain matrices not only changes the structure of these matrices but also removes true correlations when thresholded localization/regularization is applied. This point is discussed in [12], which states that ''one problem with any localization method is that it tends to eliminate some weak, but real, correlations along with the spurious correlations''. It seems that performing thresholded localization/regularization in all assimilation cycles, especially for a more complex system, forces the ensemble

[Figure: water rate versus production year for panels Initial, UCT, ACT, AGT, BR, DL and NL.]
Fig. 16. Matched/predicted water rate of well P3 before/after the vertical line for model 3 with 25 realizations; water rate for the reference field (red circles), rates of assimilated realizations (light blue solid lines) and mean of realizations (magenta stars); Initial: production profiles of the ensemble set without any assimilation. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

[Figure: injection pressure (psi) versus production year for panels Initial, UCT, ACT, AGT, BR, DL and NL.]
Fig. 17. Matched/predicted bottomhole pressure of the injection well before/after the vertical line for model 3 with 25 realizations; pressure for the reference field (red circles), pressure of assimilated realizations (light blue solid lines) and mean of realizations (magenta stars); Initial: production profiles of the ensemble set without any assimilation. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

set to lose track of the true model through elimination of true correlations from the ensemble set. Based on these notes, we modified the algorithm presented in Section 2.3.4 utilizing RMSE. We are, however, aware of the limitations involved in the application of RMSE. The main point is not the modified algorithm itself, but the importance of selective localization/regularization; selective localization procedures and criteria seem interesting and require more research. It is evident that the application of DL and also BR is quite straightforward. We believe that correct estimation of the critical length used in DL improves it [15], but the application of DL to non-local observations is questionable. On the other hand, we think that BR may lead to a biased estimation of the Kalman gain in some applications, as presented in [12]. It is discussed in [30] that the ''quality of the EnKF estimates may not be improved by simply using more frequent assimilation (whenever measurements are available)'', and this is ''regardless of the specific sampling strategy'', as evidenced in Figs. 7 and 12. This is a strong indication to avoid naive assimilation of all incoming observations into the ensemble set; in other words, not all measurements are equally welcome during data integration. We think that our modified algorithm for adaptive thresholding may help to avoid too-frequent assimilation of data into the realizations. Adaptive thresholding, by providing a high threshold value in high-RMSE situations, removes most of the non-zero correlations. This allows only a conservative updating of the latest assimilated field, helping to avoid ''over-assimilation'' of the ensemble set to some extent; this is visible in the constant-RMSE sections of Fig. 12. As said before, both the SCAD and lasso thresholding functions act as a compromise between thresholding and shrinkage. The results obtained from these two functions, especially the SCAD one, are better than all the others. These functions have a tuning parameter


which can be tuned for specific problems to provide more accurate results; no such tuning was performed in the present contribution. The final issue is the work of [44], which employed the hard thresholding function adaptively to remove spurious correlations from the Kalman gain. Although our approach is similar to the thresholding idea of [44], we used a different algorithm to obtain the threshold value and implemented the universal/adaptive thresholding idea. The number of objective function evaluations (Eq. (17)/(20) or Eq. (21)/(22)) required for the minimization is much smaller than that of the similar task in that work. More precisely, the number of objective function evaluations in their work is twice the size of the Kalman gain (2 × n × d) in each assimilation cycle, while in our work it is as small as 30–40 evaluations.

5. Summary and conclusion

In this work we employed adaptive thresholding to mitigate under-sampling issues in the ensemble Kalman filter and demonstrated its applicability for better estimation of geological features. First, the application of adaptive thresholding to the forecast covariance matrix was presented; it was then extended to the Kalman gain for direct gain regularization. The proposed adaptive thresholding method can be employed coherently for both local and non-local observations. The adaptive thresholding approaches, along with four different thresholding functions, were applied to three cases with different levels of complexity/heterogeneity. The results of EnKF with adaptive thresholding were then compared against those of the original EnKF without any treatment; the same comparison was made for the bootstrap EnKF and for EnKF with distance-based localization. Our results showed that, with adaptive thresholding, covariance localization yields better results than Kalman gain regularization, especially for small ensemble sizes.
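For concreteness, the four thresholding functions compared in this study can be written compactly as elementwise operators. The sketch below is a minimal NumPy illustration rather than the implementation used here: the adaptive-lasso form with exponent eta, the SCAD constant a = 3.7, and the helper threshold_covariance are assumptions for illustration only.

```python
import numpy as np

def hard(z, lam):
    """Hard thresholding: zero out entries with magnitude <= lam, keep the rest."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) > lam, z, 0.0)

def soft(z, lam):
    """Soft thresholding: zero out small entries and shrink survivors by lam."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def adaptive_lasso(z, lam, eta=2.0):
    """Lasso-type thresholding; eta is a tuning parameter (assumed value)."""
    z = np.asarray(z, dtype=float)
    mag = np.abs(z)
    with np.errstate(divide="ignore"):
        shrink = np.where(mag > 0.0, 1.0 - (lam / mag) ** eta, 0.0)
    return z * np.maximum(shrink, 0.0)

def scad(z, lam, a=3.7):
    """SCAD thresholding: soft near zero, linear blend in the middle band,
    identity for large entries (a = 3.7 is the customary default)."""
    z = np.asarray(z, dtype=float)
    mag = np.abs(z)
    small = np.sign(z) * np.maximum(mag - lam, 0.0)            # |z| <= 2*lam
    mid = ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)   # 2*lam < |z| <= a*lam
    return np.where(mag <= 2.0 * lam, small, np.where(mag <= a * lam, mid, z))

def threshold_covariance(C, lam, fn=scad):
    """Apply a thresholding function elementwise to a forecast covariance
    (or correlation) matrix, leaving the diagonal untouched."""
    T = fn(C, lam)
    np.fill_diagonal(T, np.diag(np.asarray(C, dtype=float)))
    return T
```

In the adaptive scheme, lam would itself be chosen per assimilation cycle, e.g., by a short one-dimensional search over the objective of Eq. (17)/(20), which is consistent with only a few tens of objective evaluations being required.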
It was emphasized that thresholding of the covariance or the Kalman gain should be performed wisely during the assimilation cycles. The results of adaptive thresholding for subsurface characterization were promising and better than those of the other localization methods.

References

[1] Aanonsen SI, Nævdal G, Oliver DS, Reynolds AC, Vallès B. The ensemble Kalman filter in reservoir engineering – a review. SPE J 2009;14:393–412. http://dx.doi.org/10.2118/117274-PA.
[2] Anderson JL. Exploring the need for localization in ensemble data assimilation using a hierarchical ensemble filter. Phys D: Nonlinear Phenom 2007;230:99–111. http://dx.doi.org/10.1016/j.physd.2006.02.011.
[3] Antoniadis A, Fan J. Regularization of wavelet approximations. J Am Stat Assoc 2001;96:939–55.
[4] Doucet A, de Freitas N, Gordon N. Sequential Monte Carlo methods in practice. New York, USA: Springer; 2001.
[5] Arroyo-Negrete E, Devegowda D, Datta-Gupta A, Choe J. Streamline-assisted ensemble Kalman filter for rapid and continuous reservoir model updating. SPE Reservoir Eval Eng 2008;11:1046–60. http://dx.doi.org/10.2118/104255-PA.
[6] Bickel PJ, Levina E. Covariance regularization by thresholding. Ann Stat 2008;36:2577–604.
[7] Bishop CH, Hodyss D. Adaptive ensemble covariance localization in ensemble 4D-VAR state estimation. Mon Weather Rev 2011;139:1241–55. http://dx.doi.org/10.1175/2010MWR3403.1.
[8] Bishop CH, Hodyss D. Ensemble covariances adaptively localized with ECO-RAP. Part 1: tests on simple error models. Tellus Ser A: Dyn Meteorol Oceanogr 2009;61:84–96. http://dx.doi.org/10.1111/j.1600-0870.2008.00371.x.
[9] Burgers G, Van Leeuwen PJ, Evensen G. Analysis scheme in the ensemble Kalman filter. Mon Weather Rev 1998;126:1719–24. http://dx.doi.org/10.1175/1520-0493(1998)126<1719:ASITEK>2.0.CO;2.
[10] Cai T, Liu W. Adaptive thresholding for sparse covariance matrix estimation. J Am Stat Assoc 2011;106:672–84. http://dx.doi.org/10.1198/jasa.2011.tm10560.
[11] Chen Y, Oliver DS. Cross-covariances and localization for EnKF in multiphase flow data assimilation. Comput Geosci 2010;14:579–601.
[12] Chen Y, Oliver DS. Multiscale parameterization with adaptive regularization for improved assimilation of nonlocal observation. Water Resour Res 2012;48. http://dx.doi.org/10.1029/2011WR011144.


[13] Devegowda D, Arroyo-Negrete E, Datta-Gupta A. Flow relevant covariance localization during dynamic data assimilation using EnKF. Adv Water Resour 2010;33:129–45. http://dx.doi.org/10.1016/j.advwatres.2009.10.001.
[14] El Karoui N. Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann Stat 2008;36:2717–56. http://dx.doi.org/10.1214/07-AOS559.
[15] Emerick A, Reynolds A. Combining sensitivities and prior information for covariance localization in the ensemble Kalman filter for petroleum reservoir applications. Comput Geosci 2011;15:251–69. http://dx.doi.org/10.1007/s10596-010-9198-y.
[16] Evensen G. Data assimilation: the ensemble Kalman filter. Berlin Heidelberg: Springer-Verlag; 2009.
[17] Evensen G. The ensemble Kalman filter for combined state and parameter estimation. IEEE Control Syst Mag 2009;29:83–104. http://dx.doi.org/10.1109/MCS.2009.932223.
[18] Evensen G. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J Geophys Res 1994;99(C5):10143–62. http://dx.doi.org/10.1029/94JC00572.
[19] Furrer R, Bengtsson T. Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. J Multivariate Anal 2007;98:227–55. http://dx.doi.org/10.1016/j.jmva.2006.08.003.
[20] Gaspari G, Cohn SE. Construction of correlation functions in two and three dimensions. Q J R Meteorol Soc 1999;125:723–57. http://dx.doi.org/10.1002/qj.49712555417.
[21] Goovaerts P. Geostatistics for natural resources evaluation. USA: Oxford University Press; 1997.
[22] Hamill TM, Whitaker JS, Snyder C. Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon Weather Rev 2001;129:2776–90. http://dx.doi.org/10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.
[23] Houtekamer PL, Mitchell HL. Data assimilation using an ensemble Kalman filter technique. Mon Weather Rev 1998;126:796–811. http://dx.doi.org/10.1175/1520-0493(1998)126<0796:DAUAEK>2.0.CO;2.
[24] Houtekamer PL, Mitchell HL. A sequential ensemble Kalman filter for atmospheric data assimilation. Mon Weather Rev 2001;129:123–37. http://dx.doi.org/10.1175/1520-0493(2001)129<0123:ASEKFF>2.0.CO;2.
[25] Janjić T, Nerger L, Albertella A, Schröter J, Skachko S. On domain localization in ensemble-based Kalman filter algorithms. Mon Weather Rev 2011;139:2046–60. http://dx.doi.org/10.1175/2011MWR3552.1.
[26] Carrera J, Alcolea A, Medina A, Hidalgo J, Slooten LJ. Inverse problem in hydrogeology. Hydrogeol J 2005;13:206–22. http://dx.doi.org/10.1007/s10040-004-0404-7.
[27] Kalman RE. A new approach to linear filtering and prediction problems. Trans ASME Ser D J Basic Eng 1960;82:35–45.
[28] Krymskaya MV, Hanea RG, Verlaan M. An iterative ensemble Kalman filter for reservoir engineering applications. Comput Geosci 2009;13:235–44. http://dx.doi.org/10.1007/s10596-008-9087-9.
[29] Ledoit O, Wolf M. A well-conditioned estimator for large-dimensional covariance matrices. J Multivariate Anal 2004;88:365–411. http://dx.doi.org/10.1016/S0047-259X(03)00096-4.
[30] Li J, Xiu D. On numerical properties of the ensemble Kalman filter for data assimilation. Comput Methods Appl Mech Eng 2008;197:3574–83. http://dx.doi.org/10.1016/j.cma.2008.03.022.
[31] Lorenc AC. The potential of the ensemble Kalman filter for NWP – a comparison with 4D-Var. Q J R Meteorol Soc 2003;129:3183–203. http://dx.doi.org/10.1256/qj.02.132.
[32] Oke PR, Sakov P, Corney SP. Impacts of localisation in the EnKF and EnOI: experiments with a small model. Ocean Dyn 2007;57:32–45. http://dx.doi.org/10.1007/s10236-006-0088-8.
[33] Oliver DS, Chen Y. Recent progress on reservoir history matching: a review. Comput Geosci 2011;15:185–221. http://dx.doi.org/10.1007/s10596-010-9194-2.
[34] Pham DT. Stochastic methods for sequential data assimilation in strongly nonlinear systems. Mon Weather Rev 2001;129:1194–207. http://dx.doi.org/10.1175/1520-0493(2001)129<1194:SMFSDA>2.0.CO;2.
[35] Pourahmadi M. High-dimensional covariance estimation: with high-dimensional data. Wiley; 2013.
[36] Remy N, Boucher A, Wu J. Applied geostatistics with SGeMS: a user's guide. Cambridge University Press; 2009.
[37] Rothman AJ, Levina E, Zhu J. Generalized thresholding of large covariance matrices. J Am Stat Assoc 2009;104:177–86. http://dx.doi.org/10.1198/jasa.2009.0101.
[38] Sakov P, Bertino L. Relation between two common localisation methods for the EnKF. Comput Geosci 2011;15:225–37. http://dx.doi.org/10.1007/s10596-010-9202-6.
[39] Sakov P, Oke PR. A deterministic formulation of the ensemble Kalman filter: an alternative to ensemble square root filters. Tellus Ser A: Dyn Meteorol Oceanogr 2008;60:361–71. http://dx.doi.org/10.1111/j.1600-0870.2007.00299.x.
[40] Schäfer J, Strimmer K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat Appl Genet Mol Biol 2005;4:1–30.
[41] Stordal AS, Karlsen HA, Nævdal G, Oliver DS, Skaug HJ. Filtering with state space localized Kalman gain. Phys D: Nonlinear Phenom 2012;241:1123–35. http://dx.doi.org/10.1016/j.physd.2012.03.006.
[42] Sun AY, Morris A, Mohanty S. Comparison of deterministic ensemble Kalman filters for assimilating hydrogeological data. Adv Water Resour 2009;32:280–92. http://dx.doi.org/10.1016/j.advwatres.2008.11.006.



[43] Lefebvre T, Bruyninckx H, De Schutter J. Kalman filters for non-linear systems: a comparison of performance. Int J Control 2004;77:639–53. http://dx.doi.org/10.1080/00207170410001704998.
[44] Tong Y, Zhang Y, Oliver DS. Adaptive regularization in the ensemble Kalman filter for reservoir history matching. In: SPE reservoir simulation symposium, Texas, USA; 2011. p. 188–202. http://dx.doi.org/10.2118/141059-MS.
[45] Watanabe S, Datta-Gupta A. Use of phase streamlines for covariance localization in ensemble Kalman filter for three-phase history matching. SPE Reservoir Eval Eng 2012;15:273–89. http://dx.doi.org/10.2118/144579-PA.

[46] Zhang Y, Oliver DS. Evaluation and error analysis: Kalman gain regularization versus covariance regularization. Comput Geosci 2011;15:489–508. http://dx.doi.org/10.1007/s10596-010-9218-y.
[47] Zhang Y, Oliver DS. Improving the ensemble estimate of the Kalman gain by bootstrap sampling. Math Geosci 2010;42:327–45. http://dx.doi.org/10.1007/s11004-010-9267-8.
[48] Zhou H, Gómez-Hernández JJ, Hendricks Franssen H-J, Li L. An approach to handling non-Gaussianity of parameters and state variables in ensemble Kalman filtering. Adv Water Resour 2011;34:844–64. http://dx.doi.org/10.1016/j.advwatres.2011.04.014.