Automatic and adaptive localization for ensemble-based history matching




Journal of Petroleum Science and Engineering xxx (xxxx) xxx



Xiaodong Luo*, Tuhin Bhakta

Norwegian Research Centre (NORCE), Norway

Keywords: Ensemble data assimilation; Iterative ensemble smoother; Correlation-based adaptive localization; Seismic history matching; Big data assimilation

Abstract

Ensemble-based history matching methods are among the state-of-the-art approaches to reservoir characterization. In practice, however, they often suffer from ensemble collapse, a phenomenon that deteriorates history matching performance. To prevent ensemble collapse, it is customary to equip an ensemble history matching algorithm with a certain localization scheme. In a previous study, the authors proposed an adaptive localization scheme that exploits the correlations between model variables and simulated observations. Correlation-based adaptive localization not only overcomes some longstanding issues (e.g., the challenges in handling non-local or time-lapse observations) arising in conventional distance-based localization, but is also more convenient and flexible to use in real field case studies. In applications, however, correlation-based localization is also found to be subject to two problems. One is that it requires running a relatively large ensemble in order to achieve decent performance in an automatic manner, which becomes computationally expensive in large-scale problems. As a result, certain empirical tuning factors were introduced to reduce the computational costs. The other problem is that the way the tapering coefficients are computed in the previous study may induce discontinuities, and neglect the information of certain still-influential observations for model updates. The main objective of this work is to improve the efficiency and accuracy of correlation-based adaptive localization, making it run in an automatic manner without incurring substantial extra computational costs. To this end, we introduce two enhancements that aim to avoid the aforementioned two problems, namely, empirical tuning and discontinuities. We apply the resulting automatic and adaptive correlation-based localization with these two enhancements to a 2D case and a 3D case investigated in the previous study, and show that it leads to better history matching performance (in terms of efficiency and/or estimation accuracy) than that achieved in the previous work.

1. Introduction

Ensemble-based methods (Aanonsen et al., 2009; Evensen, 2009) are now among the state-of-the-art approaches to history matching problems. In practice, due to the limitations of the available computational resources, one can typically only afford to run an ensemble of reservoir models whose size is significantly smaller than the dimensions of both the reservoir models and the observational data. As the driving idea behind ensemble-based methods is to exploit covariance and cross-covariance in the Kalman-gain-like matrix for model updates, using a relatively small ensemble of reservoir models in history matching introduces substantial sampling errors into the estimated statistics. As a result, it is not uncommon to observe that a plain ensemble-based method renders unsatisfactory history matching performance.
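To make the effect of sampling errors concrete, the following minimal numpy sketch (not part of the original paper; all names are illustrative) computes the sample correlation between two truly uncorrelated quantities and shows that, with an ensemble of size Ne, spurious correlations of magnitude about 1/sqrt(Ne) are unavoidable:

```python
import numpy as np

rng = np.random.default_rng(0)
Ne = 100  # typical ensemble size in history matching

# Two truly uncorrelated quantities: a "model variable" and an
# "observation" that does not depend on it.
m = rng.standard_normal(Ne)
d = rng.standard_normal(Ne)

rho = np.corrcoef(m, d)[0, 1]  # sample correlation, spurious by construction
print(f"sample correlation with Ne={Ne}: {rho:+.3f}")
print(f"typical magnitude 1/sqrt(Ne): {1/np.sqrt(Ne):.3f}")

# Over many trials, the spread of the spurious correlations is ~ 1/sqrt(Ne).
trials = np.array([np.corrcoef(rng.standard_normal(Ne),
                               rng.standard_normal(Ne))[0, 1]
                   for _ in range(2000)])
print(f"std of spurious correlations over 2000 trials: {trials.std():.3f}")
```

With Ne = 100, spurious sample correlations have a standard deviation of roughly 0.1, which is why a plain ensemble update can spread the influence of every observation over the whole model grid.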

An under-performing ensemble-based method is often associated with a well-noticed phenomenon known as ensemble collapse. When ensemble collapse takes place, the variations between realizations become so tiny that the reservoir models almost collapse into a single one. To prevent ensemble collapse, it is customary to equip an ensemble method with an auxiliary technique, called localization (Chen and Oliver, 2010; Emerick and Reynolds, 2011; Hamill et al., 2001; Luo et al., 2018a). Typically, localization aims to mitigate certain noticed detrimental effects, namely, rank deficiency and/or sampling errors, when using a relatively small sample size in an ensemble-based method (Anderson, 2012; Hamill et al., 2009). Because of its importance in real applications, immense efforts have been dedicated to the research and development of various localization schemes (see, e.g., Anderson, 2012; Anderson, 2016; Arroyo et al., 2008; Chen and Oliver, 2010; Emerick and Reynolds, 2011; Hamill et al., 2001; Houtekamer and Mitchell, 1998; Huang and Zeng, 2016; Lacerda et al., 2019; Luo et al., 2018a; Zhang and Oliver, 2010), whereas providing a thorough review of different localization methods is beyond the scope of the current work.

* Corresponding author. E-mail address: [email protected] (X. Luo).
https://doi.org/10.1016/j.petrol.2019.106559
Received 25 March 2019; Received in revised form 28 June 2019; Accepted 3 October 2019; Available online 9 October 2019.
0920-4105/© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Among the various developed localization schemes, distance-based localization (Chen and Oliver, 2010, 2017; Emerick and Reynolds, 2011; Raniolo et al., 2013) appears so far to be the one most adopted for reservoir characterization problems. While working well in many problems, conventional distance-based localization methods suffer from some noticed issues in certain circumstances (Luo et al., 2018a, 2019), including, for instance:

I1: the efficiency in dealing with big datasets;
I2: the difficulty in handling non-local observations;
I3: the dependence on the availability of the physical locations of both observations and model variables;
I4: the non-adaptivity to time-lapse observations and different types of model variables;
I5: case-dependent implementations and the limited transferability of implemented localization schemes among different case studies.

For convenience of discussion later, each issue is associated with a label (e.g., I1, I2, etc.). A full account of when/why these issues may arise is omitted here for succinctness; readers are referred to (Luo et al., 2018a, 2019) for more elaboration.

In the literature, there have been certain types of adaptive localization schemes, e.g., (Anderson, 2012, 2016), which can be employed to tackle at least some of the aforementioned issues. In order to achieve reasonable performance, these adaptive schemes typically require running a relatively large ensemble of reservoir models, and may thus become more expensive in large-scale problems. In this respect, a computationally more efficient adaptive localization scheme was recently developed in (Evensen, 2009; Luo et al., 2018a), where the authors apply hard-thresholding (keep-or-kill) to the correlation coefficients between model variables and the corresponding simulated observations, as a means to conduct data selection for model updates, similar to the idea behind local analysis (Houtekamer and Mitchell, 1998). The main procedures of correlation-based adaptive localization in (Luo et al., 2018a) partially overlap those used in Chapter 15 of (Evensen, 2009), whereas (Luo et al., 2018a) additionally proposes approaches to the choice of threshold values.

The initial motivation for developing correlation-based localization in (Luo et al., 2018a) was to tackle the lack of physical locations of the seismic attributes in a seismic history matching (SHM) workflow (Luo et al., 2017, 2018b). This issue arises because, in order to simultaneously reduce data size and quantify data uncertainties, the authors choose to apply discrete wavelet transforms to the conventional seismic data, and use the resulting leading wavelet coefficients (without physical locations) as the observations in SHM. In subsequent investigations, it was realized that this (hard-thresholding) correlation-based localization scheme can not only address issue I3 (lack of physical locations), but also help to mitigate or overcome the other aforementioned issues (Luo et al., 2018a, 2019). A comparison between conventional distance-based localization and correlation-based localization was also conducted in a real field case study (Luo et al., 2019), where it is shown that correlation-based localization achieves close or better history matching performance in terms of data mismatch, while being practically more convenient and flexible to use, and more straightforward to transfer among different case studies (issue I5).

Although correlation-based localization (Luo et al., 2018a) helps address the aforementioned issues (I1 – I5), it is also subject to two noticed problems. One is that it requires running a relatively large ensemble (e.g., with a few thousand reservoir models) in order to achieve decent performance in an automatic manner, which becomes computationally expensive in large-scale problems. As a result, certain empirical tuning factors are introduced in (Luo et al., 2018a, 2019) to reduce the computational costs. The other problem is that the way used to compute the tapering coefficients in (Luo et al., 2018a, 2019) may induce discontinuities, and neglect the information of certain potentially influential observations for model updates. These two noticed problems will be further elaborated later.

The main objective of this work is to improve the efficiency and accuracy of the correlation-based adaptive localization in (Luo et al., 2018a), making it run in an automatic manner, but without incurring substantial extra computational costs. To this end, we propose the following two enhancements to tackle the aforementioned two problems. In the first enhancement, we introduce an automatic and efficient workflow to properly estimate the noise levels in sample correlation fields, which is an essential step in correlation-based adaptive localization. As will be shown later, this workflow does not need to run a relatively large ensemble as in the Ideal approach of (Luo et al., 2018a), while avoiding the empirical tuning of the Practical approach therein. In the second enhancement, we consider an alternative tapering rule for the calculation of the tapering coefficients in localization. In (Luo et al., 2018a, 2019), a hard-thresholding strategy is adopted to compute the tapering coefficients. In the current work, we propose an alternative tapering rule, which leads to smoother tapering coefficients and tends to result in better history matching performance.

This work is organized as follows. We first introduce the practical implementation of correlation-based localization in (Luo et al., 2018a, 2019), and identify two problems arising in practice. We then introduce two enhancements that are proposed to tackle the noticed problems, and inspect the impacts of these two enhancements through some numerical investigations. Next, we proceed to demonstrate the performance of correlation-based localization with these two enhancements, and compare it with the performance resulting from a more expensive approach in (Luo et al., 2018a). Finally, we conclude the whole work with some discussions.
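As a rough illustration of the hard-thresholding (keep-or-kill) data-selection idea discussed above, the following numpy sketch builds a tapering matrix from ensemble correlations. The ensemble sizes, the synthetic "simulator", and the fixed threshold are all hypothetical stand-ins; the paper instead derives the threshold adaptively from estimated noise levels:

```python
import numpy as np

rng = np.random.default_rng(1)
Ne, n_grid, n_obs = 100, 500, 20  # illustrative sizes, not from the paper

# Ensembles of model variables (e.g., permeability per gridblock) and of
# the corresponding simulated observations; here observation s depends
# only on model variable s, as a toy stand-in for a simulator.
M = rng.standard_normal((n_grid, Ne))
D = 0.8 * M[:n_obs, :] + 0.6 * rng.standard_normal((n_obs, Ne))

# Sample correlation between every (model variable, observation) pair.
Mc = (M - M.mean(1, keepdims=True)) / M.std(1, keepdims=True)
Dc = (D - D.mean(1, keepdims=True)) / D.std(1, keepdims=True)
rho = Mc @ Dc.T / Ne                      # shape (n_grid, n_obs)

# Keep-or-kill tapering: an observation only updates a model variable
# if their correlation magnitude exceeds a threshold.
theta = 3.0 / np.sqrt(Ne)                 # fixed value, for illustration only
C = (np.abs(rho) > theta).astype(float)   # tapering matrix of 0s and 1s
print(f"fraction of retained (variable, observation) pairs: {C.mean():.3f}")
```

Observations whose sample correlation with a model variable stays below the threshold are simply not used for that variable's update, which screens out most spurious correlations at the price of a discontinuous, all-or-nothing selection.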

2. Correlation-based adaptive localization

2.1. Kalman-gain-type localization

Let $\mathbf{m} \in \mathbb{R}^{m}$ stand for the $m$-dimensional reservoir model, $g: \mathbb{R}^{m} \to \mathbb{R}^{p}$ for the forward reservoir simulator, and $\mathbf{d}^{o} \in \mathbb{R}^{p}$ for the $p$-dimensional observations, which are assumed to be contaminated by certain observation errors with zero mean and covariance $\mathbf{C}_{d} \in \mathbb{R}^{p \times p}$. Without loss of generality, the update formula in an ensemble-based history matching method can be written as

$$\mathbf{m}^{a} = \mathbf{m}^{b} + \mathbf{K}\,\Delta\mathbf{d}^{b}; \quad \Delta\mathbf{d}^{b} \equiv \mathbf{d} - g(\mathbf{m}^{b}), \tag{1}$$

where the superscript "b" stands for background (also known as prior), and "a" for analysis (also known as posterior); $\mathbf{K}$ is a Kalman-gain-like matrix, and $\Delta\mathbf{d}^{b}$ is the innovation term with respect to $\mathbf{m}^{b}$, where $\mathbf{d}$ represents the real observation $\mathbf{d}^{o}$ or a perturbed sample drawn from the Gaussian distribution $N(\mathbf{d}^{o}, \mathbf{C}_{d})$. The innovation term $\Delta\mathbf{d}^{b}$ may be normalized by a square root matrix $\mathbf{C}_{d}^{-1/2}$ to avoid the scale effect of different types of observations, and may also be projected onto an ensemble sub-space to reduce the number of effective observations (Luo et al., 2019; Lorentzen et al., 2019). Accordingly, $\mathbf{K}$ and $\Delta\mathbf{d}^{b}$ may take various forms in practical implementations. This, however, does not affect our introduction to the essential idea behind correlation-based adaptive localization.

Let $m^{a}_{k}$ stand for the $k$th model variable ($k = 1, 2, \cdots, m$) of $\mathbf{m}^{a}$ (and $m^{b}_{k}$ be similarly defined); then Eq. (1) is equivalent to

$$m^{a}_{k} = m^{b}_{k} + \sum_{s=1}^{p} K_{ks}\,\Delta d^{b}_{s}, \tag{2}$$

where $K_{ks}$ denotes the element of $\mathbf{K}$ at the $k$th row and the $s$th column, and $\Delta d^{b}_{s}$ the $s$th element ($s = 1, 2, \cdots, p$) of $\Delta\mathbf{d}^{b}$. Eq. (2) implies that the change of a model variable, e.g., $m^{a}_{k} - m^{b}_{k}$, stems from the integrated contributions of the individual innovation terms $\Delta d^{b}_{s}$, whereas the degree of each individual contribution is specified by $K_{ks}$.

In ensemble-based data assimilation, the Kalman-gain-type matrix $\mathbf{K}$ is typically computed using a relatively small ensemble of realizations. As such, the aforementioned issues of rank deficiency and sampling errors arise in the course of computing $\mathbf{K}$, and tend to result in deteriorated assimilation performance. Localization is thus often employed to tackle these issues. In the current work, we pay particular attention to Kalman-gain-type localization, which is typically adopted in practical history matching problems. In Kalman-gain-type localization, one introduces scalar coefficients $c_{ks} \in [0, 1]$ into Eq. (2), and revises the update formula as follows:

$$m^{a}_{k} = m^{b}_{k} + \sum_{s=1}^{p} \left( c_{ks} K_{ks} \right) \Delta d^{b}_{s}. \tag{3}$$

Following the previous discussion, the role of Kalman-gain-type localization can be considered as modifying the degrees $K_{ks}$ of the contributions of the individual innovation terms $\Delta d^{b}_{s}$ to the changes $(m^{a}_{k} - m^{b}_{k})$ of the model variables. Equivalently, Eq. (3) can be re-expressed in a more compact form, as follows:

$$\mathbf{m}^{a} = \mathbf{m}^{b} + (\mathbf{C} \circ \mathbf{K})\,\Delta\mathbf{d}^{b}, \tag{4}$$

where $\mathbf{C}$ is a tapering matrix comprised of the elements $c_{ks}$, and $\circ$ stands for the Schur product operator.

2.2. Using correlations to choose tapering coefficients

There are different ways of choosing the tapering coefficients $c_{ks}$ in various localization schemes. For instance, in distance-based localization (Chen and Oliver, 2010; Emerick and Reynolds, 2011; Hamill et al., 2001; Raniolo et al., 2013), $c_{ks}$ is typically a monotonically decreasing function of the distance between the physical locations of the $k$th model variable $m^{b}_{k}$ and the $s$th observation element $d^{o}_{s}$. Because of the use of a physical-distance-based tapering function, the aforementioned issues (I1 – I5) may arise in certain circumstances, as discussed in (Luo et al., 2018a, 2019).

In correlation-based localization (Luo et al., 2018a, 2019), the determination of a tapering coefficient $c_{ks}$ is interpreted as a causal-relation detection problem. The rationale behind this interpretation is that, if there is a causal relation from the model variable $m^{b}_{k}$ to the $s$th simulated observation element, then one should use $\Delta d^{b}_{s}$ to update $m^{b}_{k}$; otherwise, one should avoid using $\Delta d^{b}_{s}$ for the update of $m^{b}_{k}$. For computational efficiency, (Luo et al., 2018a, 2019) adopt the correlation coefficient between $m^{b}_{k}$ and the $s$th simulated observation element, denoted by $\rho_{ks}$, as a measure to detect the causality. In the context of ensemble data assimilation with a relatively small ensemble size, due to the effect of sampling errors, $\rho_{ks}$ may not be exactly zero even if the pair of model variable and simulated observation element are indeed uncorrelated. To tackle this problem, (Luo et al., 2018a, 2019) treat sampling errors as noise, and compute the tapering coefficient $c_{ks}$ through the following formula:

$$c_{ks} = I\big(\mathrm{abs}(\rho_{ks}) > \theta_{ks}\big), \tag{5}$$

where $I(\bullet)$ is the indicator function, whose value is 1 if the condition $\bullet$ is satisfied and 0 otherwise; $\mathrm{abs}(\bullet)$ returns the absolute value of the input $\bullet$; and $\theta_{ks} \geq 0$ is a threshold value to be determined through a certain criterion.

Now let us explain why correlation-based localization can mitigate or overcome the aforementioned practical issues (I1 – I5) arising in distance-based localization. First of all, the tapering coefficients $c_{ks}$ in Eq. (5) are computed using correlations between model variables and simulated observation elements, and thus do not involve the physical locations of model variables and observations (issue I3). In addition, Eqs. (3) and (5) imply that a model variable is updated using only observations that have large enough correlations (in magnitude) with it, regardless of the physical locations of these observations. Therefore, correlation-based localization can be used to handle non-local observations (issue I2), as demonstrated in (Luo et al., 2018a). Furthermore, as demonstrated in (Luo et al., 2018a, 2019), the tapering matrix $\mathbf{C}$ in Eq. (4), when constructed through Eq. (5), adapts itself to various factors, such as the different types of petro-physical parameters to update, the time-lapse effect of observations, and so on (issue I4). Because of this adaptivity, it becomes easier to transfer an implemented correlation-based localization scheme to various case studies (issue I5). Finally, as discussed in (Luo et al., 2019), to handle big datasets (issue I1), one can project the original big datasets onto an ensemble sub-space of the observation space (obtained by applying a truncated singular value decomposition (TSVD) to a square root matrix whose columns consist of the ensemble of simulated data normalized by a factor), and use the projected data as the observations to update the model variables, similar to what is done in (Luo et al., 2017, 2018b). Accordingly, one only needs to determine the tapering coefficients for the pairs of model variables and projected data. Since the number of projected data, which is equal to the retained rank of the square root matrix after TSVD, is no larger than the ensemble size (often around 100 in practice), localization can be conducted in a very efficient way, as demonstrated in (Lorentzen et al., 2019).

2.3. Practical implementation of correlation-based adaptive localization and noticed problems

A straightforward implementation of Eq. (5) requires finding a threshold $\theta_{ks}$ for each pair of model variable and observation element. To reduce the complexity of implementation, the concept of correlation field is exploited in (Luo et al., 2018a, 2019). A correlation field, denoted by $\rho_{Gs}$, is a set of correlation coefficients between the $s$th simulated observation element and a group of model variables $m^{b}_{k}$. This group of model variables, denoted (and also indexed) by $G$, typically represents the same type of petro-physical parameters (e.g., permeability or porosity) distributed over the active reservoir gridblocks. To lessen the complexity of implementation, we compute a single threshold value $\theta_{Gs}$ for all correlation coefficients in $\rho_{Gs}$. With this simplification strategy, Eq. (5) is reduced to

$$c_{ks} = I\big(\mathrm{abs}(\rho_{ks}) > \theta_{Gs}\big), \quad \text{for all } k \in G. \tag{6}$$

In (Luo et al., 2018a, 2019), the rationale behind the way to determine the threshold values $\theta_{Gs}$ is as follows. First, we note that one can decompose a sample correlation field $\rho_{Gs}$ as

$$\rho_{Gs} = \rho^{\infty}_{Gs} + \epsilon_{Gs}, \tag{7}$$

where $\rho^{\infty}_{Gs}$ is the true correlation field evaluated with an infinite sample size, and $\epsilon_{Gs}$ is the field of sampling errors associated with $\rho_{Gs}$. As shown in (Luo et al., 2018a), it appears that the distribution of the noise (sampling errors) $\epsilon_{Gs}$ can be approximated by a Gaussian distribution with mean zero and (unknown) variance $\sigma^{2}_{Gs}$, whereas the standard deviation (STD) $\sigma_{Gs}$ itself can be estimated by the median absolute deviation (MAD) estimator (Donoho and Johnstone, 1995)

$$\sigma_{Gs} = \frac{\mathrm{median}(\mathrm{abs}(\epsilon_{Gs}))}{0.6745}. \tag{8}$$

Given the estimate $\sigma_{Gs}$, a threshold value $\theta_{Gs}$ is then determined
through the universal rule (Donoho and Johnstone, 1994), such that

$$\theta_{Gs} = \sqrt{2 \ln(\#\rho_{Gs})}\;\sigma_{Gs}, \tag{9}$$

where $\#\rho_{Gs}$ is the number of elements in $\rho_{Gs}$ (hence in $\epsilon_{Gs}$).

The reason to adopt the universal rule is as follows. As a hypothesis, let us first assume $\rho^{\infty}_{ks} = 0$ ($k \in G$). In this case, Eq. (7) implies that $\rho_{ks}$ is a sample drawn from the Gaussian distribution $N(0, (\sigma_{Gs})^{2})$. Since the probability $pr(\max(\mathrm{abs}(\rho_{Gs})) < \theta_{Gs}) \to 1$ as $\#\rho_{Gs} \to \infty$ (Mallat, 2008, p. 556), if $\mathrm{abs}(\rho_{ks}) > \theta_{Gs}$, then we conclude that the hypothesis $\rho^{\infty}_{ks} = 0$ is invalid almost surely. Instead, we assert that $\rho^{\infty}_{ks} \neq 0$, meaning that (with probability one) there exists a causal relation from the $k$th model variable to the $s$th simulated observation element. Therefore, the innovation term with respect to the $s$th observation element should be used to update the $k$th model variable, which corresponds to setting $c_{ks} = 1$. In contrast, if $\mathrm{abs}(\rho_{ks}) \leq \theta_{Gs}$, then one cannot completely reject the hypothesis $\rho^{\infty}_{ks} = 0$. In this case, to be on the safe side, (Luo et al., 2018a, 2019) choose to exclude the $s$th observation element from the update of the $k$th model variable, which corresponds to setting $c_{ks} = 0$.

From the above discussion (Eqs. (6)–(9)), it is clear that a critical step in using correlation-based adaptive localization is to properly estimate the STD $\sigma_{Gs}$, which in turn relates back to an appropriate separation of the noise fields $\epsilon_{Gs}$ from the sample correlation fields $\rho_{Gs}$. In this regard, (Luo et al., 2018a) considers two approaches, called Ideal and Practical, respectively.

In the Ideal approach, sample correlation fields are computed twice, resulting in two sets of (sample) correlation fields, denoted by $\rho^{L}_{Gs}$ and $\rho_{Gs}$, respectively. The correlation fields $\rho^{L}_{Gs}$ are calculated using a relatively large ensemble with a few thousand reservoir models, whereas $\rho_{Gs}$ are evaluated using a relatively small one with around 100 reservoir models. The correlation fields $\rho^{L}_{Gs}$ are considered reasonably good approximations to the true correlation fields $\rho^{\infty}_{Gs}$. Accordingly, the differences, $\rho_{Gs} - \rho^{L}_{Gs}$, are treated as reasonable approximations to the noise fields $\epsilon_{Gs}$. The STDs (or noise levels) $\sigma_{Gs}$ of $\epsilon_{Gs}$ can thus be approximately estimated using the approximate noise fields $\rho_{Gs} - \rho^{L}_{Gs}$, without involving any empirical tuning factors. While the Ideal approach yields reasonably good estimates of the noise levels $\sigma_{Gs}$ in the sample correlation fields, it requires running a relatively large ensemble of reservoir models, which can be computationally too expensive in practical history matching problems.

To reduce the computational costs, (Luo et al., 2018a) also proposes the Practical approach, in which the sample correlation fields $\rho_{Gs}$ computed from a relatively small ensemble are treated as (2D or 3D) images contaminated by noise (due to sampling errors). From this perspective, it is natural to adopt a certain image denoising algorithm to separate the noise from the sample correlation fields, and then estimate the noise levels accordingly. In (Luo et al., 2018a, 2019), a wavelet-based denoising algorithm specifically developed for additive Gaussian white noise (AGWN) is adopted for noise separation and noise-level estimation. It is found that this denoising algorithm tends to underestimate the noise levels, since, as demonstrated in (Luo et al., 2018a), the actual noise appears to be spatially correlated colored noise. A tuning factor is thus introduced in (Luo et al., 2018a) to manually compensate for the systematically under-estimated noise levels. In addition, one may also introduce a lower bound on the correlation threshold values to prevent excessive under-estimation of the noise levels (Luo et al., 2019). As an alternative idea, it may also appear natural to consider adopting an image denoising algorithm designed for colored noise (cf. (Luo and Bhakta, 2017)) instead. While such an idea is indeed feasible, as demonstrated in a separate study (Lorentzen et al., 2019), from our experience it seems that the performance of denoising algorithms might be case-dependent, meaning that in general one needs to find a suitable denoising algorithm through some trial-and-error procedure. In turn, this leads to another type of empirical tuning in practice.

From the above discussions, we diagnose the following two problems:

P1: The way of estimating the noise levels in the sample correlation fields is either too expensive for large-scale problems (as in the Ideal approach), or requires certain empirical tuning (as in the Practical approach);
P2: The way of computing the tapering coefficients (cf. Eq. (6)) can induce discontinuities, and ignore a certain amount of data information in the course of updating the model variables. Discontinuities may occur in certain regions of a tapering-coefficient field, where the corresponding correlation field crosses the associated threshold value $\theta_{Gs}$. Consequently, one would neglect the information from the $s$th observation element for updating the $k$th model variable ($k \in G$) if $\mathrm{abs}(\rho_{ks})$ is slightly lower than the threshold value $\theta_{Gs}$, but actually $\rho^{\infty}_{ks} \neq 0$.

3. Correlation-based adaptive localization with two enhancements

This section focuses on introducing two enhancements (E1 and E2) to tackle the previously identified problems (P1 and P2).

Fig. 1. Workflow that uses the random-shuffle approach to calculate the tapering matrix for correlation-based adaptive localization. Distinct colour schemes are used to indicate different modules within the workflow, which include the calculation of initial sample correlations between the initial ensembles of reservoir models and the corresponding simulated data (orange); the creation of substitute noise fields based on the initial ensemble of simulated data and the shuffled ensemble of reservoir models (purple); the estimation of noise levels by applying the MAD rule, Eq. (8), and the universal rule, Eq. (9), to the substitute noise fields (light green); and the construction of the tapering matrix by applying the estimated noise levels to the initial sample correlations (light blue). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

3.1. E1: A more efficient workflow for the estimation of noise levels

As aforementioned, to achieve reasonably good estimations of the noise levels in Eq. (8), one option is to properly separate the noise $\epsilon_{Gs}$ from the
4

X. Luo and T. Bhakta

Journal of Petroleum Science and Engineering xxx (xxxx) xxx

sample correlation fields ρGs , which appears to be challenging when one cannot afford to run a relatively large ensemble. In the proposed workflow below, instead of directly estimating εGs , we aim to find another set of noise fields ~εGs , which are induced by sampling errors in an environment similar to the one that has caused εGs . Here, our objective is not to make ~εGs good approximations to εGs . Instead, we are more interested in using the resulting noise-level esti­ mations σ~Gs (obtained by plugging ~εGs into Eq. (8)) as reasonably good approximations to σGs , for subsequent computation of the threshold values θGs through Eq. (9). From a statistical point of view, one can create such a noise field ~εGs by drawing a random sample from the joint distribution that the ele­ ments of εGs follow (i.e., the joint distribution with respect to the sam­ pling errors distributed at all relevant reservoir gridblocks). The problem, however, is that we do not know that joint distribution in advance. Here, we consider an alternative strategy by taking advantage of a specialty in our problem, that is, theoretically, we know that the (true) correlation coefficient between two statistically independent random variables should be zero. In correlation-based adaptive locali­ zation, the sample correlation fields ρGs are computed using an ensemble

discussed in (10)), which is used as a substitute of εGs for the compu­ tation of the threshold value θGs , through Eqs. (8) and (9). Next, the threshold value θGs is applied to the sample correlation field ρGs esti­ mated using the ensembles M and D, for the calculations of tapering coefficients afterwards. In a real implementation, however, there is no ~ separately, since they are only different in their need to store M and M orders of the indexes. From the above discussion, it is clear that, in effect, the proposed workflow for the estimation of noise levels only involves the ensemble ~ whose size is typically around 100 in practice. Therefore, there is no M, need to run reservoir simulations using a relatively large ensemble (around a few thousands), as in the Ideal approach of (Luo et al., 2018a). In addition, it avoids direct estimations of the noise fields εGs in the sample correlation fields ρGs (as in the Practical approach of (Luo et al., 2018a)), which is often challenging in practice. Instead, by using the substitute noise fields ~εGs , one can achieve reasonably good estimates of noise levels, as will be demonstrated later. In terms of computational costs, for an initial ensemble of Ne reser­ voir models, one needs to run Ne forward simulations to obtain the corresponding ensemble of simulated data, whether localization is conducted or not. Taking this Ne forward simulations as the basic computational cost, then in the Ideal approach, one needs to use NLe additional forward simulations, where NLe stands for the size of the relatively large ensemble (around a few thousand) therein. In contrast, in the random-shuffle-based approach, there is no additional forward reservoir simulation required. The incurred computational overhead is mainly from the computationally cheap operation of random shuffle, as well as the estimation of noise fields (which is also required in the Ideal approach). 
As an additional remark, one may also adopt a computationally even cheaper method to approximately estimate the noise levels (sampling errors). The main idea is as follows (Anderson, 2003)[Theorem 4.2.4]: For an ensemble of Ne realizations from a bivariate normal distribution with true correlation ρ, as the sample size Ne increases, the sample correlation coefficient approaches a Gaussian distribution

e of Ne reservoir models and the corresponding ensemble M � fmj gNj¼1

e D � fdj : dj ¼ gðmj ÞgNj¼1 of simulated observations, by evaluating over

Ne model-data pairs ðmj ; dj Þ (Luo et al., 2018a). If, for instance, one re­ ~ � fm ~ j gNe , in such a way that places the ensemble M by another one M j¼1

~ j ; dj Þ, m ~ j and dj are statistically indepen­ in the new model-data pair ðm dent of each other, then by the decomposition in Eq. (7), the resulting sample correlation field ρ~Gs satisfies

(10)

ρ~Gs ¼ ~εGs ; since we know ~ρ∞ Gs ¼ 0.

To construct the ensemble M̃, one option is to exploit the existing ensemble M, whose workflow is depicted in Fig. 1. We assume that the ensemble members m_j (j = 1, 2, ..., N_e) are independent and identically distributed (i.i.d) samples, which is often the case when M is an initial ensemble before history matching.¹ With the i.i.d assumption, and also by treating M = {m_j}_{j=1}^{N_e} as a set with an ordered list of indexes, a simple and efficient way to create M̃ is to randomly shuffle (RndShfl) the list of indexes of M, so that M̃ = {m̃_j : m̃_j = m_{j̃}, j̃ ≠ j}_{j=1}^{N_e}.

In other words, we create a new ensemble M̃ by randomly shuffling the positions of all ensemble members in M. We also make sure that, after the shuffling, no ensemble member is kept at its original place, so that in the model-data pair (m̃_j, d_j), m̃_j is independent of d_j (= g(m_j)). The independence assertion is built upon the i.i.d assumption, in the sense that, if two ensemble members, e.g., m̃_j and m_j, are statistically independent, then m̃_j and g(m_j) (treated as a transform of m_j through the reservoir simulator g) would also be statistically independent, as the simulator g should not depend on m̃_j.

Using an initial ensemble M to construct M̃ is consistent with the settings in our numerical experiments later, since, following (Luo et al., 2018a), in the current work we use the initial ensembles to compute the tapering matrices once and for all. In the course of constructing the tapering matrices, two nominal ensembles, namely, M and M̃, are involved: M̃, together with the ensemble D of simulated observations of M, is used to calculate the sample correlation field ρ̃_Gs (= ε̃_Gs, as in Eq. (10)).

It is known that a sample correlation coefficient with respect to an underlying true correlation ρ follows N(ρ, (1 − ρ²)²/N_e) asymptotically. As a result, by assuming that the joint distribution of model-data pairs (m, d) is Gaussian, the estimated noise level at ρ = 0 is thus roughly 1/√N_e. In the experiments later, we will show that the random-shuffle based approach yields slightly lower estimated noise levels than the value 1/√N_e predicted by the asymptotic distribution.

In one of our experiments where the asymptotic value 1/√N_e is used as the estimated noise level, the resulting correlation-based adaptive localization leads to reasonably good history matching results (not shown in this work for succinctness). Bearing in mind that such estimations rely on asymptotic analysis and a Gaussianity assumption to derive the limiting distribution, however, we decide to choose the random-shuffle based approach in the current work. On the other hand, we note that the asymptotic-distribution-based approach might be more flexible in certain circumstances, e.g., when the i.i.d assumption is not valid for conducting random shuffle to generate substitute sampling errors (see our discussion in the previous footnote), or when dealing with free (or non-local) parameters as discussed in (Lacerda et al., 2019; Luo et al., 2019).
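To make the no-fixed-point requirement concrete, the random-shuffle construction can be sketched in a few lines of Python. This is a hypothetical illustration with synthetic numbers, not the authors' implementation; the rejection loop accepts a candidate permutation with probability of roughly 1/e per draw:

```python
import numpy as np

def random_derangement(n, rng):
    """Draw a random permutation of 0..n-1 with no fixed point,
    so that no ensemble member stays at its original position."""
    while True:
        perm = rng.permutation(n)
        if not np.any(perm == np.arange(n)):
            return perm

def shuffled_ensemble(M, rng):
    """Return the nominal ensemble M~ by permuting the Ne columns
    (members) of M with a random derangement."""
    Ne = M.shape[1]
    return M[:, random_derangement(Ne, rng)]

rng = np.random.default_rng(42)
M = rng.standard_normal((1478, 100))  # e.g., 1478 parameters x Ne = 100 members
M_shuffled = shuffled_ensemble(M, rng)

# After shuffling, no member occupies its original column:
assert not np.any(np.all(M_shuffled == M, axis=0))
```

The shuffled ensemble is then paired with the unshuffled simulated observations, so that each pair (m̃_j, d_j) is statistically independent under the i.i.d assumption.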

3.2. E2: A continuous tapering rule

In (Luo et al., 2018a, 2019), the hard-thresholding (keep-or-kill) tapering rule, Eq. (6), is used to determine the values (0 or 1) of the tapering coefficients. As discussed previously, this may cause two issues, namely, discontinuities in the tapering coefficient fields, and neglect of potentially still-influential data information for the updates of certain reservoir model variables. To address the above two concerns, an immediate solution is to adopt a continuous tapering rule. In analogy to the situation where one wants

¹ The cases where the i.i.d assumption is invalid are not addressed in the current work. In such a situation, one possible remedy would be to use the result on the asymptotic distribution of sample correlations. This idea seems to work reasonably well in our preliminary investigations in a real field case study (to be reported elsewhere), although a more fundamental understanding of this problem is still needed.


X. Luo and T. Bhakta

Journal of Petroleum Science and Engineering xxx (xxxx) xxx

Table 1
Information summary of the 2D Norne case study.

Model dimension:         39 × 26 (1014 gridblocks), with 739 out of 1014 being active cells
Parameters to estimate:  X-directional permeability (PERMX) and porosity (PORO) on active gridblocks. Total number: 2 × 739 = 1478
Seismic data:            Leading wavelet coefficients of amplitude-versus-angle (AVA) data, as in Scenario (S2) of (Luo et al., 2018a). Total number: 3757
History matching method: Iterative ensemble smoother (Luo et al., 2015) with an ensemble of 100 reservoir models, together with correlation-based localization

Table 2
Comparison of estimated noise levels (STDs) for sample correlation maps at observation index 500 (OI-500) and 1000 (OI-1000), respectively, obtained using the Ideal approach in (Luo et al., 2018a) and the random-shuffle (RndShfl) approach in E1. In the RndShfl approach, we repeatedly shuffle the positions of the initial ensemble members 20 times, each time generating different approximate noise maps, hence distinct estimations of noise levels. Therefore, for the RndShfl approach, we present its estimated noise levels in the form of mean ± STD (over 20 repetitions).

Fig. 2. Tapering coefficient as a Gaspari-Cohn function of the magnitude of the sample correlation coefficient, at a hypothetical threshold value of 0.2 (in red), 0.4 (in blue) and 0.6 (in green), respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

to move from the (hard-thresholding-based) tapering rule in local analysis (Houtekamer and Mitchell, 1998) to a more continuous tapering rule in distance-based localization (Hamill et al., 2001), in this work we adopt the Gaspari-Cohn (GC) function (Gaspari and Cohn, 1999) to compute the tapering coefficients. This appears to be a popular choice for many distance-based localization methods (see, e.g., Emerick and Reynolds, 2011; Hamill et al., 2001), whereas other types of tapering functions may also be used. The GC function, denoted by f_GC(z) with z ≥ 0 being a dummy variable, is given by

f_GC(z) = −(1/4)z⁵ + (1/2)z⁴ + (5/8)z³ − (5/3)z² + 1,                    if 0 ≤ z ≤ 1;
f_GC(z) = (1/12)z⁵ − (1/2)z⁴ + (5/8)z³ + (5/3)z² − 5z + 4 − (2/3)z⁻¹,    if 1 < z ≤ 2;    (11)
f_GC(z) = 0,                                                             if z > 2.
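For concreteness, Eq. (11) can be implemented directly. The helper below is a hypothetical Python sketch (the function names are ours, not the authors'), which also evaluates the continuous tapering rule of Eq. (12) discussed below:

```python
import numpy as np

def f_gc(z):
    """Gaspari-Cohn compactly supported function of Eq. (11), for z >= 0."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    out = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z <= 2.0)
    zi, zo = z[inner], z[outer]
    out[inner] = -0.25 * zi**5 + 0.5 * zi**4 + (5/8) * zi**3 - (5/3) * zi**2 + 1.0
    out[outer] = (zo**5 / 12 - 0.5 * zo**4 + (5/8) * zo**3 + (5/3) * zo**2
                  - 5.0 * zo + 4.0 - (2/3) / zo)
    return out

def tapering_coeff(rho, theta):
    """Continuous tapering rule of Eq. (12): c = f_GC((1 - |rho|)/(1 - theta))."""
    return f_gc((1.0 - np.abs(rho)) / (1.0 - theta))

# As desired, the coefficient equals 1 when |rho| = 1, for any threshold theta:
print(tapering_coeff(1.0, 0.4))  # -> [1.]
```

Note that f_GC is continuous at z = 1 and vanishes smoothly at z = 2, which is what removes the discontinuities of the keep-or-kill rule.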

            Ideal (PORO)    RndShfl (PORO)     Ideal (PERMX)    RndShfl (PERMX)
OI-500      0.1120          0.0946 ± 0.0089    0.1039           0.0945 ± 0.0074
OI-1000     0.1039          0.0982 ± 0.0080    0.0896           0.0957 ± 0.0092

Note that, in distance-based localization, the tapering coefficient is set to 1 if the distance between a pair of model variable and observation element is zero. In correlation-based localization, intuitively, one would expect that a higher sample correlation coefficient (in magnitude) should result in a higher associated tapering coefficient. In line with this intuition, the tapering rule adopted in this work is as follows:

c_ks = f_GC( (1 − abs(ρ_ks)) / (1 − θ_Gs) ).    (12)

Here, (1 − abs(ρ_ks))/(1 − θ_Gs) is the distance between abs(ρ_ks) and 1 (the maximum possible value of abs(ρ_ks)), normalized by the length scale (or critical length) ℓ_Gs ≡ 1 − θ_Gs, where θ_Gs is a threshold value computed using the workflow in the previous sub-section, and ℓ_Gs represents the distance between θ_Gs and 1. Fig. 2 shows tapering coefficients computed using the GC function, with hypothetical threshold values θ_Gs = 0.2, 0.4 and 0.6, respectively. As desired, one has c_ks = 1 at abs(ρ_ks) = 1. Although the tapering function made by combining Eqs. (11) and (12) tends to render reasonably good history matching performance in the case studies later, it might not be an optimal choice in any sense. In the context of correlation-based adaptive localization, how to optimize the tapering function remains an open problem. We will consider investigations on this topic in our future studies.

4. Numerical investigations with respect to the proposed enhancements

Before we present history matching results in the next section, we revisit Scenario (S2) of the synthetic 2D Norne field case study in (Luo et al., 2018a), and use it to illustrate the performance and characteristics of the two proposed enhancements. Some experimental settings in this 2D case study, which are essential and relevant to the current work, are recapped in Table 1, based on Table 2 of (Luo et al., 2018a). Our goal here is twofold. First, we aim to demonstrate that, with an initial ensemble of 100 reservoir models, the proposed workflow in E1 leads to reasonably good estimations of noise levels, which are close to the values obtained using the Ideal approach in (Luo et al., 2018a) with a much higher number of reservoir models. Second, in connection to enhancement E2, we also intend to illustrate the improved smoothness of the tapering coefficient fields obtained using the GC-function-based continuous tapering rule (GC rule for short) in Eq. (12), in comparison to those resulting from the hard-thresholding rule in Eq. (6). In the next section, we will further show that the GC rule tends to result in better history matching performance than the hard-thresholding one.

4.1. Performance of noise level estimations in E1

In Scenario (S2) of the 2D Norne field case, the original seismic data consist of amplitude-versus-angle (AVA) data obtained at four seismic surveys. Wavelet-based sparse data representation is introduced to simultaneously reduce the data size and estimate the noise levels in the (noisy) AVA data (Luo et al., 2017, 2018b). As such, discrete wavelet transforms (DWT) are applied to the AVA datasets to obtain a set of wavelet coefficients, which serves as a full representation of the original AVA dataset in the wavelet domain. Noise levels in the wavelet coefficients are estimated, and then used to calculate certain threshold values, based on which one can select a sub-set of leading wavelet coefficients for sparse representation of the original AVA datasets. In this particular case, the total number of seismic data (leading wavelet coefficients) used in history matching is 3757, whereas the petrophysical parameters (model variables) involve x-directional permeability (PERMX) (in the scale of natural logarithm) and porosity (PORO) distributed over 739 active reservoir gridblocks. For demonstration, in what follows, we confine ourselves to examining the sample correlation maps between (log) PERMX (or PORO) and the leading wavelet coefficients
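The sparse-representation step just described (transform the data, then keep only coefficients whose magnitude exceeds a noise-dependent threshold) can be illustrated with a minimal sketch. We use a one-level 2D Haar transform and the standard universal threshold purely for illustration; the actual workflow of (Luo et al., 2017, 2018b) estimates noise levels from the data and operates on multiple AVA surveys:

```python
import numpy as np

def haar2d(x):
    """One-level orthonormal 2D Haar transform of an array with even dimensions."""
    a = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2.0)    # row averages
    d = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2.0)    # row details
    xr = np.hstack([a, d])
    a2 = (xr[0::2, :] + xr[1::2, :]) / np.sqrt(2.0)  # column averages
    d2 = (xr[0::2, :] - xr[1::2, :]) / np.sqrt(2.0)  # column details
    return np.vstack([a2, d2])

def leading_coeffs(data2d, noise_std):
    """Keep only wavelet coefficients above a noise-dependent (universal) threshold."""
    w = haar2d(data2d)
    thr = noise_std * np.sqrt(2.0 * np.log(w.size))
    mask = np.abs(w) >= thr
    return w * mask, mask

rng = np.random.default_rng(0)
smooth = np.outer(np.linspace(0.0, 4.0, 32), np.linspace(0.0, 4.0, 32))
noisy = smooth + rng.normal(0.0, 0.1, smooth.shape)
_, mask = leading_coeffs(noisy, noise_std=0.1)
print(mask.mean())  # fraction of coefficients retained after thresholding
```

Because the transform is orthonormal, the noise STD is unchanged in the wavelet domain, which is what makes a single noise-dependent threshold meaningful across all coefficients.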


Fig. 3. Sample correlation maps between (log) PERMX (top row) or PORO (bottom row) and one of the leading wavelet coefficients (with the observation index being 500, OI-500 for short), respectively. In the first column, the sample correlation maps are computed using 100 ensemble members (Ens-100 for short); in the second column, they are computed using 3000 members instead (Ens-3000 for short); whereas the last column contains the difference (Diff for short) maps between the first and the second columns (maps in column 1 minus maps in column 2). The correlation maps calculated using 3000 members can be considered reasonably good approximations to the true ones; therefore, comparisons between the correlation maps computed using 100 and 3000 members, respectively, can help reveal the areas where spurious correlations dominate (see the discussion in the text). In the Ideal approach of (Luo et al., 2018a), the difference maps in the third column are used as approximations to the noise (due to sampling errors) in the correlation maps computed using 100 ensemble members (i.e., those in the first column). We also note that non-active blocks (in white) do not contain relevant information for history matching.

at the observation index of 500 or 1000 (OI-500 and OI-1000 for short), whereas the results for the other leading wavelet coefficients are similar.

To implement correlation-based adaptive localization, (Luo et al., 2018a) uses a so-called Ideal approach to estimate the noise levels in the sample correlation maps, which are calculated using an initial ensemble of 100 reservoir models. As introduced in a previous section, in the Ideal approach, apart from the initial ensemble with 100 reservoir models, another initial ensemble with 3000 reservoir models is also generated and used in forward reservoir simulations to compute sample correlation maps, which are treated as reasonable approximations to the underlying true correlation maps. The differences between the sample correlation maps computed using 100 and 3000 reservoir models, respectively, are then taken as the approximate noise maps due to sampling errors.

For illustration, Fig. 3 shows the sample correlation maps between PERMX (top row) or PORO (bottom row), and the leading wavelet coefficient at OI-500, computed using 100 (first column) and 3000 (second column) reservoir models, respectively, together with the associated difference maps (defined as the correlation maps with 100 reservoir

Fig. 4. As in Fig. 3, but for the leading wavelet coefficient with the observation index being 1000.


Fig. 5. Approximate noise maps obtained by the Ideal approach in (Luo et al., 2018a), which have been shown in sub-figures (c) and (f) of Figs. 3 and 4, and are re-plotted in the first column of the current figure for visual comparison. Plotted in the second column are those produced by the random-shuffle (RndShfl) approach proposed in E1. In the current figure, the noise maps in the first two rows correspond to the leading wavelet coefficient at observation index 500, while those in the last two rows correspond to the leading wavelet coefficient at observation index 1000.


Fig. 6. Histograms corresponding to the noise maps in Fig. 5.


Fig. 7. Sample correlation maps with 100 reservoir models, re-plotted in the first column of the current figure from Figs. 3 and 4; and associated tapering coefficient maps (TCMs) obtained by applying the hard-thresholding rule Eq. (6) (second column) and GC rule Eq. (12) (third column), respectively. For each sample correlation map, the threshold value in both tapering rules is calculated by inserting the corresponding mean estimated noise level of the RndShfl approach in Table 2 into Eq. (9).

models minus those with 3000 ones). In this case, seismic data appear to be more sensitive to PORO than to PERMX (Luo et al., 2017, 2018a). Therefore, for the sample correlation map between PERMX and OI-500 computed using 3000 reservoir models (Fig. 3(b)), denoted by Corr(PERMX, OI-500, Ens-3000) hereafter (similar notations are also used elsewhere), one can see that the magnitudes of the sample correlation coefficients are quite small. However, when the sample correlation map is computed using only 100 reservoir models, the magnitudes in Corr(PERMX, OI-500, Ens-100) (Fig. 3(a)) become more substantial. This implies that sampling errors are dominant in the correlation map Corr(PERMX, OI-500, Ens-100), as manifested in the difference map Corr(PERMX, OI-500, Diff) (Fig. 3(c)).

On the other hand, for the correlation maps with respect to PORO, one can see that, with 3000 reservoir models, there exists a region (lower left) in the correlation map Corr(PORO, OI-500, Ens-3000) (Fig. 3(e)), where the magnitudes of sample correlations are relatively strong and dominate the (approximate) sampling errors in Corr(PORO, OI-500, Diff) (Fig. 3(f)). As a result, in the correlation map Corr(PORO, OI-500,

X. Luo and T. Bhakta

Journal of Petroleum Science and Engineering xxx (xxxx) xxx

Ens-100) computed using 100 reservoir models, this (lower left) region still contains relatively strong correlation magnitudes, although the rest of the map is instead filled with sampling errors. Similar phenomena are also observed in Fig. 4, which reports sample correlation maps as in Fig. 3, but for the leading wavelet coefficient at OI-1000.

From the above discussion, it becomes natural and desirable that the objective of localization be set to guide the use of observations to update model variables in the regions with substantial true correlations, while avoiding or tapering the updates in the regions that are dominated by sampling errors. In the next sub-section, we show how correlation-based localization is able to achieve this goal.

For comparison, we also apply the RndShfl approach in enhancement E1 to generate approximate noise maps. Since there is randomness in shuffling the positions of ensemble members, we conduct the experiment 20 times to take statistical fluctuations into account. In each repetition of the experiment, we generate a randomly shuffled list of indexes for members in the initial ensemble with 100 reservoir models, and use the corresponding randomly-shuffled ensemble, together with a set of simulated observations (with an unshuffled and fixed list of indexes), to compute sample correlation maps (hence noise maps due to sampling errors, as explained in the preceding section). In the second column of Fig. 5, we show the noise maps (with respect to PERMX or PORO and at OI-500 or OI-1000) obtained by the RndShfl approach in one of the repetition experiments. For ease of comparison, we re-plot in the first column of Fig. 5 the counterpart noise maps from Figs. 3 and 4. An inspection of these noise maps reveals that, visually, the maps from the Ideal and RndShfl approaches appear as if they are different samples drawn from similar probability distributions.
As a result, for the purpose of noise level estimation, it appears reasonable to use the noise maps generated by the RndShfl approach as substitutes for the noise maps produced by the Ideal approach; by doing so, one avoids simulating the relatively large ensemble of reservoir models required in the Ideal approach.

For a further examination, Fig. 6 plots the histograms that correspond to the noise maps in Fig. 5. Differences among the histograms from the Ideal and RndShfl approaches can be identified, although to some extent they also exhibit certain similarities. All histograms in Fig. 6 appear to be unimodal, and can thus be approximated by certain Gaussian distributions. This observation justifies the use of the MAD estimator in Eq. (8) to estimate the noise levels.

Finally, Table 2 reports the estimated noise levels in the sample correlation maps of PERMX (or PORO) at OI-500 (or OI-1000), obtained through the Ideal and RndShfl approaches, respectively. To limit the computational costs in forward reservoir simulations, the experiment with the Ideal approach is conducted once. As a result, in the table, there is only a single number associated with each estimated noise level in the Ideal approach. In contrast, in the RndShfl approach, the experiment is conducted 20 times without incurring substantial extra computational costs. Therefore, we report the estimated noise levels in the form of mean ± STD. As one can see in the table, the mean estimated noise levels of the RndShfl approach are close to those obtained by the Ideal approach, and their differences are within two estimated STDs of the RndShfl approach. Meanwhile, the value 1/√N_e = 0.1 (N_e = 100) obtained using the aforementioned asymptotic distribution provides a fairly good prediction of the noise levels, although it tends to somewhat overestimate them.

In general, these observations confirm the feasibility of our proposal to use RndShfl as an efficient alternative approach to noise level estimation, without relying on the asymptotic distribution of sample correlation coefficients.
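The RndShfl noise-level estimation can be condensed into a short numerical experiment. The sketch below uses hypothetical helper names and synthetic Gaussian ensembles in place of reservoir simulations; the factor 0.6745 is the usual Gaussian consistency constant of the MAD estimator, which we assume matches Eq. (8):

```python
import numpy as np

def sample_corr_map(M, d):
    """Sample correlation between each model variable (rows of M) and one
    scalar observation, across the Ne ensemble members (columns of M)."""
    Mc = M - M.mean(axis=1, keepdims=True)
    dc = d - d.mean()
    return (Mc @ dc) / (np.linalg.norm(Mc, axis=1) * np.linalg.norm(dc))

def mad_noise_level(eps):
    """MAD-based estimate of the noise STD (Gaussian consistency factor 0.6745)."""
    return np.median(np.abs(eps - np.median(eps))) / 0.6745

rng = np.random.default_rng(1)
Ne, Nm = 100, 739                      # members, model variables (as in Table 1)
M = rng.standard_normal((Nm, Ne))      # stand-in for the initial ensemble
d = rng.standard_normal(Ne)            # stand-in for one simulated observation

# Derangement shuffle, so that the pairs (m~_j, d_j) are independent:
perm = rng.permutation(Ne)
while np.any(perm == np.arange(Ne)):
    perm = rng.permutation(Ne)

noise_map = sample_corr_map(M[:, perm], d)  # pure sampling error, cf. Eq. (10)
print(mad_noise_level(noise_map))           # roughly 1/sqrt(Ne) = 0.1
```

With N_e = 100 independent Gaussian members, the estimated noise level lands close to the asymptotic value 0.1, in line with the comparison reported in Table 2.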

4.2. Smoother tapering based on the GC-rule in E2

This sub-section focuses on comparing the tapering coefficient maps (TCMs) with respect to the sample correlation maps with 100 reservoir models, as previously shown in the first columns of Figs. 3 and 4. These TCMs are obtained by applying the hard-thresholding rule Eq. (6) and the GC rule Eq. (12), respectively, to the sample correlation maps. We use the same threshold values θ_Gs when computing the TCMs through Eqs. (6) and (12). More specifically, for each sample correlation map, a threshold value is determined by plugging the corresponding mean estimated noise level of the RndShfl approach (cf. Table 2) into Eq. (9).

For reference, the first column of Fig. 7 re-plots the sample correlation maps with 100 reservoir models in Figs. 3 and 4. Accordingly, the second and third columns show the TCMs obtained using the hard-thresholding and the GC rules, respectively. As mentioned previously, for the hard-thresholding rule, the tapering coefficients take values from the discrete set {0, 1}, whereas the GC rule admits tapering coefficients in the continuous interval [0, 1]. For ease of visual comparison, however, the range of the color bar in each TCM plot is set to [0, 0.5]. Comparing the plots in the second and third columns, it is clear that the GC rule tends to yield smoother transitions of the tapering coefficients across different regions.

5. History matching results in two case studies

5.1. 2D Norne field case study

Table 1 recaps from (Luo et al., 2018a) some information essential for the synthetic 2D Norne field case study. Note that the values of permeability (in millidarcy) in all 2D and 3D experiments later are re-scaled through (natural) logarithm transforms. In the 2D case, the initial ensembles of (log) PERMX and PORO are generated using Gaussian random function simulation in PETREL©. PERMX and PORO values are generated separately, using the same spherical variogram model, whose horizontal range is 1901 m (around 24 gridblocks horizontally) and whose vertical range is 12.8 m (around 3 gridblocks vertically).

In the course of generating the PERMX and PORO realizations, the simulation processes are conditioned on (synthetic) log data of a producer and an injector (except for the well log data, no other information from the reference model is used to generate the initial ensemble). Readers are referred to the previous work (Luo et al., 2018a) for more information in other respects, e.g., the configuration of the iterative ensemble smoother (iES) (Luo et al., 2015) used for history matching. Apart from the summary information in Table 1, we also specify two additional settings, namely, the stopping criteria and the performance measures of the iES in use.

In the current work, the following two stopping criteria are adopted in the iES: (C1) the iES reaches a maximum number of (outer) iterations. This number is 10 in the 2D Norne case, and 20 in the 3D Brugge benchmark case, which is presented in the next sub-section; (C2) the relative change of the average data mismatch values in two consecutive iteration steps is less than 0.01%.

The measures used for the comparison of history matching performance include data mismatch, root-mean-squared error (RMSE) and ensemble spread, whose definitions are specified in Eqs. (28)–(30) of (Luo et al., 2018a), and thus not repeated here for succinctness. Roughly speaking, data mismatch measures the distance (in observation space) between the simulated data from a history-matched reservoir model and the observed data; RMSE measures the distance (in model space) between a history-matched reservoir model and the reference reservoir model (which is available in a synthetic case study); and ensemble spread, when compared with RMSE, indicates the performance in quantifying the uncertainties in history-matched reservoir models. The ensemble spread will typically be much smaller than the RMSE when ensemble collapse happens, but should be closer to the RMSE when localization is properly conducted during history matching.

We note that in our experiments, data mismatch is involved in the history matching process (e.g., used to guide the iterations in the iES), whereas neither RMSE nor ensemble spread has any influence on the history-matching process. Instead, they are used to cross-validate history-matching results after history matching is finished.
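The two stopping criteria (C1) and (C2) can be expressed as a small predicate. This is a hypothetical sketch (the function name and the history-list convention are ours, not from the authors' iES code):

```python
def should_stop(iteration, mismatch_history, max_iters=10, rel_tol=1e-4):
    """Stopping test for an iterative ensemble smoother: criterion (C1) caps
    the number of (outer) iterations (10 for 2D Norne, 20 for 3D Brugge);
    criterion (C2) fires when the relative change of the average data mismatch
    between two consecutive iteration steps drops below 0.01% (rel_tol = 1e-4)."""
    if iteration >= max_iters:                        # criterion (C1)
        return True
    if len(mismatch_history) >= 2:                    # criterion (C2)
        prev, curr = mismatch_history[-2], mismatch_history[-1]
        if abs(prev - curr) / abs(prev) < rel_tol:
            return True
    return False

print(should_stop(3, [2.0, 1.5]))       # -> False (still improving)
print(should_stop(10, [2.0, 1.5]))      # -> True  (C1)
print(should_stop(3, [1.5, 1.499999]))  # -> True  (C2)
```

In the 2D Norne experiments below, (C2) terminates the Ideal run after 8 steps, whereas the RndShfl runs hit the (C1) cap of 10 steps.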


Fig. 8. Boxplots of data mismatch at different iteration steps in the 2D Norne field case study, when the Ideal approach with the hard-thresholding rule (left), the RndShfl approach with the hard-thresholding rule (middle), and the RndShfl approach with the GC rule (right), respectively, are employed to conduct correlation-based localization.

Fig. 9. As in Fig. 8, but for the boxplots with respect to the RMSEs of (log) PERMX (top) and PORO (bottom) at different iteration steps, when the three different correlation-based localization schemes (Ideal, RndShfl–hard and RndShfl–GC) are adopted.

Fig. 8 reports the boxplots of data mismatch at different iteration steps. These plots correspond to the history matching results with correlation-based localization, when the Ideal approach with the hard-thresholding rule (Ideal for short), and the RndShfl approach with the hard-thresholding and GC rules (RndShfl–hard and RndShfl–GC for short), respectively, are employed to compute correlation threshold values and conduct tapering accordingly. The results of the Ideal approach are partially reported in the previous work (Luo et al., 2018a), and are included here for comparison with the results of the RndShfl approach under both the hard-thresholding and GC rules. For the Ideal approach, the stopping criterion (C2) is effective, and the iES stops after 8 iteration steps. For the RndShfl approach with either the hard-thresholding or the GC rule, it is the stopping criterion (C1) that is activated, and for either tapering rule the iES stops after reaching 10 iteration steps. For both the Ideal and RndShfl approaches, the data mismatch values at the last iteration step are close to each other.

Similar to Fig. 8, Fig. 9 shows boxplots of the RMSEs of PERMX and PORO at different iteration steps, when the above three correlation-based localization schemes, namely, Ideal, RndShfl–hard and RndShfl–GC, are adopted. For the RMSEs of PERMX (top row), it appears that the relative changes over the iteration steps are marginal. This is possibly because in this case, the seismic data appear to be more sensitive to PORO than to PERMX (Luo et al., 2017, 2018a). Consequently, the RMSEs of PORO (bottom row) undergo more significant reductions

Table 3
Data mismatch (DM for short, in the form of mean ± std), mean RMSEs and ensemble spreads (in parentheses) of PERMX and PORO in the 2D Norne field case study, with respect to the initial ensemble, and the final ones obtained using three different correlation-based localization schemes (Ideal, RndShfl–hard and RndShfl–GC).

             Initial            Final (Ideal)      Final (RndShfl–hard)   Final (RndShfl–GC)
DM (×10⁴)    2.3269 ± 1.1489    1.0676 ± 0.0601    1.0061 ± 0.0378        1.0190 ± 0.0471
PERMX        0.9952 (0.6862)    0.9574 (0.6327)    0.9473 (0.6045)        0.9422 (0.6036)
PORO         0.0236 (0.0139)    0.0159 (0.0098)    0.0150 (0.0094)        0.0145 (0.0082)

as the iteration goes on. In terms of (mean) RMSEs at the final iteration step, RndShfl–GC leads to slightly better history matching results than Ideal and RndShfl–hard, as one may visually perceive from the plots in Fig. 9, and also observe from Table 3.

Fig. 10 presents PERMX (first column) and PORO (second column) from the reference model (top row), and the means of PERMX and PORO from the initial ensemble (second row), and the final ones obtained through history matching with the localization scheme Ideal (third row), RndShfl–hard (fourth row) and RndShfl–GC (fifth row), respectively. In


Fig. 10. Reference (top row) and means (2nd – 5th rows) of PERMX (first column) and PORO (second column) in the 2D Norne field case study. The means of PERMX and PORO are with respect to the initial ensemble (second row), and the final ones obtained using the localization scheme Ideal (third row), RndShfl–hard (fourth row) and RndShfl–GC (fifth row), respectively.


Fig. 11. STDs of PERMX (first column) and PORO (second column) in the 2D Norne field case study, with respect to the initial ensemble (first row), and the final ones obtained using the localization scheme Ideal (second row), RndShfl–hard (third row) and RndShfl–GC (fourth row), respectively.


comparison to the mean PERMX and PORO from the initial ensemble, those from the final ensembles with the three different localization schemes tend to capture certain spatial patterns (e.g., those in the lower left corner) that are present in the reference model, but missing in the means of the initial ensemble. To a large extent, the mean PERMX and PORO with respect to the three localization schemes are quite similar, although noticeable differences can also be spotted in certain regions.

Fig. 11 shows the STD maps of PERMX (first column) and PORO (second column) from the initial ensemble and the final ones with the

Table 4
Information summary of the Brugge benchmark case study.

Model dimension:         139 × 48 × 9, with 44550 out of 60048 being active cells
Parameters to estimate:  PORO, PERMX, PERMY, PERMZ. Total number: 4 × 44550 = 178200
4D seismic data:         Leading wavelet coefficients of the AVA data. Total number: 3293
History matching method: Iterative ensemble smoother (Luo et al., 2015) with an ensemble of 103 reservoir models

Fig. 12. Boxplots of data mismatch at different iteration steps in the 3D Brugge benchmark case study, when the RndShfl approach with the hard-thresholding (left) and GC (right) rules, respectively, are employed to conduct correlation-based localization.

Fig. 13. Boxplots of RMSEs of PERMX (top) and PORO (bottom) at different iteration steps in the 3D Brugge benchmark case study, when the localization schemes RndShfl–hard (left) and RndShfl–GC (right) are employed.


Fig. 14. References (top row) and means of PERMX (first column) and PORO (second column) on Layer 2 of the Brugge model. The means of PERMX and PORO are with respect to the initial ensemble (second row), and the final ensembles from the localization schemes RndShfl–hard (third row) and RndShfl–GC (fourth row), respectively.

three localization schemes. Compared to the STD maps of the initial ensemble, those of the final ensembles have lower magnitudes, meaning that all three final ensembles have reduced variances. The STD maps of the three final ensembles exhibit very similar spatial patterns, although in some regions, the localization scheme Ideal tends to result in larger STD values than RndShfl–hard and RndShfl–GC, with the STD values of RndShfl–hard being in the middle, and those of RndShfl–GC being the lowest.

For a more quantitative comparison, Table 3 reports the data mismatch (in the form of mean ± std), mean RMSEs and spreads (in parentheses) of PERMX and PORO with respect to the initial ensemble and the final ones obtained using the localization schemes Ideal, RndShfl–hard and RndShfl–GC, respectively. In comparison to the initial ensemble, all localization schemes lead to improved history matching results in terms of mean RMSEs. Consistent with the previous results in Fig. 9, the reduction of the mean RMSE of PERMX from the initial to the final ensembles is relatively small (around 5%), whereas the reduction of the mean RMSE of PORO is more significant (more than 30%). Among the three localization schemes, RndShfl–GC has the lowest mean RMSEs and ensemble spreads for both PERMX and PORO, RndShfl–hard is in the middle, and Ideal leads to the highest. In all three final ensembles, the ensemble spreads remain reasonably close to the mean RMSEs, a sign that ensemble collapse is avoided.

5.2. 3D Brugge benchmark case study

Here we also revisit the synthetic Brugge benchmark case used in (Luo et al., 2018a). Table 4 provides a summary of some essential information relevant to the current case study, while readers are referred to (Luo et al., 2018a) for more information in other aspects, e.g., the ways to generate the initial ensemble and the seismic data. In this case study, to mimic the situation in practical history matching problems, we confine ourselves to an initial ensemble with 103 reservoir models, such that the Ideal approach in (Luo et al., 2018a) cannot be applied. As a result, in the sequel, we focus on comparing the history matching results with the localization schemes RndShfl–hard and RndShfl–GC.

Fig. 12 shows boxplots of data mismatch at different iteration

X. Luo and T. Bhakta

Journal of Petroleum Science and Engineering xxx (xxxx) xxx

Fig. 15. STDs of PERMX (first column) and PORO (second column) on Layer 2 of the Brugge model, with respect to the initial ensemble (first row), and the final ensembles from the localization schemes RndShfl–hard (second row) and RndShfl–GC (third row), respectively.

steps, when the localization schemes RndShfl–hard and RndShfl–GC, respectively, are adopted in history matching. In this case, both localization schemes maintain substantial spreads in the boxplots, whereas RndShfl–GC results in lower final data mismatch than RndShfl–hard.

Similar to Fig. 12, Fig. 13 shows boxplots of RMSEs of PERMX and PORO at different iteration steps. The results with respect to PERMY and PERMZ are similar to those with respect to PERMX, and are thus not reported for succinctness. As in the 2D Norne field case, the seismic data here also appear to be more sensitive to PORO than to PERMX. Consequently, comparing the initial and final ensembles, there are more significant relative changes in PORO than in PERMX, although for both PERMX and PORO, one can clearly see that their RMSEs tend to decrease as the iteration proceeds. The final RMSEs of PERMX from both localization schemes are close to each other, while the final RMSEs of PORO from RndShfl–GC tend to be lower than those from RndShfl–hard.

Fig. 14 reports references (top row) and means of PERMX (first column) and PORO (second column) on Layer 2 of the Brugge model. The means of PERMX and PORO are with respect to the initial ensemble (second row), and the final ensembles from the localization schemes RndShfl–hard (third row) and RndShfl–GC (fourth row), respectively. The PERMX and PORO maps from both final ensembles capture more spatial patterns of the reference maps than the maps from the initial ensemble. Meanwhile, the final PERMX and PORO maps obtained using the localization schemes RndShfl–hard and RndShfl–GC are quite similar to each other overall, although some small differences can indeed be spotted at certain locations.

Corresponding to Fig. 14, presented in Fig. 15 are the STD maps of PERMX (first column) and PORO (second column), with respect to the initial ensemble (first row), and the final ensembles from the localization schemes RndShfl–hard (second row) and RndShfl–GC (third row), respectively. In contrast to the STD maps of the initial ensemble, those

Table 5 Data mismatch (DM for short, in the form of mean � std), mean RMSEs and ensemble spreads (in parentheses) of PERMX and PORO in the Brugge bench­ mark case study, with respect to the initial ensemble, and the final ones obtained using the localization schemes RndShfl – hard and RndShfl – GC. Initial

Final (RndShfl – hard)

Final (RndShfl – GC)

DM ( � 105 )

7.4620 � 2.1106

0.7828 � 0.1420

0.5623 � 0.0629

PERMX PORO

1.6586 (1.2330) 0.0346 (0.0293)

1.5213 (1.1180) 0.0246 (0.0189)

1.5111 (1.1140) 0.0218 (0.0158)

from both final ensembles have reduced variances. The STD maps from the localization schemes RndShfl–hard and RndShfl–GC exhibit similar spatial patterns, but the variances in the STD maps from RndShfl–GC tend to be lower than those from RndShfl–hard. Again, to have a quantitative comparison, Table 5 summarizes the calculated data mismatch (in the form of mean � std), mean RMSEs and ensemble spreads (in parentheses) of PERMX and PORO, with respect to the initial ensemble, and the final ones from RndShfl – hard and RndShfl – GC. Consistent with what we have observed in Figs. 12–15, history matching has led to lower final RMSEs of PERMX and PORO, in com­ parison to those from the initial ensemble. The relative reductions in the RMSEs of PERMX (from initial to final ensembles) are around 9% at most, less than the reductions in the RMSEs of PORO, which are more than 28%. On the other hand, the spreads in the final ensembles remain reasonably close to the final RMSEs, and are substantially different from zero, meaning that ensemble collapse is avoided in the presence of the localization schemes RndShfl – hard and RndShfl – GC.
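To make the reported metrics concrete, the sketch below shows one common way to compute the mean RMSE and ensemble spread quoted in Table 5. The function name and the exact averaging conventions are our assumptions for illustration; the paper does not spell out its formulas in this section.

```python
import numpy as np

def ensemble_metrics(ensemble, reference):
    """Mean RMSE and ensemble spread for a gridded property (e.g. PERMX).

    ensemble : (Ne, Ng) array, Ne realizations of Ng gridblock values
    reference: (Ng,) array, the reference (true) field

    Note: one plausible convention; the paper's exact averaging may differ.
    """
    # RMSE of each realization against the reference, averaged over members
    rmse = np.sqrt(np.mean((ensemble - reference) ** 2, axis=1)).mean()
    # Ensemble spread: member-wise standard deviation, averaged over gridblocks
    spread = ensemble.std(axis=0, ddof=1).mean()
    return rmse, spread

# Synthetic illustration: 50 members, 100 gridblocks, noise std 0.5
rng = np.random.default_rng(0)
truth = rng.normal(size=100)
ens = truth + 0.5 * rng.normal(size=(50, 100))
rmse, spread = ensemble_metrics(ens, truth)
```

With this convention, a spread that stays close to the RMSE (as in Table 5) indicates that the ensemble uncertainty remains consistent with the actual estimation error, i.e., ensemble collapse is avoided.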


6. Discussion and conclusion


In this work, we consider two enhancements that are useful for improving the efficiency and accuracy of the correlation-based adaptive localization previously proposed in (Luo et al., 2018a), while making localization run in an automatic manner without incurring substantial extra computational costs. One enhancement, E1, aims to estimate the noise levels in sample correlation fields with reasonably good accuracy, but without the need to run a relatively large ensemble as in the previous work (Luo et al., 2018a). To this end, we take advantage of a special feature of our problem, and design an efficient workflow (random shuffle) that achieves this goal without increasing the ensemble size, and at almost negligible extra computational cost. Through numerical experiments, we demonstrate that the proposed workflow yields good estimates of the noise levels, close to those obtained by the otherwise much more expensive Ideal approach in (Luo et al., 2018a). The other enhancement, E2, focuses on producing smoother tapering coefficients than those generated by the hard-thresholding rule adopted in (Luo et al., 2018a). To address this problem, one can draw an analogy to the situation where one changes from the tapering rule in local analysis (Houtekamer and Mitchell, 1998) to that in distance-based localization (Hamill et al., 2001). As a natural choice, in E2 we adopt the Gaspari–Cohn (GC) function (Gaspari and Cohn, 1999) to compute the tapering coefficients, but adapt the ways of computing the distances and length scales in the GC function to the specific problem at hand. Through numerical experiments, we also show that the proposed tapering rule indeed tends to produce smoother tapering coefficients.
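As a rough illustration of the random-shuffle idea behind E1, the Python sketch below destroys the genuine dependence between model variables and one simulated observation by permuting the observation ensemble; the remaining sample correlations are then pure sampling noise, whose level is estimated with a median-absolute-deviation rule in the spirit of Donoho and Johnstone (1994). All function names here are hypothetical, and the paper's actual workflow may differ in detail.

```python
import numpy as np

def noise_std_by_shuffle(model_ens, obs_ens, rng):
    """Estimate the noise level in sample correlations between model
    variables and a simulated observation, via random shuffling.

    model_ens: (Ne, Nm) ensemble of model variables
    obs_ens  : (Ne,)    ensemble of one simulated observation
    """
    # Break any genuine dependence by permuting the observation ensemble
    shuffled = rng.permutation(obs_ens)
    dm = model_ens - model_ens.mean(axis=0)
    do = shuffled - shuffled.mean()
    # Sample correlation of each model variable with the shuffled observation
    corr = (dm * do[:, None]).sum(axis=0) / (
        np.linalg.norm(dm, axis=0) * np.linalg.norm(do) + 1e-12)
    # Robust noise-level estimate via the median absolute deviation
    # (Gaussian MAD-to-std factor 0.6745)
    return np.median(np.abs(corr)) / 0.6745

rng = np.random.default_rng(42)
Ne, Nm = 100, 2000
model_ens = rng.normal(size=(Ne, Nm))
obs_ens = rng.normal(size=Ne)
sigma = noise_std_by_shuffle(model_ens, obs_ens, rng)
# For independent Gaussian samples, sigma is close to 1/sqrt(Ne - 1)
```

The estimate requires no additional reservoir simulations, which is consistent with the near-negligible extra cost claimed for E1.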
We apply correlation-based localization with the above two enhancements to a 2D and a 3D case study investigated in the previous work (Luo et al., 2018a), and compare the history matching performance of the more expensive correlation-based localization scheme in (Luo et al., 2018a) (i.e., the Ideal approach, used in the 2D case only) with that of the proposed correlation-based localization scheme under different enhancement settings: (1) with E1 only (the RndShfl–hard approach); and (2) with both enhancements E1 and E2 (the RndShfl–GC approach). The RndShfl–hard and RndShfl–GC approaches are used in both the 2D and 3D cases. Both case studies indicate that RndShfl–GC tends to result in better history matching performance (in terms of RMSE) than RndShfl–hard, whereas both RndShfl–hard and RndShfl–GC tend to perform better than the Ideal approach. This observation suggests that both enhancements E1 and E2 are useful for improving the efficacy of correlation-based adaptive localization, although they might not necessarily lead to optimal improvements in general, e.g., in terms of the choice of tapering function in E2.
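For reference, the standard fifth-order compactly supported taper of Gaspari and Cohn (1999) adopted in E2 can be sketched as below. How the "distance" and the length scale `c` are derived from sample correlations follows the paper's own adaptation and is not reproduced here; this is only the generic GC function.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn (1999) fifth-order taper; vanishes for dist >= 2c.

    dist: array of non-negative distances (problem-specific in the paper)
    c   : length scale controlling the localization radius
    """
    z = np.abs(np.asarray(dist, dtype=float)) / c
    taper = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi = z[inner]
    # Inner piece: smooth decay from 1 at z = 0
    taper[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                    - 5.0 / 3.0 * zi**2 + 1.0)
    zo = z[outer]
    # Outer piece: continues smoothly to 0 at z = 2
    taper[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                    + 5.0 / 3.0 * zo**2 - 5.0 * zo + 4.0
                    - 2.0 / (3.0 * zo))
    return taper

t = gaspari_cohn([0.0, 1.0, 2.0, 3.0], 1.0)
```

Because the GC function decays smoothly to zero, it avoids the discontinuities of the hard-thresholding rule while still switching off distant (or weakly correlated) observations entirely.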

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.petrol.2019.106559.

References

Aanonsen, S., Nævdal, G., Oliver, D., Reynolds, A., Vallés, B., 2009. The ensemble Kalman filter in reservoir engineering: a review. SPE J. 14, 393–412. SPE-117274-PA.
Anderson, T.W., 2003. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons, Inc.
Anderson, J.L., 2012. Localization and sampling error correction in ensemble Kalman filter data assimilation. Mon. Weather Rev. 140 (7), 2359–2371.
Anderson, J.L., 2016. Reducing correlation sampling error in ensemble Kalman filter data assimilation. Mon. Weather Rev. 144 (3), 913–925.
Arroyo, E., Devegowda, D., Datta-Gupta, A., Choe, J., 2008. Streamline-assisted ensemble Kalman filter for rapid and continuous reservoir model updating. SPE Reserv. Eval. Eng. 11, 1046–1060. SPE-104255-PA.
Chen, Y., Oliver, D.S., 2010. Cross-covariances and localization for EnKF in multiphase flow data assimilation. Comput. Geosci. 14, 579–601.
Chen, Y., Oliver, D.S., 2017. Localization and regularization for iterative ensemble smoothers. Comput. Geosci. 21, 13–30.
Donoho, D.L., Johnstone, J.M., 1994. Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455.
Donoho, D.L., Johnstone, I.M., 1995. Adapting to unknown smoothness via wavelet shrinkage. J. Am. Stat. Assoc. 90, 1200–1224.
Emerick, A., Reynolds, A., 2011. Combining sensitivities and prior information for covariance localization in the ensemble Kalman filter for petroleum reservoir applications. Comput. Geosci. 15, 251–269.
Evensen, G., 2009. Data Assimilation: the Ensemble Kalman Filter. Springer Science & Business Media.
Gaspari, G., Cohn, S.E., 1999. Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc. 125, 723–757.
Hamill, T.M., Whitaker, J.S., Snyder, C., 2001. Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Weather Rev. 129, 2776–2790.
Hamill, T.M., Whitaker, J.S., Anderson, J.L., Snyder, C., 2009. Comments on "Sigma-point Kalman filter data assimilation methods for strongly nonlinear systems". J. Atmos. Sci. 66, 3498–3500.
Houtekamer, P.L., Mitchell, H.L., 1998. Data assimilation using an ensemble Kalman filter technique. Mon. Weather Rev. 126, 796–811.
Huang, Y., Zeng, F., 2016. The temperature-based localization for the application of EnKF on automatic history matching of the SAGD process. Comput. Geosci. 20, 187–212.
Lacerda, J.M., Emerick, A.A., Pires, A.P., 2019. Methods to mitigate loss of variance due to sampling errors in ensemble data assimilation with non-local model parameters. J. Pet. Sci. Eng. 172, 690–706.
Lorentzen, R., Luo, X., Bhakta, T., Valestrand, R., 2019. History matching the full Norne field model using seismic and production data. SPE J. 24, 1452–1467. https://doi.org/10.2118/194205-PA. SPE-194205-PA.
Luo, X., Bhakta, T., 2017. Estimating observation error covariance matrix of seismic data from a perspective of image denoising. Comput. Geosci. 21, 205–222.
Luo, X., Lorentzen, R.J., Valestrand, R., Evensen, G., 2019. Correlation-based adaptive localization for ensemble-based history matching: applied to the Norne field case study. SPE Reservoir Evaluation & Engineering. https://doi.org/10.2118/191305-PA (in press). SPE-191305-PA.
Luo, X., Stordal, A., Lorentzen, R., Nævdal, G., 2015. Iterative ensemble smoother as an approximate solution to a regularized minimum-average-cost problem: theory and applications. SPE J. 20, 962–982. https://doi.org/10.2118/176023-PA. SPE-176023-PA.
Luo, X., Bhakta, T., Jakobsen, M., Nævdal, G., 2017. An ensemble 4D-seismic history-matching framework with sparse representation based on wavelet multiresolution analysis. SPE J. 22, 985–1010. https://doi.org/10.2118/180025-PA. SPE-180025-PA.
Luo, X., Bhakta, T., Nævdal, G., 2018. Correlation-based adaptive localization with applications to ensemble-based 4D seismic history matching. SPE J. 23, 396–427. https://doi.org/10.2118/185936-PA. SPE-185936-PA.
Luo, X., Bhakta, T., Jakobsen, M., Nævdal, G., 2018. Efficient big data assimilation through sparse representation: a 3D benchmark case study in petroleum engineering. PLoS One 13, e0198586.
Mallat, S., 2008. A Wavelet Tour of Signal Processing. Academic Press.
Raniolo, S., Dovera, L., Cominelli, A., Callegaro, C., Masserano, F., 2013. History match and polymer injection optimization in a mature field using the ensemble Kalman filter. In: IOR 2013 – 17th European Symposium on Improved Oil Recovery.
Zhang, Y., Oliver, D.S., 2010. Improving the ensemble estimate of the Kalman gain by bootstrap sampling. Math. Geosci. 42, 327–345.

Acknowledgments

We would like to thank two anonymous reviewers for their constructive comments and suggestions, and Schlumberger for providing us academic software licenses to ECLIPSE© and PETREL©. The first author (XL) acknowledges partial financial support from the research project "4D Seismic History Matching" (RCN project number 243680), which is funded by the industry partners Eni Norge AS, Petrobras, and Total, as well as the Research Council of Norway (RCN); and from the research project "DIGIRES" (RCN project number 280473), which is funded by the industry partners AkerBP ASA, DEA Wintershall, EquiNor ASA, Lundin Norway AS, Neptune Energy Norge AS, Petrobras and Vår Energi AS, together with RCN. Both authors acknowledge the Research Council of Norway and the industry partners – ConocoPhillips Skandinavia AS, AkerBP ASA, Eni Norge AS, EquiNor AS, Neptune Energy Norge AS, Lundin Norway AS, Halliburton AS, Schlumberger Norge AS, Wintershall Norge AS and DEA Norge AS – of The National IOR Centre of Norway (RCN project number 230303) for financial support.
