Bayesian Biclustering by Dynamics: A Clustering Algorithm for SAGD Time Series Data

Helen Pinto, Ian Gates, Xin Wang

PII: S0098-3004(18)31124-5
DOI: https://doi.org/10.1016/j.cageo.2019.07.008
Reference: CAGEO 4304
To appear in: Computers and Geosciences
Received Date: 22 November 2018
Revised Date: 25 July 2019
Accepted Date: 26 July 2019

Please cite this article as: Pinto, H., Gates, I., Wang, X., Bayesian Biclustering by Dynamics: A Clustering Algorithm for SAGD Time Series Data, Computers and Geosciences (2019), doi: https://doi.org/10.1016/j.cageo.2019.07.008.
TITLE:
Bayesian Biclustering by Dynamics: A Clustering Algorithm for SAGD Time Series Data

AUTHORS:
Helen Pinto (a), Ian Gates (b), Xin Wang (a)*
(a) Geomatics Engineering, (b) Chemical and Petroleum Engineering, University of Calgary, 2500 University Drive NW, Calgary, AB, Canada T2N 1N4
AUTHORSHIP STATEMENTS:
Principal Author: Helen Pinto. Conception and design of study, data acquisition, analysis and interpretation, drafting of manuscript and all revisions.
Co-Author: Ian Gates. Domain expert. Assisted in conception and design of data set, conception of study, and manuscript evaluation.
Co-Author: Xin Wang. Assisted in design of study, manuscript revision and evaluation of intellectual content.

* Corresponding author. Tel: +1 403 220 3355. Email address: [email protected]

DECLARATION OF INTERESTS:
Helen Pinto: None. Ian Gates: None. Xin Wang: None.
CODE AVAILABILITY: https://data.mendeley.com/datasets/g8x3wybrjy/draft?a=a6108148-937e-4156-a284-367908dcdc93
HIGHLIGHTS:
• Presenting a novel Bayesian biclustering method that uses informative priors.
• The algorithm finds and isolates important time periods in well performance.
• Two Steam-Assisted Gravity Drainage injection strategies were discovered.
• Only high injection between 3 and 23 months achieves the minimal steam-to-oil ratio.
Bayesian Biclustering by Dynamics: A Clustering Algorithm for SAGD Time Series Data

Helen Pinto
Geomatics Engineering, University of Calgary, Calgary, Alberta, Canada
[email protected]

Ian Gates
Chemical and Petroleum Engineering, University of Calgary, Calgary, Alberta, Canada
[email protected]

Xin Wang
Geomatics Engineering, University of Calgary, Calgary, Alberta, Canada
[email protected]

ABSTRACT
Steam-Assisted Gravity Drainage (SAGD) is an extra heavy oil recovery process consisting of an upper horizontal steam injector and a lower horizontal producer that removes fluids from the reservoir. Given its costs and environmental impact, SAGD injection strategies must be examined to find ways to improve performance. In this paper, a new Bayesian Biclustering by Dynamics (BBCD) method is proposed, which finds and differentiates groups of SAGD wells based on their oil production response to steam injected over time. The method automatically clusters both rows and columns of SAGD injection and production data and then generates a descriptive summary for each cluster. Clusters are described with probability distributions that capture the likelihood of transitioning between discrete steam-to-oil ratio states. In addition, BBCD incorporates background knowledge on SAGD directly into the clustering process via prior probability distributions. Real SAGD operation data from sites in Alberta, Canada are used for this analysis. The results reveal nine production responses to two different steam injection strategies, and include a few insights into well performance.

KEYWORDS
Bayesian statistics; cluster analysis; applied computing; Steam-Assisted Gravity Drainage; SAGD
1 INTRODUCTION
SAGD is an extra heavy oil recovery process where high temperature and pressure steam is injected into an oil sands reservoir to lower the viscosity of the oil so that it can flow to a production well located below and parallel to the injection well as shown schematically in Figure 1. Steam then replaces oil in the depleted zone, and a steam chamber grows within the reservoir [1]. The two wells operate as a well-pair but are collectively referred to in this paper as one ‘SAGD well’. Each well is therefore described by its time sequences of injected steam (cost) and produced oil (revenue) volumes. Ideally, recovery happens with an average steam-to-oil ratio (SOR) of 3 m³/m³ or less.
Figure 1: Cross-section of a SAGD well-pair [2].
While the process is easy to describe, the performance of SAGD varies considerably across the industry. For example, one well recovered 284,000 m³ of oil in 43 months, whereas a second well recovered 132,000 m³ of oil in 92 months from drainage areas of comparable size and capacity. Benchmarking of steam injection strategies has not been attempted before because it is physically impossible to measure and reproduce exactly what is happening in the reservoir, and typical methods of analysis, such as analytical calculation, numerical simulation and physical experimentation, are better suited to exploring the performance of a single well or SAGD field.

In addition, the following two real-world challenges should be taken into consideration. First, there is an unknown number of life cycle stages for each well. SAGD wells are believed to undergo three life cycle stages: early-stage, when the steam chamber grows vertically until it reaches the caprock above the reservoir; mid-stage, when the chamber expands laterally; and late-stage, when adjacent steam chambers merge and the recovery from individual well-pairs is no longer based solely on the steam injected at that location. Although there is physical justification for these three life cycle stages, the actual number of stages a SAGD well undergoes, and the duration of each stage, is unknown. Second, there is variability in the underlying reservoir characteristics of each SAGD field. Some reservoirs are more challenging to work with than others. Their oil (pay) zone may be thin, or have relatively low oil saturation, limiting total recovery, or it may contain significant shale barriers that restrict the flow of oil to the production well.
To find life cycle stages, the data set must be partitioned over time. Similarly, to isolate differences in reservoir
characteristics, the data set must be partitioned by wells, and the partitioning process must include any prior knowledge of how these characteristics affect the SAGD process. The biclustering requirement is handled via the proposed algorithm and prior knowledge is introduced via calculated steam and oil volumes [3, 4, 5]. The SAGD data set itself consists of historical steam injection and oil production volumes for 328 wells from nine SAGD projects in the province of Alberta, Canada [6]. When prior information is added, each well is represented by four time series – observed steam and oil volumes, and analytically calculated steam and oil volumes.
In this paper, we propose a new algorithm called Bayesian Biclustering by Dynamics (BBCD). Clustering by dynamics is the problem of grouping time series so that the elements of each cluster exhibit similar behavior relationships, usually attributed to some common underlying generative process. Bayesian clustering by dynamics summarizes the relationships within a cluster as a transition probability matrix under the Markov assumption, where each row in the matrix represents probabilities of a transition from one state to every other state of a given variable in the next time step. Bayesian biclustering by dynamics uses transition probability matrices to summarize clusters that can occur over time step ranges within groups of time series. In this way, it is possible to capture both the underlying generative processes and how they evolve over time. Finally, cluster interpretation is achieved by first manually converting observed and prior steam injection and oil production volumes into intuitive features known as combined steam-oil states (cstates) before clustering.

The contributions of this paper are twofold. First, the proposed BBCD algorithm finds clusters within groups of SAGD wells and clusters over a range of time steps, while incorporating reservoir characteristics as prior knowledge into the clustering process. The approach handles time series of different lengths, seamlessly incorporates gaps in performance data, and makes no assumptions about the number of clusters that should be present. Second, results from real world SAGD data reveal nine production responses to two different steam injection strategies, and include a few insights into well performance.

The rest of the paper is organized as follows. Relevant background literature is discussed in Section 2, followed by a detailed description of the methodology in Section 3, and results evaluation in Section 4. Flexibility of the BBCD algorithm is then summarized in Section 5.
2 BAYESIAN CLUSTERING
Bayesian clustering is a model-based, probabilistic approach to time series clustering. Model-based clustering first transforms raw data into a new and more descriptive data space, and then clusters the data in that space [7]. Probabilistic approaches cluster on the likelihood that two series were generated by the same probability distributions [8, 9], as opposed to clustering on the similarity in shape of the series [10, 11].

Bayesian clustering provides a probabilistic framework to quantify the fit between a data set and a model, with the added advantage of being able to incorporate prior information. The Bayesian approach assumes that there is an underlying structure to the data that can be captured in a model. Therefore, the task is to identify a probable set of generating processes, Model M, given the observed data set (D). From Bayes Theorem, the posterior probability of M given D is:

P(M | D) = P(M) P(D | M) / P(D)   (1)

The prior probability of the data, P(D), is usually assumed to be constant since models are compared over the same data. The prior probability of the model, P(M), is also usually constant because all models are assumed to be equally likely. Therefore, to maximize P(M | D), it is sufficient to maximize P(D | M), the marginal likelihood. The marginal likelihood is a measure of how likely the data are if M is true. It can be calculated with the statistically rigorous K2 metric [12], which works with multinomial data and can include Dirichlet prior probabilities. Ramoni et al. [13, 14] modified the K2 metric to work with discrete time sequenced data containing y possible states. Under the Markov assumption, a sequence X is represented as a transition frequency matrix, where rows are parent states seen at time t-1 (X_{t-1}), and columns are child states seen at time t (X_t). The marginal likelihood is then:

f(D | M) = f(D, M) · f(D, X_{t-1}, X_t, M)   (2)

where f(D, M) is the likelihood of the data assuming it is partitioned into c clusters, given by:

f(D, M) = [Γ(α) / Γ(α + n)] ∏_{k=1..c} [Γ(α_k + n_k) / Γ(α_k)]   (3)

(with n = Σ_k n_k and α = Σ_k α_k), and f(D, X_{t-1}, X_t, M) measures the likelihood of the data when, conditional on having c clusters, each sequence is assigned into one and only one cluster:

f(D, X_{t-1}, X_t, M) = ∏_{k=1..c} ∏_{i=1..y} [Γ(α_{ki}) / Γ(α_{ki} + n_{ki})] ∏_{j=1..y} [Γ(α_{kij} + n_{kij}) / Γ(α_{kij})]   (4)

In Equations 3 and 4, n_{kij} is defined as the count of sequences in cluster k that contain parent state i and child state j, while n_k represents the total count of sequences in cluster k. Prior distributions are encapsulated as hypothetical sequences, each containing α transitions, where α_{kij} is the frequency with which hypothetical transition i → j occurs in cluster k.
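To make Equations 2 to 4 concrete, the sketch below shows one way the marginal likelihood of a candidate partition could be computed in log space from per-cluster transition frequency and hyper-parameter matrices. It is only an illustration of the Dirichlet-multinomial form above, not the authors' implementation; the class and method names are hypothetical, and each ratio Γ(α + n)/Γ(α) is evaluated as a running sum of logarithms because the counts n are integers.

```java
import java.util.Collections;
import java.util.List;

/** Illustrative log-space evaluation of Equations 2-4 (hypothetical helper, not the BBCD source). */
public class MarginalLikelihoodSketch {

    /** log[ Gamma(alpha + n) / Gamma(alpha) ] for integer n >= 0, computed as a sum of logs. */
    static double logGammaRatio(double alpha, int n) {
        double sum = 0.0;
        for (int t = 0; t < n; t++) {
            sum += Math.log(alpha + t);
        }
        return sum;
    }

    /**
     * log f(D | M) for a partition into c clusters.
     * counts.get(k)[i][j] = n_kij, observed transitions i -> j in cluster k
     * priors.get(k)[i][j] = alpha_kij, hypothetical (prior) transitions i -> j in cluster k
     */
    static double logMarginalLikelihood(List<int[][]> counts, List<double[][]> priors) {
        int c = counts.size();
        double logF = 0.0;

        // Equation 3: likelihood of the partition itself (cluster totals n_k vs. hyper-parameters alpha_k).
        double alphaTotal = 0.0;
        int nTotal = 0;
        for (int k = 0; k < c; k++) {
            int nK = 0;
            double alphaK = 0.0;
            for (int i = 0; i < counts.get(k).length; i++) {
                for (int j = 0; j < counts.get(k)[i].length; j++) {
                    nK += counts.get(k)[i][j];
                    alphaK += priors.get(k)[i][j];
                }
            }
            logF += logGammaRatio(alphaK, nK);     // + log Gamma(alpha_k + n_k) - log Gamma(alpha_k)
            nTotal += nK;
            alphaTotal += alphaK;
        }
        logF -= logGammaRatio(alphaTotal, nTotal); // + log Gamma(alpha) - log Gamma(alpha + n)

        // Equation 4: likelihood of the transitions, row by row, within each cluster.
        for (int k = 0; k < c; k++) {
            int[][] nK = counts.get(k);
            double[][] aK = priors.get(k);
            for (int i = 0; i < nK.length; i++) {
                int nKi = 0;
                double aKi = 0.0;
                for (int j = 0; j < nK[i].length; j++) {
                    logF += logGammaRatio(aK[i][j], nK[i][j]); // child-state terms
                    nKi += nK[i][j];
                    aKi += aK[i][j];
                }
                logF -= logGammaRatio(aKi, nKi);               // parent-row normalizer
            }
        }
        return logF;
    }

    public static void main(String[] args) {
        int[][] toyCounts = { {3, 1}, {0, 2} };            // toy single cluster over 2 cstates
        double[][] toyPriors = { {1.0, 1.0}, {1.0, 1.0} };  // uniform hyper-parameters
        System.out.println(logMarginalLikelihood(
                Collections.singletonList(toyCounts),
                Collections.singletonList(toyPriors)));
    }
}
```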
While Bayesian clustering groups time series, it is not sufficient to understand how these groups evolve over time. For that, biclustering is necessary. Biclustering simultaneously clusters sequences (rows) and timesteps (columns).

3 BAYESIAN BICLUSTERING BY DYNAMICS FOR SAGD DATA
This section demonstrates how the proposed Bayesian Biclustering by Dynamics algorithm works with SAGD performance data. The BBCD algorithm starts with three inputs: a data set of observed steam injection and oil production volumes for each SAGD well (D), a matched data set of analytical steam and oil volumes used as prior information (D′), and a significance level parameter (sig) which controls the influence of prior information on the clustering process. Figure 2 shows a flowchart of the three general steps: data representation, posterior probability calculation and biclustering search. In data representation, the data sets are initialized into node structures used by the search heuristic, and node structures used to calculate the posterior probability score. Then, a posterior probability score is calculated for the starting node configuration model. Next, a biclustering search finds two nodes to join, and the score is re-calculated for the new configuration. A join is retained if the score improves. The algorithm proceeds with iterative improvements until there are no more potential joins that satisfy a join-qualifying criterion.

Figure 2: Flowchart of the BBCD algorithm.
3.1 Data Representation
In this pre-processing step, volumes of all wells are aligned by their first month on injection. Observed and analytically calculated steam and oil volumes for each well are then divided by the original bitumen in place (OBIP) and normalized over the entire data set. Next, the data are manually discretized into equal-frequency values known as states. Manual discretization is optional but useful in this case because it leads to more insightful clusters.

Definition 1. Steam (or oil) States: A steam (or oil) state represents a monthly steam injected (or oil produced) volume discretized into values such as low, medium, or high.

Next, each steam and oil value combination is assigned a combined state (cstate). Combinations that produce more oil relative to the amount of steam used are more desirable and given a higher cstate value. Months without steam injection or oil production are assigned a cstate value of zero.

Definition 2. Combined Steam-Oil States: A combined state (cstate) contains a tuple of steam and oil states. Given a sequence of steam injection states stm = {stm1, stm2, …, stmi, …}, and the related sequence of oil production states oil = {oil1, oil2, …, oili, …}, their combined sequence of cstates is defined as x = {x1, x2, …, xi, …} where xi = cstatei = {stmi, oili}.

The original data sets D and D′ containing observed and calculated fluid volumes have now been converted into sets D and D′ respectively, with one time series of cstates representing each well. Table 1 lists an example of a lookup table of cstates (0 to 9), which assumes that all constituent steam and oil states are defined as one of three values (low, medium, or high). Next, each sequence of cstates is converted into a set of cstate transitions between consecutive time steps.

Table 1: An example of steam-oil cstates.

cstate   steam        oil
0        either = 0
1        High         Low
2        Medium       Low
3        Low          Low
4        High         Medium
5        Medium       Medium
6        Low          Medium
7        High         High
8        Medium       High
9        Low          High
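As a concrete illustration of Definitions 1 and 2 and the lookup in Table 1, the fragment below maps a pair of discretized steam and oil states to a cstate value. It is a hypothetical sketch, assuming the three-value discretization of Table 1; the enum and method names are not from the published BBCD code.

```java
/** Hypothetical illustration of the Table 1 lookup: (steam state, oil state) -> cstate 0..9. */
public class CstateLookup {

    enum Level { ZERO, LOW, MEDIUM, HIGH }

    /** Returns the combined steam-oil state; 0 whenever either monthly volume is zero. */
    static int cstate(Level steam, Level oil) {
        if (steam == Level.ZERO || oil == Level.ZERO) {
            return 0;
        }
        // Within each oil band, less steam for the same oil gets a higher (more desirable) cstate.
        int oilBand = (oil == Level.LOW) ? 0 : (oil == Level.MEDIUM) ? 1 : 2;        // 0, 1, 2
        int steamRank = (steam == Level.HIGH) ? 1 : (steam == Level.MEDIUM) ? 2 : 3; // 1, 2, 3
        return 3 * oilBand + steamRank;  // e.g. (HIGH, MEDIUM) -> 4, (LOW, HIGH) -> 9
    }

    public static void main(String[] args) {
        System.out.println(cstate(Level.HIGH, Level.MEDIUM)); // 4, as in Table 1
        System.out.println(cstate(Level.LOW, Level.HIGH));    // 9, best steam-to-oil combination
    }
}
```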
Definition 3. Combined state (cstate) transition: Given a pair of cstates x_{t-1} and x_t at consecutive time steps, a combined state transition occurs from x_{t-1} → x_t ∀ t > 1. x_{t-1} is referred to as the ‘parent’ cstate, x_t as the ‘child’ cstate, and cstate transition x_{t-1} = i → x_t = j is denoted as i → j.
Definition 4. Transition frequency matrix: Given a set of cstate transitions, a transition frequency matrix (N) is a y × y matrix that holds the occurrence count of each transition in the set. Here, y is the number of cstates and n_{ij} is the number of occurrences of transition i → j.

Transition probabilities are created under the Markov assumption, and represented in a transition probability matrix. A cstate transition i → j obeys the Markov property if the probability of x_t = j depends only on the probability of x_{t-1} = i. The probability of transition i → j can be defined as p_{ij} = p(x_t = j | x_{t-1} = i) = n_{ij} / Σ_j n_{ij}. Each parent cstate in the matrix is treated as a separate probability distribution, and each transition probability matrix holds y probability distributions.

Definition 5. Transition probability matrix: Given a set of cstate transitions under the Markov assumption, a transition probability matrix (P) is a y × y matrix that holds the probabilities, p_{ij}, of each transition i → j in the set.
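The following sketch shows how Definitions 3 to 5 could be realized in code: a sequence of cstates is folded into a y × y transition frequency matrix N, from which a transition probability matrix P is derived row by row. This is an illustrative fragment for the no-prior case of the formula above (p_ij = n_ij / Σ_j n_ij); the names are hypothetical, and rows with no observed transitions are simply left at zero here.

```java
/** Illustrative construction of N and P from one cstate time series (hypothetical helper). */
public class TransitionMatrices {

    /** N[i][j] counts transitions i -> j between consecutive time steps (Definition 4). */
    static int[][] frequencyMatrix(int[] cstates, int numCstates) {
        int[][] n = new int[numCstates][numCstates];
        for (int t = 1; t < cstates.length; t++) {
            n[cstates[t - 1]][cstates[t]]++;   // parent cstate x_{t-1}, child cstate x_t
        }
        return n;
    }

    /** P[i][j] = n_ij / sum_j n_ij, one probability distribution per parent cstate (Definition 5). */
    static double[][] probabilityMatrix(int[][] n) {
        double[][] p = new double[n.length][n.length];
        for (int i = 0; i < n.length; i++) {
            int rowTotal = 0;
            for (int j = 0; j < n.length; j++) {
                rowTotal += n[i][j];
            }
            for (int j = 0; j < n.length; j++) {
                p[i][j] = (rowTotal == 0) ? 0.0 : (double) n[i][j] / rowTotal;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        int[] well = {2, 7, 7, 4, 7};                 // a short cstate sequence for one well
        double[][] p = probabilityMatrix(frequencyMatrix(well, 10));
        System.out.println(p[7][7]);                  // 0.5: from parent cstate 7, half the transitions stay at 7
    }
}
```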
A well is described as the set of cstate transitions from one row in data set D. A stage, S_{t-1}, consists of the set of cstate transitions across two consecutive time-steps t-1 and t, of every well. A segment is a single cstate transition representing one well-stage combination. Transition frequency matrices are used to represent wells, stages and segments as shown in Figure 3. In this example, cstate transition 2 → 7 across the first two time steps of well W1 increments the count in row 2 and column 7 of N_W1, and so on. Transition 2 → 7 also represents a single segment counted in matrix N_W1S1. Stage S1 counts the cstate transitions across all the wells in a data set, but only if they occur between time-steps 1 and 2. Therefore, N_S1 holds the counts of transitions 2 → 7 of W1, 0 → 0 of W2, 1 → 1 of W3, and so on. Cstate transition frequencies may also be created for time series containing background knowledge on the problem. If n_{ij} is the frequency count of transition i → j in matrix N_W1 representing well W1, then n′_{ij} is the frequency count of transition i → j in matrix N′_W1 representing the analytical values calculated for well W1. When n′_{ij} is present, the calculation of transition probability p_{ij} changes.

Figure 3: Transition frequency matrices.
The last data transformation step involves setting up the node structures. Since production performance may differ over time (by stages) and location (by wells), the BBCD algorithm must cluster in both directions. Hence, two separate versions of the data set are created: DWN and DSN contain well-nodes and stage-nodes, respectively.

Definition 6. Well-node: A structure that represents one or more wells in a data set. The k-th well-node in DWN, denoted as WN_k, consists of three y × y matrices: N_Wk, N′_Wk, and P_Wk. N_Wk and N′_Wk are transition frequency matrices holding the observed and prior cstate transition counts of all the wells in WN_k, and P_Wk is the associated transition probability matrix. Initially, WN_k represents a single well Wk for k ∈ {1, …, w}.

Definition 7. Stage-node: A structure that represents one or more consecutive stages of a data set. The l-th stage-node in DSN, denoted as SN_l, consists of three y × y matrices: N_Sl, N′_Sl, and P_Sl. N_Sl and N′_Sl are transition frequency matrices holding the observed and prior cstate counts of all the stages in SN_l, and P_Sl is the associated transition probability matrix. Initially, SN_l represents a single stage Sl for l ∈ {1, …, s}.
Well-nodes and stage-nodes are used in the biclustering search, but they are unsuitable for calculating the posterior probability score because they overlap as shown in Figure 4. Therefore, segment-nodes are introduced to avoid overlapping clusters, meeting the requirement that each cstate transition belongs to one and only one cluster. Segment-nodes are the clusters found by the proposed BBCD algorithm.

Figure 4: A depiction of nodes.

Definition 8. Segment-node: Given a valid well-node WN_k and a valid stage-node SN_l, the kl-th segment-node WSN_kl is a cluster containing the cstate transitions that exist in both WN_k and SN_l. Segment-node WSN_kl contains two y × y transition frequency matrices, N_WkSl and N′_WkSl. Initially, N_WkSl and N′_WkSl contain one cstate transition each, for all values of k and l, where k ∈ {1, …, w} and l ∈ {1, …, s}.
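To make Definitions 6 to 8 more tangible, here is a minimal sketch of the node structures as plain Java objects: each node carries an observed frequency matrix N, a prior frequency matrix N′, and (for well- and stage-nodes) the derived probability matrix P. This is an assumed representation for illustration only; field and class names do not come from the released BBCD code.

```java
/** Hypothetical node structures for Definitions 6-8 (illustration only). */
public class Nodes {

    /** Shared shape of a well-node or stage-node: observed counts, prior counts, probabilities. */
    static class ClusterNode {
        final int[][] n;        // observed cstate transition counts (N)
        final int[][] nPrior;   // prior (analytical) transition counts (N')
        double[][] p;           // transition probability matrix (P), recomputed after each merge
        boolean valid = true;   // set to false once this node has been merged into another

        ClusterNode(int numCstates) {
            n = new int[numCstates][numCstates];
            nPrior = new int[numCstates][numCstates];
            p = new double[numCstates][numCstates];
        }
    }

    /** Segment-node WSN_kl: the cstate transitions shared by well-node k and stage-node l. */
    static class SegmentNode {
        final int[][] n;        // N_WkSl
        final int[][] nPrior;   // N'_WkSl

        SegmentNode(int numCstates) {
            n = new int[numCstates][numCstates];
            nPrior = new int[numCstates][numCstates];
        }
    }

    // D_WN: w well-nodes, D_SN: s stage-nodes, D_WSN: w x s segment-nodes (initial configuration).
    static ClusterNode[] wellNodes;
    static ClusterNode[] stageNodes;
    static SegmentNode[][] segmentNodes;
}
```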
By the end of the initialization step, data sets D and D′ are transformed into three new node structures: DWN, DSN and DWSN. DWN is a w × 1 array of well-nodes, DSN is a s × 1 array of stage-nodes, and DWSN is a w × s array of segment-nodes.

3.2 Biclustering Search
Biclustering automatically finds the most probable row and column partitions in a data set. Given two well-nodes WN_i and WN_j where i < j, a node merge transfers the frequency counts from N_Wj to N_Wi and N′_Wj to N′_Wi. WN_j is then marked invalid and ignored in future iterations, and transition probability matrix P_Wi of the remaining node WN_i is recalculated. The process is the same when merging two stage-nodes. A well-node can merge with any other well-node, whereas a stage-node can only merge with its immediately previous or following stage-node. This requirement preserves the physical meaning of a stage. A key observation is that joining two well-nodes does not change the counts in any of the stage-nodes they overlap and vice versa.
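The merge step described above can be sketched as follows: the frequency counts of the higher-indexed node are added into the lower-indexed one, the donor is then invalidated, and the surviving node's probability matrix is recomputed from the merged counts. This is an assumed illustration of the bookkeeping, not the authors' code.

```java
/** Illustrative bookkeeping for a node merge in the biclustering search (assumed, simplified). */
public class NodeMerge {

    /** Folds the observed and prior counts of the donor node into the survivor, in place. */
    static void merge(int[][] nSurvivor, int[][] nPriorSurvivor,
                      int[][] nDonor, int[][] nPriorDonor) {
        for (int i = 0; i < nSurvivor.length; i++) {
            for (int j = 0; j < nSurvivor[i].length; j++) {
                nSurvivor[i][j] += nDonor[i][j];
                nPriorSurvivor[i][j] += nPriorDonor[i][j];
            }
        }
        // The donor is then flagged invalid and its P matrix is no longer maintained;
        // the survivor's transition probability matrix is recomputed from the merged counts.
    }

    /** Stage-nodes may only merge with an immediate neighbour, preserving contiguous stages. */
    static boolean stageMergeAllowed(int stageIndexA, int stageIndexB) {
        return Math.abs(stageIndexA - stageIndexB) == 1;
    }
}
```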
3.2.1 Search Heuristic.
A heuristic finds the next two nodes to join using the minimum symmetrized Kullback-Leibler distance (KLDist) [15] and the LogScore [16]. KLDist selects the two most similar nodes, while the LogScore estimates the quality of the proposed join. Equation 5 is the search heuristic formula (mincombo), where P1 and P2 are the transition probability matrices of two well-nodes or two stage-nodes:

mincombo = min [ KLDist(P1, P2) × LogScore(P1, P2) ]   (5)

3.2.2 Symmetrized Kullback-Leibler Distance (KLDist).
Assume that p_{ij} and q_{ij} are the probabilities of transition i → j in transition probability matrices P1 and P2 respectively. Then p_i and q_i are probability distributions over all child states j, for the same parent state i. The Kullback-Leibler distance between these two probability distributions is:

KLdist(p_i, q_i) = Σ_j p_{ij} ln(p_{ij} / q_{ij})   (6)

The symmetric version of this score is calculated as KLDist(p_i, q_i) = [KLdist(p_i, q_i) + KLdist(q_i, p_i)] / 2. The average distance between P1 and P2 is then KLDist(P1, P2) = [Σ_i KLDist(p_i, q_i)] / y.

3.2.3 Log-Score.
Information is lost when the cstate transitions of a well are summarized in a two-dimensional transition frequency matrix under the Markov assumption. More information is lost when two well-nodes or two stage-nodes join, further summarizing their combined cstate transitions. The LogScore provides a way to evaluate this loss. Given the transition probability matrix P of a well-node or stage-node, the LogScore of transition i → j is calculated as LogScore_{ij} = −ln(p_{ij}). This score penalizes the node by assigning large values to observed transitions when P assigns them small probabilities. Cumulative score cs = Σ_{ij} −ln(p_{ij}) n_{ij} is the loss incurred by summarizing the constituent set of state transitions into node1. Three cumulative scores are calculated, cs1 and cs2 for two separate nodes (node1 and node2), and cs3 for their potential joined node (node3). If cs3 is greater than cs1 + cs2, then more information is lost by joining node1 and node2 than by leaving them separate. In Equation 5, LogScore = cs3 − (cs1 + cs2).

3.2.4 Join Qualifying Criterion.
The posterior probability score quantifies the fit between a data set and clustering model, but it is a weak stopping criterion for an agglomerative clustering algorithm. Here, a potential join is disqualified if it results in a mincombo score of double or greater than the previous score. Discarding a potential join does not stop the algorithm, unless there are no more joins that meet the qualifying criterion.

3.3 Posterior Probability Calculation
Segment-nodes are used to calculate the posterior probability score. When two well-nodes merge, at most 2s segment-nodes are also merged because one well-node encompasses s segment-nodes. Similarly, when two stage-nodes merge, at most 2w segment-nodes are also merged. A potential well-node join, and the corresponding 2s segment-node joins, are retained if they cause the posterior probability score to improve. Equations 7 and 8 show how this score is modified to work with segment-nodes:

f(D, M) = [Γ(α) / Γ(α + n)] ∏_{k=1..w} ∏_{l=1..s} [Γ(α_{kl} + n_{kl}) / Γ(α_{kl})]   (7)

f(D, X_{t-1}, X_t, M) = ∏_{k=1..w} ∏_{l=1..s} ∏_{i=1..y} [Γ(α_{kli}) / Γ(α_{kli} + n_{kli})] ∏_{j=1..y} [Γ(α_{klij} + n_{klij}) / Γ(α_{klij})]   (8)

The number of clusters is now calculated in terms of segment-nodes, where ws is the maximum number of segment-nodes possible, n_{klij} is the count of transition i → j in segment-node kl, α_{klij} is the count of hypothetical (prior) transitions in the same segment-node cluster, and n_{kl} is the total number of transitions in segment-node kl.

3.3.1 Prior Information.
As mentioned, reservoir characteristics are incorporated into the clustering process through prior information. Gelman [17] states two considerations when creating a prior distribution: 1) information going into the prior distribution, and 2) sensitivity of the resulting posterior distribution. In this paper, prior information consists of pre-calculated analytical steam and oil volumes [5] pre-processed in the same way as observed data. These volumes are assumed to belong to the same family of multinomial probability distributions as the observed fluid volumes. Hence, they can be used to populate α in Equations 7 and 8. Sensitivity is evaluated using a significance level parameter (sig), which controls the extent to which prior information can influence results. Hyper-parameter α represents the summation of transition frequency counts within a prior information series, multiplied by sig.

To ensure that Equation 8 is always defined, the condition α_{ij} > 0 must hold. However, calculating α_i as a summation of α_{ij} over all prior cstates j will artificially inflate this hyper-parameter when one or more of the prior frequency counts it represents are zero. The compounded effect of inflated hyper-parameter summations is that one large group dominates the clustering process, which in turn biases the algorithm towards the hypothesis that a single underlying process generates the entire data set. To address this issue, if n′_{ij} represents the frequency count of prior transition i → j in matrix N′, then α_{ij} = (n′_{ij} + 1)·sig is the corresponding hyper-parameter, adjusted for prior information significance. Similarly, α_i = [(Σ_j n′_{ij}) + 1]·sig, and α = [(Σ_{ij} n′_{ij}) + 1]·sig.

3.3.2 Transition Probabilities with Prior Information.
When prior information is present, the probability of transition i → j is calculated from both prior and observed transition frequency matrices under two conditions. First, each parent cstate is a separate probability distribution and therefore should sum to unity. If a parent cstate has no children, all child cstates should be equally probable with a value greater than zero. Second, the weighting of prior transition probabilities should be in proportion to the significance level parameter (sig). Equation 9 is the adjusted calculation of p_{ij} satisfying both these conditions:

p_{ij} = (n_{ij} + α_{ij}) / (Σ_j n_{ij} + Σ_j α_{ij})   (9)

3.4 Complexity
Given that the number of wells is usually larger than the number of stages (w > s), complexity of the BBCD algorithm is dominated by the search to find the two most similar well-nodes in every iteration, O(f·w²), where w is the number of wells, and f is the number of algorithm iterations. However, after the first iteration the algorithm effectively runs in linear time because only a small subset of clusters needs recalculation in each subsequent iteration.

4 EXPERIMENTAL RESULTS
The SAGD data set used in this study consists of 19 to 104 months of observed and analytically calculated steam injection and oil production volumes for 328 wells in Alberta, Canada [5, 6]. Correct cluster assignments are not known for this data set; hence results are assessed by domain knowledge and by comparison to manual clustering as depicted in Figure 5. In manual clustering, DTW [10] similarity is calculated between wells and stages, and clusters are manually assigned. A transition probability matrix (Pi) is created for each cluster found. Then the agglomerative clusters found by the BBCD algorithm are assessed against these matrices. A cluster is considered correctly assigned if the P matrix it matches most closely is the one at the same row and column location. Results show that the BBCD algorithm is 88% accurate against manual clustering. Extensive testing of the BBCD algorithm on synthetic data [18] shows that the algorithm is also highly robust to noise. A prior significance level (sig) of unity was found to stabilize accuracy in the presence of noise and is therefore the value used henceforth. The next section describes cluster evaluation using domain knowledge.

Figure 5: Flowchart of evaluation process.
4.1 SAGD Biclustering Results

When BBCD is run on the SAGD data set, the algorithm returns a Bayesian network model consisting of 27 clusters in a 9 × 3 matrix of segment-nodes. Stage-group (or stage) A is the performance over the first two months after start of steam injection, stage B is months 3 to 24, and stage C is month 25 onwards. Well-sets (or w-sets) are indicated with the numbers 1 to 9. A single cluster 1A is defined as stage A in w-set 1, resulting in 27 clusters (shown as rectangles) in Figure 6.

Figure 6: Depiction of SAGD cluster model. Stage A = first two months after start of steam injection, Stage B = months 3 to 24, and Stage C = month 25+.
Observation: Stage A is pre-production, stage B is pre-coalescence oil recovery, and stage C is post-coalescence oil recovery. Instead of early, mid- and late-stage production, there is only pre- and post-coalescence production.
Evidence: Stage A transition frequency matrices contain many cstate 0 values.
Analysis: Since steam injection has already started, a cstate of 0 indicates that oil production was not occurring.
Evidence: Stage B has almost linear growth over months 3 to 24 (see Figure 7).
Analysis: Before adjacent steam chambers merge (that is, pre-coalescence), increase in steam injection results in increase in oil recovery.
Evidence: Stage C shows non-linear growth after 24 months (see Figure 7).
Analysis: After coalescence, steam injected in one well may travel to a different location having little effect on oil recovery in the first well.

Figure 7 shows the average monthly performance of each well-set, with discretization labels overlaid. W-sets 1 and 2 reached high steam injection, w-sets 3 and 4 reached medium steam injection, and the rest range between medium-low and low injection.

Figure 7: Average steam/OBIP versus average oil/OBIP in the nine well-sets.
Examining the progression of cstate transitions in all 27 clusters, two frequently occurring steam injection sequences emerge: 4 → 7 and 3 → 6. These sequences correspond to an operational decision to inject steam at high and low levels respectively, allowing the corresponding amount of oil recovered to improve over time as the steam chamber grows. Medium steam injection is usually achieved through a sequence such as 3 → 6 → 5, suggesting that operators wait to see what a reservoir can produce before ramping up steam injection levels.

Injection Strategies: There are only two intentional steam injection strategies: high and ramp-up. Only high injection achieves the minimum SOR, and then only if GOR (gas-oil ratio) is managed.
Evidence: There are only two frequently occurring steam injection sequences: 4 → 7 and 3 → 6. These are sequences in >30% of the wells in a w-set. Frequent sequences in w-sets 1 and 2 contain only cstates 4 and 7. Most of the other w-sets contain the frequent pattern 3 → 6.
Analysis: If the well-sets on high steam injection only have frequent patterns containing cstates 4 and 7, then they are consistently on high injection. Wells ramping up steam injection have patterns containing cstates 3 to 8 in random order. This indicates multiple efforts to increase or ramp up injection accompanied by an inability or unwillingness to sustain higher levels of steam injection.
Evidence: The legend in Figure 7 shows that SOR and GOR are higher for w-set 2, indicating that performance of w-set 2 was not as good as w-set 1 even though both used high steam injection.
Analysis: The high gas production for wells in w-set 2 may be hindering oil production because gas is always produced in preference to oil. Lower oil production would also raise SOR.

In summary, high steam injection is risky judging by what happened to wells in w-set 2. However, it is also clear that the linear relationship between steam injection and oil production is time-limited to stage B. Slowly ramping up steam injection may mean forfeiting the chance to achieve minimal SOR.
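To make the notion of a frequently occurring sequence concrete, the sketch below counts how many wells in a w-set contain a given two-step cstate sequence and flags it as frequent when the share exceeds the 30% threshold mentioned above. This is an illustrative reconstruction, not the code used to produce Figure 7 or the reported patterns; the names and the exact counting convention are assumptions.

```java
import java.util.List;

/** Hypothetical helper for flagging frequent cstate sequences (e.g. 4 -> 7) within a w-set. */
public class FrequentSequences {

    /** True if the consecutive pair (first, second) appears anywhere in the well's cstate series. */
    static boolean containsSequence(int[] cstates, int first, int second) {
        for (int t = 1; t < cstates.length; t++) {
            if (cstates[t - 1] == first && cstates[t] == second) {
                return true;
            }
        }
        return false;
    }

    /** Fraction of wells in the w-set whose series contains the sequence at least once. */
    static double shareOfWells(List<int[]> wSet, int first, int second) {
        int hits = 0;
        for (int[] well : wSet) {
            if (containsSequence(well, first, second)) {
                hits++;
            }
        }
        return wSet.isEmpty() ? 0.0 : (double) hits / wSet.size();
    }

    /** A sequence is treated as frequent in a w-set when more than 30% of its wells contain it. */
    static boolean isFrequent(List<int[]> wSet, int first, int second) {
        return shareOfWells(wSet, first, second) > 0.30;
    }
}
```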
5 CONCLUSIONS

A new agglomerative biclustering algorithm, Bayesian Biclustering by Dynamics (BBCD), is proposed and evaluated. It describes each cluster with a set of probability distributions that characterize the underlying process likely to have generated that cluster. Inclusion of prior information in the clustering process provides a structured way to bring in relevant background knowledge. Clustering accuracy on a real SAGD data set is comparable to DTW even though the BBCD algorithm uses only one-step dependencies. Furthermore, clusters found using the BBCD methodology are informative and insightful.

Until now state transitions have represented combined steam-oil states (cstates) because this was the relationship of interest. However, the BBCD methodology is flexible enough to accommodate states that represent any combination of variables (univariate), or multiple simultaneous variables (multivariate) that may be of interest. If there is no need for informative priors, uniform priors can be substituted instead. One-step Markov dependencies can be removed when examining the relationship between different variables in the same month or extended to multi-step dependencies as needed.

Future work will investigate change in oil produced relative to change in steam injected to uncover time periods when steam injection is most critical. Additional SAGD operational variables, such as produced gas, will be included. More insights will be drawn from clustering results and bigger data sets with known outcomes will be used for method evaluation, if they can be found or created.
ACKNOWLEDGMENTS

This work is funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada, and the University of Calgary First Research Excellence Fund Program in Unconventional Energy Research. The authors gratefully acknowledge Divestco for providing the SAGD fluid volumes, and Bingjie Wei for her assistance with data collection and pre-processing.
COMPUTER CODE AVAILABILITY
Name of code: BBCD.zip
Developer: Helen Pinto (listed author)
Year first available: 2018
Hardware required: Core i7 PC, 12GB RAM
Software required: JDK 1.8 and Microsoft Excel
Program Language: Java
Program size: < 100KB
How to access the source code: See readme.txt file included. https://data.mendeley.com/datasets/g8x3wybrjy/draft?a=a6108148-937e-4156-a284-367908dcdc93
REFERENCES

[1] Butler, R.M. (1994). Steam-Assisted Gravity Drainage: Concept, Development, Performance and Future. Petroleum Society of Canada. doi:10.2118/94-02-05.
[2] Gates, I.D. (2011). Basic Reservoir Engineering. Kendall Hunt Publishing, Dubuque, IA.
[3] Chan, R. (2013). Application of Field Performance Data in Developing Simple Analytical Models to Predict the Performance of Steam Assisted Gravity Drainage. (Published master's thesis). University of Calgary, Calgary, Alberta, Canada.
[4] Reis, J.C. (1992). A steam-assisted gravity drainage model for tar sands: linear geometry. Journal of Canadian Petroleum Technology, 31(10): 14-20. doi:10.2118/92-10-01.
[5] Pinto, H. (2018). Analytical steam injection and oil production volume calculations. http://ucalgary.ca/wangx/people/helen_pinto
[6] Divestco (2016). "EnerGISite," Divestco, Calgary, Canada, accessed June 3, 2016, https://energisite.divestco.com/idcWellLogs2010/Home.aspx
[7] Liao, T.W. (2005). Clustering of time series data – a survey. Pattern Recognition, 38: 1857-1874. doi:10.1016/j.patcog.2005.01.025.
[8] Amar, D., Yekutieli, D., Maron-Katz, A., Hendler, T., Shamir, R. (2015). A hierarchical Bayesian model for flexible module discovery in three-way time-series data. Bioinformatics, 31(12): i17-i26. doi:10.1093/bioinformatics/btv228.
[9] Gu, J., Liu, J. (2008). Bayesian biclustering of gene expression data. BMC Genomics, 9(Suppl. 1): S4. doi:10.1186/1471-2164-9-S1-S4.
[10] Berndt, D., Clifford, J. (1994). Using dynamic time warping to find patterns in time series. AAAI Workshop on Knowledge Discovery in Databases, pp. 229-248.
[11] Jung, Y., Park, H., Du, D.-Z., Drake, B. (2003). A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering. Journal of Global Optimization, 25(1): 91-111. doi:10.1023/A:1021394316112.
[12] Cooper, G., Herskovits, E. (1992). A Bayesian Method for the Induction of Probabilistic Networks from Data. Machine Learning, 9(4): 309-347. doi:10.1023/A:1022649401552.
[13] Ramoni, M., Sebastiani, P., Cohen, P. (2002). Bayesian Clustering by Dynamics. Machine Learning, 47(1): 91-121. doi:10.1023/A:1013635829250.
[14] Ramoni, M., Sebastiani, P., Cohen, P. (2000). Multivariate Clustering by Dynamics. In Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence. AAAI Press, 633-638.
[15] Kullback, S., Leibler, R.A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22(1): 79-86. doi:10.1214/aoms/1177729694.
[16] Gneiting, T., Raftery, A. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102(477): 359-378. doi:10.1198/016214506000001437.
[17] Gelman, A. (2002). Prior distribution. Encyclopedia of Environmetrics, 3: 1634-1637. John Wiley & Sons, Chichester, UK. ISBN: 0471-899976.
[18] Pinto, H. (2018). Bayesian Biclustering by Dynamics - Synthetic Data Testing.pdf. http://ucalgary.ca/wangx/people/helen_pinto