Expert Systems With Applications 93 (2018) 72–85

Virtual multiphase flow metering using diverse neural network ensemble and adaptive simulated annealing

Tareq Aziz AL-Qutami a,*, Rosdiazli Ibrahim a, Idris Ismail a, Mohd Azmin Ishak b

a Department of Electrical & Electronic Engineering, Universiti Teknologi PETRONAS, 32610 Seri Iskandar, Perak, Malaysia
b Head of Subsea Control, Project Delivery & Technology, PETRONAS, Malaysia

* Corresponding author. E-mail addresses: [email protected], [email protected] (T.A. AL-Qutami), [email protected] (R. Ibrahim), [email protected] (I. Ismail), [email protected] (M.A. Ishak). https://doi.org/10.1016/j.eswa.2017.10.014

Article history: Received 7 March 2017; Revised 4 October 2017; Accepted 5 October 2017; Available online 6 October 2017.

Keywords: Neural network; Ensemble method; Simulated annealing; Multiphase flow; Virtual flow meter; Soft sensor

Abstract

Real-time production monitoring in the oil and gas industry has become very significant, particularly as fields become economically marginal and reservoirs deplete. Virtual flow meters (VFMs) are intelligent systems that infer multiphase flow rates from ancillary measurements; they are attractive and cost-effective solutions to meet monitoring demands, reduce operational costs, and improve oil recovery efficiency. Current VFMs are challenging to develop and expensive to maintain, and most were developed for wells with dedicated physical meters where well test data are abundant. This study proposes a VFM system based on ensemble learning for fields with common metering infrastructure, where the data generated are very limited. The proposed method generates diverse neural network (NN) learners by manipulating the training data, the NN architecture, and the learning trajectory. Adaptive simulated annealing optimization is proposed to select the best subset of learners and the optimal combining strategy. The proposed method was evaluated using actual well test data and achieved average errors of 4.7% and 2.4% for liquid and gas flow rates respectively. The accuracy of the developed VFM was also analyzed using a cumulative deviation plot, where the predictions fall within a maximum deviation of ±15%. Furthermore, the proposed ensemble method was compared to standard bagging and stacking, and notable improvements were observed in both accuracy and ensemble size. The proposed VFM is expected to be easier to develop and maintain than model-driven VFMs, since only well test samples are required to tune the model. It is hoped that the developed VFM can augment and back up physical meters, improve data reconciliation, and assist in reservoir management and flow assurance, ultimately leading to more efficient oil recovery and lower operating and maintenance costs. © 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Multiphase flow is a simultaneous stream of more than one component with different physical and chemical properties, such as gas, liquid, and solid (MPMS, 2013). A two-phase flow of gas and liquid is very common in oil and gas production fields, and measuring the individual flow rates is essential for well surveillance, flow assurance, reservoir management, and production monitoring and optimization (Thorn, Johansen, & Hjertaker, 2012). It is even more significant as fields become economically marginal (Falcone, Hewitt, & Alimonti, 2009). The current practice for measuring multiphase flow rates is to use multiphase flow meters (MPFMs). MPFMs combine several measurement principles such as gamma-ray




spectroscopy, capacitance tomography, microwave, and ultrasound to infer phase flow rates (Falcone et al., 2009; Thorn et al., 2012). However, MPFMs are still economically infeasible to install for individual wells and are prone to hardware failures, particularly in subsea applications (Thorn et al., 2012; Varyan, Haug, Fonnes et al., 2015). In addition, MPFMs suffer from high uncertainty and error propagation, which necessitates frequent and costly calibration (Gryzlov, 2011; MPMS, 2013). This calibration is usually challenging and sometimes impossible in long tie-back subsea networks. Furthermore, many production fields still use common metering and flow assurance facilities that rotate through and sample production wells one at a time, periodically, using either a test separator or an MPFM. This monitoring technique is inadequate, particularly when wells are mature and undergo rapid changes in water cut (WC) and gas volume fraction (GVF), resulting in late corrective actions. These limitations triggered the development of the virtual flow meter (VFM), a software-based computational model that estimates real-time multiphase flow rates by exploiting existing measurements. It potentially combines cheap hardware with acceptable


performance and can be easily implemented on existing hardware platforms with minimal additional cost (Bailey, Shirzadi, Ziegel et al., 2013; Falcone et al., 2009). A VFM can augment an MPFM to reduce measurement uncertainty, act as a backup when the MPFM is faulty (redundancy), and provide data reconciliation and increased reliability (Amin et al., 2015; Babelli, 2002). This ultimately improves reservoir recovery and reduces capital and operational expenditures (CAPEX and OPEX) (Amin et al., 2015; Varyan et al., 2016).

Typical instrumentation of an oil and gas production well is illustrated in Fig. 1. It consists of downhole pressure and temperature measurements (P1 & T1), wellhead pressure and temperature measurements (P2 & T2), the choke valve opening percentage (CV%), and pressure measurements upstream and downstream of the choke valve (P3 & P4). These parameters are correlated with the overall production rate and the pressure losses across the flow-line; hence they can be used to infer flow rates and are deemed suitable inputs to VFM systems.

Fig. 1. Typical instrumentation available in a production well.

Traditional VFM approaches are based on empirical correlations and mechanistic modeling. Empirical correlations are extracted from experimental data, such as the empirical choke models (Moghaddasi, Lotfi, & Moghaddasi, 2015) that have the following generic formula:

Q = \frac{P S^{n}}{c R^{m}}    (1)

where P is the upstream pressure, Q is the production rate, R is the gas-oil ratio (GOR), S is the choke size, and c, m, and n are empirical constants. However, these correlations require the GOR, which is itself difficult to measure. The GOR is usually obtained from lab sampling, hence such correlations assume a constant GOR until the next lab results. Moreover, these correlations are limited to certain operational conditions and fluid properties. On the other hand, the mechanistic (model-based) approach builds on the physical phenomena and multiphase flow dynamics. It relies heavily on fluid types and production regimes and is sensitive to changing operating conditions such as GOR and WC (Amin et al., 2015). Deploying and maintaining mechanistic VFMs is challenging and costly, since many parameters are required, such as the well profile, heat transfer, pipe roughness, productivity index (PI), and fluid composition (Haldipur, Metcalf et al., 2008; Varyan et al., 2016). Furthermore, many mechanistic models are computationally expensive due to the use of multivariate nonlinear solvers to find unique solutions (Bello, Ade-Jacob, Yuan et al., 2014; Haldipur et al., 2008; Varyan et al., 2015).

Table 1. Summary of related work for data-driven VFMs.

Method      | Remarks                                                                                 | Paper
LR          | Only 5 well tests are used to develop the model.                                        | (Zangl et al., 2014)
PCR         | Requires GOR, oil-water ratio (OWR), and API gravity as inputs.                         | (Bello et al., 2014)
SVM         | Uses an extended Venturi (extra hardware); lab-scale with ≈10% error.                   | (Xu et al., 2011)
Fuzzy Logic | Uses single-point P&T as inputs (under-determined system).                              | (Ahmadi et al., 2013)
PCA + NN    | Uses DP signals as inputs from a lab-scale vertical pipe; high prediction error, ≈20%.  | (Shaban & Tavoularis, 2014)
NN          | NN structure manually selected.                                                         | (Ahmadi et al., 2013; AL-Qutami et al., 2017; Berneti & Shahbazian, 2011; Hasanvand & Berneti, 2015; Zangl et al., 2014)
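As a quick illustration (not part of the original study), a Gilbert-type correlation such as Eq. (1) can be evaluated as in the following sketch; the constants c, m, and n are hypothetical placeholders that would normally be fitted to field data.

```python
def choke_liquid_rate(p_upstream, choke_size, glr, c=10.0, m=0.5, n=1.89):
    """Gilbert-type choke correlation, Eq. (1): Q = P * S^n / (c * R^m).

    p_upstream : upstream pressure
    choke_size : choke size S
    glr        : gas ratio R (e.g. GOR)
    c, m, n    : empirical constants (hypothetical values here)
    """
    return p_upstream * choke_size ** n / (c * glr ** m)

# Hypothetical inputs for illustration only.
q = choke_liquid_rate(p_upstream=1200.0, choke_size=32.0, glr=500.0)
```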

Since the objective of VFMs is to utilize the available knowledge from ancillary measurements to infer multiphase flow rates, intelligent systems using soft computing techniques are natural candidates to achieve this objective. Such data-driven expert systems are easier to develop and maintain than mechanistic models, since they do not require in-depth knowledge of the underlying physics to infer flow rates (Falcone et al., 2009; Stone et al., 2007). A VFM expert system establishes this inference from patterns in the data, which is the focus of this article: data-driven VFM systems. Such a VFM system can be deployed by adding several computing units to the field IoT infrastructure; these computing units would retrieve measurements from sensors and relay the flow rate estimates to the supervisory control and data acquisition (SCADA) system.

Several data-driven techniques have been proposed to develop VFM systems, such as least squares linear regression (LR) to estimate water and liquid flow rates (Zangl, Hermann, Schweiger et al., 2014), principal component regression (PCR) to estimate oil and gas flow rates in offshore wells (Bello et al., 2014), support vector machines (SVM) combined with a venturi meter (Xu, Zhou, Li, & Tang, 2011), and, the most popular technique, neural networks (NN) (Ahmadi, Ebadi, Shokrollahi, & Majidi, 2013; AL-Qutami, Ibrahim, Ismail, & Ishak, 2017; Berneti & Shahbazian, 2011; Hasanvand & Berneti, 2015; Zangl et al., 2014). A summary of these studies is presented in Table 1. Some of these studies used experimental setups to collect data (Shaban & Tavoularis, 2014; Xu et al., 2011) or used data representing a short production period, three months (Hasanvand & Berneti, 2015) or thirty hours (Zangl et al., 2014), to develop VFM models. Such data may not capture complex multiphase behaviors or represent production trends accurately, especially in new wells where production is kept almost constant. Moreover, some studies focused on predicting only one component flow rate (oil) in the multiphase flow and used temperature and pressure measurements at one or more points along the flow-line without accounting for the choke opening (Ahmadi et al., 2013; Berneti & Shahbazian, 2011; Hasanvand & Berneti, 2015). This may limit the long-run performance of the VFM and may impose frequent calibrations due to reservoir and production changes over time, particularly when the downhole pressure is not taken into account. Besides, considering P&T at a single point only results in an under-determined and very sensitive system (Ahmadi et al., 2013; Hasanvand & Berneti, 2015). Aside from the aforementioned limitations, the VFMs in the current literature are developed using data collected from dedicated meters.


That is, the flow rates are sampled several times a day, possibly every minute. However, many fields do not have the luxury of a dedicated MPFM or test separator for each well; they instead use common metering facilities where production rates are measured successively, one well at a time. Consequently, measurements from these wells rarely reach a sampling rate of once a week, and sometimes only once a month. A question may arise as to whether collecting measurements for many years, say 10 years, would solve the limited-data issue in fields with common meters. The answer is tied to the reservoir and operational characteristics, which change over time, so old data may no longer be representative. This is especially true for small or old reservoirs, which within a few years can go from primary recovery to secondary or even tertiary recovery techniques. This completely alters the production mechanics and introduces new variables into the equation, such as the gas lift injection rate. This study is hence directed at the development of VFMs for fields with common metering facilities, where only a limited number of samples can be acquired.

From the current VFM literature, NN has proven more effective than other traditional techniques. Nonetheless, it still suffers from overfitting, local minima, and sensitivity to data variations and parameter selection (it is an unstable algorithm) (Hunter, Yu, Pukish III, Kolbusz, & Wilamowski, 2012), besides poor performance when the number of samples is small. A promising way to overcome these shortcomings is ensemble learning. This study proposes a method to generate architecturally diverse NN learners, and then to select the best subset of these learners and the best strategy to combine them. The selection is performed using adaptive simulated annealing with a proposed annealing function to ensure optimal performance and an efficient optimization process. The proposed NN ensemble method is used to develop a VFM model using actual well test data from 4 oil and gas wells. The performance of the VFM is reported and compared to two well-known ensemble techniques, namely bagging and stacking.

The rest of the paper is arranged as follows. In Section 2, the concept of ensemble learning is introduced and common ensemble techniques and their limitations are discussed. In Section 3, the proposed diverse NN ensemble development is presented in detail. In Section 4, the performance of the proposed method is analyzed and compared to bagging and stacking. In Section 5, the accuracy of the developed VFM is examined using cumulative deviation and cumulative flow indicators. Finally, the conclusion and future work are presented in Section 6.

2. Ensemble learning

Ensemble learning is the process of developing several learners (also called models or predictors) and then combining them to achieve, ideally, better predictive performance. Fig. 2 shows the architecture of an NN ensemble where the predictions from n NNs are aggregated using a combining strategy to produce the final ensemble prediction. Even though the original aim of ensemble learning is to reduce variance, and hence improve overall performance, ensemble methods can also be used to escape the process of model selection and tuning of parameters (Guan, Yuan, Lee, Najeebullah, & Rasel, 2014; Ma, 2012).

Fig. 2. Neural network ensemble learning structure.

Bias-variance-covariance decomposition is often used to theoretically justify the use of ensemble methods and is expressed as follows:

MSE = E\{[\bar{f} - t]^{2}\} = bias^{2} + \frac{1}{M} var + \left(1 - \frac{1}{M}\right) covar + noise    (2)

where E is the mathematical expectation, M is the ensemble size, t is the target (output), and \bar{f} is the ensemble prediction. The error is composed of the bias term, the variance term (variability among base models), the covariance term (pairwise covariance of different base models), and the noise, which is the irreducible term in the error. The three components are calculated as:

bias = \frac{1}{M} \sum_{i} \left( E[f_{i}] - t \right)    (3)

var = \frac{1}{M} \sum_{i} E\left[ \left( f_{i} - E[f_{i}] \right)^{2} \right]    (4)

covar = \frac{1}{M(M-1)} \sum_{i} \sum_{j \neq i} E\left[ \left( f_{i} - E[f_{i}] \right)\left( f_{j} - E[f_{j}] \right) \right]    (5)

where f_i and f_j are the responses of base learners i and j, respectively.
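The decomposition in Eqs. (2)-(5) can be checked numerically. The following NumPy sketch, using toy simulated learner predictions (an assumption for illustration only), estimates the bias, variance, and covariance terms and verifies that they reproduce the expected squared error of the simple-average ensemble (the noise term is zero in this toy setup).

```python
import numpy as np

rng = np.random.default_rng(0)
M, trials, t = 5, 20000, 1.0                       # M learners, simulated training sets, scalar target

# f[d, i]: prediction of learner i trained on dataset d (toy correlated, noisy estimators)
common = rng.normal(0.0, 0.3, size=(trials, 1))    # shared noise component -> covariance
f = t + 0.1 + common + rng.normal(0.0, 0.2, size=(trials, M))

f_bar = f.mean(axis=1)                             # simple-average ensemble prediction
mse = np.mean((f_bar - t) ** 2)                    # E{[f_bar - t]^2}

Ef = f.mean(axis=0)                                # E[f_i]
bias = np.mean(Ef - t)                             # Eq. (3)
var = np.mean(np.mean((f - Ef) ** 2, axis=0))      # Eq. (4)
dev = f - Ef
pairwise = np.einsum('di,dj->ij', dev, dev) / trials
covar = (pairwise.sum() - np.trace(pairwise)) / (M * (M - 1))   # Eq. (5)

decomposed = bias ** 2 + var / M + (1 - 1 / M) * covar          # Eq. (2), noise omitted
print(mse, decomposed)                             # the two values should be nearly equal
```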

The covariance term (covar) can be negative, hence it may reduce the expected ensemble error while the bias and var terms remain unchanged. Moreover, as the number of learners in the ensemble increases, the variance contribution to the ensemble error diminishes, while the prominence of the covariance grows. The bias-variance-covariance decomposition illustrates that if we can design low-correlated base learners, an improvement in performance is anticipated. Another theoretical justification is the ambiguity decomposition (Krogh & Vedelsby, 1995), which proved that the ensemble squared error is guaranteed to be less than or equal to the average squared error of the individual learners constituting the ensemble. The ambiguity decomposition shows that the ensemble error is expected to decrease as the ambiguity term (diversity among learners) increases. Hence the key to improved performance is the independence of the base learners in producing their errors: the more the base learners disagree on their predictions, the better the generalization of the ensemble (Zhou, 2012).

In order to establish low correlation among the base learners in the ensemble, diversity has to be promoted. This is achieved through parameter diversity (e.g. the number of neurons and the initial weights of an NN), data diversity (e.g. using the bootstrap to create different training data for each learner), and structural diversity (e.g. the architecture of the NN and the type of transfer function) (Ren, Zhang, & Suganthan, 2016). Diversity can also be achieved by modifying the learning trajectory of the learners, and several approaches have been proposed to alter the traversal of the search space, leading to different hypotheses. The most common approach is negative correlation learning (NCL), introduced by Liu and Yao (1999), which minimizes correlations within the ensemble by introducing a penalty term in the cost function. Evolutionary and metaheuristic algorithms have also gained popularity and are used to maintain both diversity and accuracy by selecting an optimal subset of models from a pool of candidates (Dai & Han, 2016; Liu, Dai, & Liu, 2014).

2.1. Ensemble techniques


Bagging (bootstrap aggregation), proposed by Breiman (1996), implicitly promotes diversity by randomly resampling the original training set with replacement: several instances of the training set are created and fed to the induction algorithm to generate diverse models, and all base predictions are then averaged to obtain the final ensemble prediction. The bagging procedure is shown in Algorithm 1.

Algorithm 1 Bagging algorithm.
Require: learning algorithm A, number of learners N, original training set D
Construction stage:
  t ← 1
  repeat
    D_t ← bootstrap(D)
    M_t ← train(A, D_t)
    t ← t + 1
  until t = N
Output aggregation:
  F(x) ← (1/N) Σ_{i=1}^{N} M_i(x)

Similarly, random feature sampling can be used to construct base learners. For example, random forests and extremely randomized trees use both data and feature sampling with a decision tree as the learning algorithm (Biau, 2012). These methods aim to reduce variance in order to improve overall performance. In a related approach, boosting methods such as AdaBoost and gradient boosting (Hastie, Tibshirani, & Friedman, 2011) build base learners sequentially by training each new learner based on the performance of the previously trained learners, emphasizing data instances that were incorrectly predicted by previous learners. Boosting is highly dependent on the dataset: it can be more accurate than bagging or less accurate than the base learners due to its sensitivity to noise (Zhou, 2012), hence the number of learning rounds and the complexity of the base learners have to be chosen carefully to avoid overfitting.

Another ensemble technique is stacking (or stacked generalization), introduced by Wolpert (1992). It uses an induction algorithm to train base learners (level-0 learners) and then passes the predictions of these level-0 learners as inputs to a level-1 learner (meta-learner), which is trained using the same or another induction algorithm to produce the final ensemble prediction. Unlike boosting and bagging, stacking does not usually sample the training data, tends to use fewer learners, and can easily be used to create a heterogeneous ensemble of different induction algorithms. Stacked NN and stacked SVM are the most common algorithms in soft computing applications (Ren et al., 2016).

Some open issues in ensemble development are how to generate ensemble learners while maintaining diversity, the evaluation and selection of base learners, and the selection of the combining strategy, which significantly influences ensemble performance. The proposed method in this paper extends the bagging approach, which manipulates data to generate diversity, by also manipulating the architecture of the base learners to obtain further diversity. A further extension is the use of different training algorithms to encourage different learning trajectories of the base learners. The accuracy of the base learners is ensured using early stopping and regularization approaches. The proposed method also considers several combining strategies (including a meta-learner similar to stacking) and uses a metaheuristic algorithm to select the optimal combining strategy, in contrast to most ensemble methods, which consider only a single combining strategy. The metaheuristic algorithm is also used to optimize the ensemble by selecting the best subset of learners, resulting in a diverse and accurate ensemble.
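To make the two baseline techniques concrete before moving to the proposed method, the following is a minimal scikit-learn sketch of bagging (Algorithm 1) and stacking with small NN regressors. The toy dataset, network sizes, and training settings are illustrative assumptions and do not reproduce the configuration used later in this study.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(238, 4))                      # toy stand-in for well-test inputs
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=238)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def make_nn():
    return MLPRegressor(hidden_layer_sizes=(9,), max_iter=2000)

# Bagging (Algorithm 1): bootstrap the training set, train one learner per replicate, average.
N = 10
bag = []
for _ in range(N):
    idx = rng.integers(0, len(X_tr), len(X_tr))            # bootstrap sample with replacement
    bag.append(make_nn().fit(X_tr[idx], y_tr[idx]))
bagging_pred = np.mean([m.predict(X_te) for m in bag], axis=0)

# Stacking: level-0 predictions on a held-out split train a level-1 meta-learner
# (a validation subset is used here to limit meta-learner overfitting).
X_base, X_val, y_base, y_val = train_test_split(X_tr, y_tr, test_size=0.25, random_state=0)
level0 = [make_nn().fit(X_base, y_base) for _ in range(N)]
meta_in = np.column_stack([m.predict(X_val) for m in level0])
meta = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000, random_state=0).fit(meta_in, y_val)
stacking_pred = meta.predict(np.column_stack([m.predict(X_te) for m in level0]))
```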

3. Proposed ensemble methodology

A neural network is a parallel structure capable of input-output mapping and is considered a universal approximator (Haykin, Haykin, Haykin, & Haykin, 2009). NNs are commonly used in soft sensor development and function approximation (Kadlec, Gabrys, & Strandt, 2009). However, the NN is an unstable algorithm (i.e. data perturbation affects its performance), is prone to overfitting, has many tuning parameters, and can be trapped in local minima (Hunter et al., 2012). NN ensemble techniques have been introduced to overcome such limitations, improve generalization capabilities, and tackle the small-sample-size problem, which is a primary issue in the VFM application. The proposed ensemble in this study uses the general concept of overproduce-and-choose (Coelho & Von Zuben, 2006) to ensure diverse and accurate base learners. The proposed ensemble method is illustrated in Fig. 3. First, a set of diverse candidate learners, feed-forward neural networks in this study, is generated in the "Overproduce" step. In the "Choose" step, the best subset of learners and a suitable combining strategy are selected using adaptive simulated annealing combinatorial optimization. This method aims to increase the likelihood of obtaining diverse and accurate base learners with a lower ensemble size (a pruned ensemble).

Fig. 3. Proposed ensemble method development procedure.

3.1. Learner generation

The first step after data collection is preprocessing of the raw measurements to remove any outliers or samples containing missing values and to normalize all measurements. After that, the dataset is divided into training/validation/testing sets using K-fold cross-validation to produce K ensemble models. The results reported in this article represent the average performance of these K models unless otherwise stated. This approach, though more expensive, can reduce data dependency and provide a more accurate and stable performance analysis than holdout validation. Bootstrapped sets are generated from the training data in order to introduce data diversity within the ensemble learners, where each base learner is trained using its respective bootstrapped training set. To further increase the diversity amongst learners, parameter and structural diversities are introduced.


The number of neurons, the number of hidden layers, the training algorithm, the stopping criteria, and the regularization parameter (λ) are drawn randomly from normal probability distributions N(μ, σ²). Each diversity source is bounded by a maximum and a minimum value, or by a set of options, as shown in Table 2. NN weights are initialized using Nguyen-Widrow layer initialization (Nguyen & Widrow, 1990) in order to ensure an even distribution across the input space. The proposed random selection of parameters for each learner, besides introducing more diversity, alleviates the learner parameter selection process, given a proper pruning of low-performing learners. This is advantageous over many ensemble methods that choose parameters manually (e.g. bagging, boosting, or other methods based on data diversity) or generate several models with different parameters and then pick the parameters of the best model (Soares, Antunes, & Araújo, 2013), which leads to a slower model generation process. The proposed learner generation process thus combines data diversity, parameter and architectural diversity, and learning trajectory diversity. Furthermore, overfitting of the individual NN learners is averted through regularization and early stopping. These approaches not only ensure generalization within individual learners but also speed up the generation process, since NN training will usually stop before reaching the maximum number of iterations. Therefore, the proposed NN generation process is efficient and ensures a high degree of diversity.

Table 2. Diversity sources and their bounds and options.

Diversity source     | Minimum | Maximum
No. of neurons       | 1       | 25
No. of hidden layers | 1       | 2
λ parameter          | 0       | 0.2
Training algorithm   | Options: 1- Levenberg Marquardt, 2- Scaled conjugate gradient, 3- Bayesian regulation
Stopping criteria    | Options: 1- Early stopping, 2- Maximum iterations

The proposed diversity aspect of altering the trajectory of search-space traversal is accomplished using three NN training algorithms: Levenberg Marquardt (Hagan, Demuth, Beale, & De Jesús, 1996), scaled conjugate gradient (Møller, 1993), and Bayesian regulation (Burden & Winkler, 2009). The problem addressed in this article involves a small dataset due to the common metering infrastructure, hence other training algorithms such as stochastic gradient descent and mini-batch gradient descent were not considered, since they target large training datasets (Bottou, 2012). The Levenberg Marquardt algorithm is often the fastest and most widely used backpropagation algorithm for training neural networks. In this study, the NN learners trained by Levenberg Marquardt use training process regularization through early stopping (Giles, 2001) to terminate the training and avoid overfitting. This is done by observing the validation error while training the network; whenever the validation error starts to increase, the training is terminated. In this study, the maximum number of training epochs is set to 500 and the validation error is checked in every epoch; when the error increases more than 6 times, the training is terminated. On the other hand, the regularization implemented with the scaled conjugate gradient training algorithm is L2 regularization. The objective function for training the unregularized NN is:

E = \frac{1}{2} \sum_{m=1}^{M} \sum_{t=1}^{T} e_{m,t}^{2}, \qquad e_{m,t} = y_{m,t} - \hat{y}_{m,t}    (6)

where M is the number of training samples, T is the number of outputs, and e_{m,t} is the training error for sample m, i.e. the difference between the desired output y_t and the current network output \hat{y}_t. An L2 component, the mean squared weights (MSW) of the NN, can be added to the performance function:

MSW = \frac{1}{2} \sum_{j=1}^{k} w_{j}^{2}    (7)

Adding MSW to the objective function yields the regularized NN performance function E_{reg}:

E_{reg} = \lambda \cdot MSW + (1 - \lambda) \cdot E    (8)
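Under the notation of Eqs. (6)-(8), the regularized performance index can be computed as in the following small sketch (an illustrative reading of the equations, not code from the study):

```python
import numpy as np

def regularized_performance(y, y_hat, weights, lam):
    """E_reg = lam * MSW + (1 - lam) * E, following Eqs. (6)-(8).

    y, y_hat : arrays of shape (M, T) with targets and network outputs
    weights  : flat array of all network weights
    lam      : regularization parameter (drawn from [0, 0.2] in this study)
    """
    E = 0.5 * np.sum((y - y_hat) ** 2)       # Eq. (6)
    MSW = 0.5 * np.sum(weights ** 2)         # Eq. (7)
    return lam * MSW + (1.0 - lam) * E       # Eq. (8)
```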

The regularization term weight (λ) is set such that the NN has small weights and a smooth response, and is hence unlikely to overfit. However, deciding the optimum value of λ is challenging. Therefore, it is proposed to generate networks with different λ values drawn from a normal distribution; this means λ can itself be considered a source of diversity. The λ range is set to [0, 0.2] rather than [0, 1] to avoid underfitting. For Levenberg Marquardt with early stopping there is no regularization (λ = 0). Similarly, Bayesian regulation uses the same L2 regularization, but the value of λ is chosen automatically by assuming the NN weights to be random variables with specified distributions; λ is then related to the variance of these distributions. In this study, the Gauss-Newton approximation is used to calculate the Hessian of the objective function and the Levenberg Marquardt algorithm is used to locate the minimum point, as proposed by Dan Foresee and Hagan (1997). Training terminates when the number of epochs reaches 200 for scaled conjugate gradient or 150 for Bayesian regulation, or when the validation performance no longer improves.
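The "Overproduce" step can be sketched as follows. scikit-learn's MLPRegressor is used here purely as a stand-in, so the solver options and the alpha penalty only approximate the Levenberg Marquardt / scaled conjugate gradient / Bayesian regularization choices and the λ parameter described above; the sampling ranges follow Table 2.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)

def draw_learner():
    """Draw one candidate NN with random structure and training settings (Table 2 ranges)."""
    n_layers = int(rng.integers(1, 3))                          # 1 or 2 hidden layers
    neurons = tuple(int(np.clip(rng.normal(13, 6), 1, 25)) for _ in range(n_layers))
    lam = float(np.clip(rng.normal(0.1, 0.05), 0.0, 0.2))       # regularization strength
    solver = str(rng.choice(["lbfgs", "adam", "sgd"]))          # stand-in for LM / SCG / BR
    return MLPRegressor(hidden_layer_sizes=neurons, alpha=lam, solver=solver,
                        early_stopping=(solver != "lbfgs"), max_iter=500)

def generate_candidates(X, y, n_candidates=50):
    """Bootstrap the training data and train one randomly configured NN per replicate."""
    pool = []
    for _ in range(n_candidates):
        idx = rng.integers(0, len(X), len(X))                   # bootstrap resample
        pool.append(draw_learner().fit(X[idx], y[idx]))
    return pool
```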

3.2. Combining strategy

The combining strategy is of great significance in reaching robust and accurate ensemble predictions. The combining strategies considered in this study are the simple average (SA), the weighted average (WA), and an NN meta-learner (meta-NN). The formula used for both SA and WA is:

y_{ens} = \sum_{i=1}^{N} w_{i} f_{i}(x)    (9)

where N is the number of learners, f_i(x) is the response of learner i to input x, w_i is the weight associated with learner i with \sum_{i} w_{i} = 1, and y_{ens} is the ensemble prediction. The weights are equal in SA, w_i = 1/N, while in WA they are calculated as follows:

w_{i} = \frac{\alpha_{i}}{\sum_{j=1}^{N} \alpha_{j}}    (10)

where α_i is the accuracy of learner i and is determined from the learner error (MSE) on the validation set according to the following equation:

\alpha_{i} = 1 - \frac{MSE_{i}}{\sum_{j=1}^{N} MSE_{j}}    (11)
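The SA and WA combiners of Eqs. (9)-(11) reduce to a few lines; learner responses and validation MSEs are assumed to be available as arrays (illustrative sketch).

```python
import numpy as np

def simple_average(preds):
    """Eq. (9) with equal weights w_i = 1/N; preds has shape (N_learners, n_samples)."""
    return preds.mean(axis=0)

def weighted_average(preds, val_mse):
    """Eqs. (9)-(11): weights derived from each learner's validation MSE."""
    val_mse = np.asarray(val_mse, dtype=float)
    alpha = 1.0 - val_mse / val_mse.sum()     # Eq. (11): accuracy of each learner
    w = alpha / alpha.sum()                   # Eq. (10): normalized weights, sum to 1
    return w @ preds                          # Eq. (9)
```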

The meta-NN combining strategy is based on the stacking approach (Wolpert, 1992). In this study, an NN trained with the Bayesian regulation algorithm (BR) is used as the level-1 meta-learner, which aggregates the level-0 learners' predictions. The inputs to the meta-NN are the predictions of the base learners and the output is the final ensemble prediction. A validation subset of the training set is used to train the meta-NN, since training the meta-learner on the same training set as the base learners may result in overfitting. BR produces a regularized meta-NN, ensuring adequate performance and minimizing the chance of overfitting. The maximum number of training epochs is set to 150.


3.3. Ensemble optimization and pruning

Once the candidate NNs have been generated in the "overproduce" phase, the selection of a subset of NN models and a combining strategy commences in the "choose" phase. This step, also referred to as pruning, aims to achieve better predictive performance with lower ensemble complexity. The pruning problem has been proven to be NP-complete, hence an exact solution is infeasible (Lu, Wu, Zhu, & Bongard, 2010; Tamon & Xiang, 2000). Many pruning methods have been proposed and can be classified into three categories. Ordering-based methods order the N learners according to an evaluation function and then place the front T (T < N) learners into the final ensemble. The most common ordering-based methods are reduce-error pruning, which is greedy and time-consuming, and Kappa pruning, which assumes all learners to have similar performance. This assumption does not hold here, since we generate structurally diverse learners and, consequently, some of them might have low predictive performance due to parameter perturbation. The second category includes clustering-based methods, which employ a clustering algorithm to group the learners and subsequently prune each group separately. Challenges for these methods include the choice of the clustering algorithm, the number of clusters, and the pruning process within each cluster (Tsoumakas, Partalas, & Vlahavas, 2009). The third category involves optimization-based methods, where pruning is posed as a combinatorial optimization problem to find a subset of the original ensemble that optimizes a certain measure. Hill climbing methods are common in this category; however, they easily get stuck in local optima, since solving the optimization problem globally is hard (Tsoumakas, Partalas, & Vlahavas, 2008). Therefore, metaheuristic algorithms such as genetic algorithms and simulated annealing have been proposed to solve the problem approximately (Soares et al., 2013; Tsoumakas et al., 2009). In this article, we opt for simulated annealing since it is easier to formulate and has been used successfully to solve many combinatorial optimization problems (Seyedkashi et al., 2012). The tendency of simulated annealing to be computationally more expensive than simpler greedy algorithms is justified by the high significance of pruning for diverse ensembles and the expected improvement in predictive performance, since the combining strategy is optimized along with the final ensemble subset.

Simulated annealing (Dowsland & Thompson, 2012) is a global metaheuristic optimization technique that has proven effective in solving combinatorial problems. It mimics the physical process of heating a material and then slowly lowering the temperature to reduce defects, hence minimizing the energy of the system. The algorithm generates new points using an annealing function, a probability distribution scaled by the current temperature. A new point is accepted according to the following probability function:

P_{accept}(x_{i}) = \frac{1}{1 + \exp\left( \dfrac{f(x_{i}) - f(x_{i-1})}{\max(T)} \right)}    (12)

where f(x_i) is the new objective value, f(x_{i-1}) is the old objective value, and T is a vector of all previous optimal temperatures. Adaptive simulated annealing (ASA) additionally uses the objective gradients to change the annealing parameter k, which in turn updates the temperature according to a temperature function R(T, k). ASA may reanneal (raise the temperature again, leading to a wider search space) to ensure global optimization and avoid being trapped in local minima (Ingber, Petraglia, Petraglia, Machado et al., 2012). An annealing function for ASA combinatorial optimization is proposed here to use ASA effectively for optimizing the ensemble model. The procedure for ensemble development with ASA optimization is shown in Algorithm 2.
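The acceptance rule of Eq. (12), together with the greedy shortcut used in Algorithm 2 (step 13), might look as follows (illustrative sketch; the temperature history is passed in explicitly):

```python
import numpy as np

def accept_probability(f_new, f_old, temperatures):
    """Eq. (12): probability of accepting a new (possibly worse) solution."""
    t_max = float(np.max(temperatures))                 # max(T) over previous temperatures
    return 1.0 / (1.0 + np.exp((f_new - f_old) / t_max))

def accept(f_new, f_old, temperatures, rng):
    # Better solutions are always accepted; worse ones with the probability above.
    return f_new < f_old or rng.random() < accept_probability(f_new, f_old, temperatures)
```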


Algorithm 2 Ensemble development using adaptive simulated annealing.
Require: N, M, T0, Tf, RI, Maxiter, and SL
 1: Generate N learners according to Section 3.1
 2: Generate initial random solution Sc of learner subset M and strategy C
 3: i ← 1, Ti ← T0, and count ← 0
 4: repeat
 5:   Sn ← Sc
 6:   for j = 1 to Ti + 1 do
 7:     m1 ← random learner from candidates
 8:     m2 ← random learner from Sn
 9:     c ← random strategy from pool of strategies
10:     Sn(m2) ← m1
11:     Sn(C) ← c
12:   end for
13:   if f(Sn) < f(Sc) or rand[0, 1) < Paccept(Sn) then
14:     Sc ← Sn   {Accept new solution}
15:     count ← count + 1
16:   end if
17:   if count > RI then
18:     ki ← log( (T0 · max(d)) / (Ti · di) )
19:     count ← 0
20:   else
21:     ki ← ki + 1
22:   end if
23:   Ti ← T(i−1) · 0.95^ki   {Temperature function}
24: until Ti = Tf or Maxiter or SL reached

ASA starts from an initial temperature T0 and an initial random solution (a subset containing M learners out of the N generated candidates and an initial combining strategy C). In step 4, the algorithm loops until the maximum number of iterations or the final temperature Tf is reached; the stalling limit (SL) also stops the optimization if the objective function does not decrease for the specified number of iterations. T gradually decreases according to the temperature function (step 23), which depends on the annealing parameter ki. In steps 17–22, the value of ki depends on the gradients of the objective function in each dimension di. In steps 6–12, the proposed annealing function generates a new solution by looping Ti times, replacing one random learner m2 from the current solution Sn with one random learner m1 from the pool of candidates, and also replacing the current combining strategy with another one (c) from the candidate strategies available. The learners in the solution subset are kept distinct, i.e. no learner appears twice in the subset. In steps 13–16, the acceptance function Paccept evaluates whether to accept the new solution; if it is accepted, a counter (count) is incremented. If the counter reaches a defined value RI, the reannealing interval, then reannealing occurs. Reannealing lowers the annealing parameter ki, hence raising the temperature again. The reannealing property of the ASA algorithm provides a global search capability similar to multi-start optimization. As the temperature T is lowered, the algorithm becomes greedier and only accepts better solutions, hopefully converging to the global minimum. The objective function in ASA is the root mean squared error (RMSE) of the ensemble on the validation set and is calculated as follows:

f(x) = \sqrt{ \frac{1}{m} \sum_{i=1}^{m} \left( y_{i} - C\left( F(x_{i}) \right) \right)^{2} }    (13)

where m is the number of samples in the validation set, y_i is the actual measurement, F = {f_1, f_2, ..., f_M} is the response of the chosen subset of M learners, and C is the chosen combining strategy.
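A simplified sketch of the "Choose" step is given below: it selects a subset of M learners and a combining strategy by annealing on the validation RMSE of Eq. (13). Reannealing and the gradient-based annealing parameter of ASA are omitted and a simple geometric cooling schedule is assumed, so this is an illustrative simplification of Algorithm 2 rather than a faithful implementation.

```python
import numpy as np

def validation_rmse(subset, combine, val_preds, y_val):
    """Eq. (13): RMSE of the combined prediction of the chosen learner subset."""
    return float(np.sqrt(np.mean((y_val - combine(val_preds[subset])) ** 2)))

def anneal_ensemble(val_preds, y_val, strategies, m=20, t0=90.0, t_final=1e-3,
                    max_iter=1000, seed=0):
    """Choose m learners (rows of val_preds) and one combining strategy by annealing.

    val_preds  : array (n_candidates, n_val_samples) of each learner's validation predictions
    strategies : list of callables mapping an (m, n_val_samples) array to (n_val_samples,)
    """
    rng = np.random.default_rng(seed)
    n = val_preds.shape[0]
    subset = list(rng.choice(n, size=m, replace=False))
    strat = int(rng.integers(len(strategies)))
    f_cur = validation_rmse(subset, strategies[strat], val_preds, y_val)
    best, best_f = (list(subset), strat), f_cur
    t = t0
    for i in range(max_iter):
        # Neighbour move: swap one learner for an unused candidate, re-draw the strategy.
        cand, cand_strat = list(subset), int(rng.integers(len(strategies)))
        newcomer = int(rng.integers(n))
        if newcomer not in cand:
            cand[int(rng.integers(m))] = newcomer
        f_new = validation_rmse(cand, strategies[cand_strat], val_preds, y_val)
        delta = float(np.clip((f_new - f_cur) / max(t, 1e-12), -50.0, 50.0))
        if f_new < f_cur or rng.random() < 1.0 / (1.0 + np.exp(delta)):
            subset, strat, f_cur = cand, cand_strat, f_new
            if f_cur < best_f:
                best, best_f = (list(subset), strat), f_cur
        t = t0 * 0.95 ** i                 # geometric cooling; no reannealing in this sketch
        if t < t_final:
            break
    return best
```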


Table 3. Measurement ranges after normalization.

Variable             | Min     | Max
Liquid flow rate     | −1.8384 | 2.0442
Gas flow rate        | −2.7163 | 1.4735
Downhole pressure    | −1.4466 | 2.1105
Wellhead temperature | −3.4674 | 1.7450
Wellhead pressure    | −1.5191 | 1.6195
Choke valve opening  | −2.1820 | 1.5528

4. Experimental results and discussion

In this section, the proposed ensemble method is used to develop a liquid and gas multiphase virtual flow meter using actual well test data collected from production wells sharing a common metering facility over a period of 16 months. A total of 238 well test samples were collected over this period. The measurements chosen to develop the ensemble models are the downhole pressure, wellhead temperature, wellhead pressure, and choke valve opening percentage. The models are required to estimate each well's liquid and gas flow rates given the corresponding input measurements. All measurements are first normalized to zero mean and unit standard deviation before developing the models, according to the following formula:

Y_{normalized} = \frac{Y - \mu}{\sigma}    (14)

where μ is the mean of the measurements and σ is the standard deviation. Table 3 shows the ranges of the normalized measurements used to develop the VFM. Two ensemble models were developed independently, one for the gas flow rate and another for the liquid flow rate; each model was generated from the data of the four wells combined, as required by the field operator, since they produce from the same reservoir. Using a single model to estimate the individual flow rates of several wells, given their respective input measurements, has been the convention in data-driven VFMs since it leverages common patterns among these wells. However, the wells should produce from the same reservoir, or at least have similar operating ranges (in terms of measurements), to achieve good predictive performance.

The choice of generating two separate models, one for gas and another for liquid, rather than one multivariate model, is justified by the fact that the relationships between the input attributes and the individual phase flow rates are quite different, hence the design and complexity required to estimate each output are expected to differ. For example, the number of hidden neurons required to model gas flow patterns and predict the gas flow rate is different from that for oil or water. Furthermore, the univariate-model approach should result in less complex models that are easier to maintain. Moreover, this approach permits the use of base learners other than NNs, since not all learning algorithms support multivariate outputs. On the other hand, generating separate models for the gas and liquid flow rates may not fully capture the interactions between the outputs when there is slip between the gas and liquid velocities. In this article, however, we assume a no-slip condition and rely on the temperature and pressure states to indirectly capture any possible interactions. Furthermore, data-driven VFMs may experience less adverse effects due to slip than model-driven VFMs, particularly those that ignore slip, as long as there are enough input measurements from several points along the flowline. Nonetheless, an experimental study on a large flow loop, where slip can be controlled, is required to confirm to what extent slip conditions influence VFM predictions.

In each ensemble model, 50 diverse NN learners were generated according to Section 3.1. The optimal 20 of these learners were then combined using the optimal combining strategy selected by ASA according to Section 3.3; ASA tries to find the best learners, without compromising performance, and the best strategy to combine them. The performance indicators used are the RMSE and the mean absolute percent error (MAPE). RMSE and MAPE can be used to compare models; however, they do not clearly reveal the accuracy of the developed model for the flow metering application. Hence, the cumulative deviation plot is used to indicate the accuracy of the developed VFM, as recommended by the Norwegian Society for Oil and Gas Measurement (NFOGM) (Corneliussen et al., 2005) and the American Petroleum Institute (API) MPMS Chapter 20.3 (MPMS, 2013). The cumulative deviation plot indicates the percentage of test points that fulfill certain deviation criteria. Other indicators used in this study are the standard deviation of the RMSE (SD) and the maximum RMSE.

Fig. 4. Normalized performance of individual NN with different number of neurons. (a) Liquid flow model (b) Gas flow model.


RMSE = \sqrt{ \frac{1}{m} \sum_{i=1}^{m} \left( y_{i} - \hat{y}_{i} \right)^{2} }    (15)

MAPE = \frac{1}{m} \sum_{i=1}^{m} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right|    (16)
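The indicators above, together with the cumulative deviation statistic described in Section 5, can be computed as in the following sketch (array names are illustrative; MAPE is expressed in percent here):

```python
import numpy as np

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))            # Eq. (15)

def mape(y, y_hat):
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)      # Eq. (16), in percent

def cumulative_deviation(y, y_hat, criteria=(5, 10, 15, 20, 25, 30)):
    """Share of test points whose relative error falls within each deviation criterion (%)."""
    rel_err = np.abs((y - y_hat) / y) * 100.0
    return {c: float(np.mean(rel_err <= c) * 100.0) for c in criteria}
```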

4.1. Individual neural network performance

Evaluating the performance of individual neural networks is essential before generating the ensemble models, as it represents the baseline performance that any ensemble model should outperform. In this section, the number of neurons in a single NN is varied from 1 to 50 in steps of 2. The NN has one hidden layer with the fast hyperbolic tangent transfer function. The output layer transfer function is set to linear, and the Nguyen-Widrow method is used for weight initialization. The NN is trained using Levenberg Marquardt with early stopping.



Fig. 4(a) and (b) show the normalized performance of the NNs for the liquid and gas flow models, respectively, with different numbers of neurons in the hidden layer. The RMSE shown is the average over 10-fold cross-validation, where each fold is used as the testing set while the remaining 9 folds are divided into an 80% training set and a 20% validation set. At some point, both the testing and validation errors start to increase as the number of neurons is increased, implying overfitting and higher generalization errors. The minimum validation errors from this experiment are 0.10 and 0.094 for the liquid and gas models, corresponding to 9 and 7 neurons in the hidden layer, respectively. Even though a higher number of neurons was expected to better capture the complexity of the multiphase flow patterns, the generalization of the NNs deteriorated as the number of neurons increased beyond 10. This implies that a single NN learner may not be adequate to achieve good predictive performance, even with increased NN complexity. This is probably due to the small number of dataset samples, which is one of the motivations for using an NN ensemble: bootstrapping the data resembles taking several snapshots from different angles, making it easier for the NNs to capture more patterns. In the coming sections, a substantial improvement in performance using NN ensembles is anticipated.
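A sketch of this baseline experiment is shown below, using scikit-learn's MLPRegressor as an assumed stand-in for the MATLAB-style networks used in the study; X and y are NumPy arrays of inputs and targets.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

def neuron_sweep(X, y, sizes=range(1, 51, 2), n_splits=10):
    """Average test RMSE of a single tanh NN for each hidden-layer size (K-fold CV)."""
    results = {}
    for n in sizes:
        errs = []
        for tr, te in KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X):
            nn = MLPRegressor(hidden_layer_sizes=(n,), activation="tanh",
                              solver="lbfgs", max_iter=500)
            nn.fit(X[tr], y[tr])
            errs.append(np.sqrt(np.mean((y[te] - nn.predict(X[te])) ** 2)))
        results[n] = float(np.mean(errs))
    return results
```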

Fig. 6. Generalization error as learners are added to liquid flow ensemble model.

4.2. Ensemble diversity

In this part, the diversity among learners generated using the proposed method of Section 3.1 is examined. Fig. 5 shows the properties of 50 NN candidates. The parameters are generated randomly within the ranges specified in Table 2; they are the number of neurons and hidden layers in each NN, the chosen training algorithm, and the value of λ. λ is set to zero for the Levenberg Marquardt training algorithm, since early stopping is employed, and it is also set to zero for Bayesian regularization, since λ is then calculated automatically. Such generated diversity is expected to improve ensemble generalization, as will be shown in the coming experiments.

Fig. 5. Diversity among generated NNs of a single run. (a) Number of neurons and hidden layers (b) λ and chosen training algorithm: 1-Levenberg Marquardt, 2-Scaled conjugate gradient, 3-Bayesian regularization.

4.3. Performance of the combining strategies

This subsection presents the performance of the combining strategies described in Section 3.2. 50 NNs are generated according to Section 3.1, and then SA, WA, and meta-NN are used to combine them. 10-fold cross-validation is used to generate 10 training/validation/testing folds. Fig. 6 and Fig. 7 show how the ensemble generalization error (on the testing set of a single fold) is reduced as learners are added to the ensemble; Fig. 6 shows the RMSE for the liquid flow ensemble model while Fig. 7 is for the gas flow ensemble model. It is observed that the WA and meta-NN combining methods are better than SA. The meta-NN error fluctuates considerably because the level-1 neural network is retrained each time a learner is added, since the input size of the network increases.

Fig. 7. Generalization error as learners are added to gas flow ensemble model.


Table 4. Average performance of the liquid flow ensemble model using different combining strategies.

Data set | Metric | SA     | WA     | meta-NN
Training | MAPE   | 5.73   | 3.64   | 4.26
         | RMSE   | 0.0668 | 0.0508 | 0.0537
         | SD     | 0.0032 | 0.0043 | 0.0074
         | Max    | 0.0690 | 0.0587 | 0.0620
Testing  | MAPE   | 6.95   | 5.54   | 5.30
         | RMSE   | 0.0749 | 0.0730 | 0.0706
         | SD     | 0.0019 | 0.0124 | 0.0137
         | Max    | 0.0765 | 0.0818 | 0.0803

Table 5. Average performance of the gas flow ensemble model using different combining strategies.

Data set | Metric | SA     | WA     | meta-NN
Training | MAPE   | 2.62   | 2.09   | 1.88
         | RMSE   | 0.0400 | 0.0343 | 0.0300
         | SD     | 0.0009 | 0.0009 | 0.00176
         | Max    | 0.0412 | 0.0358 | 0.0319
Testing  | MAPE   | 3.16   | 2.48   | 2.59
         | RMSE   | 0.0506 | 0.0465 | 0.0467
         | SD     | 0.0020 | 0.0024 | 0.0039
         | Max    | 0.0544 | 0.0496 | 0.0511

Fig. 8. Temperature decay and validation RMSE of the best and current solutions versus the number of iterations for liquid flow model. RI=150, Ti=90.

Table 4 and Table 5 report the average performance of the liquid and gas ensemble models, respectively, on the testing and training sets over 10 runs of 10-fold cross-validation. In this experiment, all 50 NNs are used to construct the ensemble. It is observed that WA and meta-NN outperform the SA combining strategy. In the liquid ensemble model, meta-NN performs better on the testing sets than WA, while the opposite is true for the gas ensemble model. However, meta-NN has the highest standard deviation (SD), which means the results vary more each time the ensemble is constructed. This is due to the instability of the NN meta-learner when the initial weights and training data change. It is generally observed that the weighted average and the stacking approach (NN meta-learner) are competitive, and that the dataset complexity and variability, as well as the correlation among base learners, decide the best combining strategy for a given problem. In the next section, ASA optimization is introduced to select the best subset of learners and a combining strategy.

Fig. 9. Temperature decay and validation RMSE of the best and current solutions versus the number of iterations for gas flow model. RI=150, Ti =90.

Fig. 10. Temperature decay and validation RMSE of the best and current solutions versus the number of iterations for liquid flow model. RI=100, Ti =90.

4.4. Results of ensemble optimization

After training 50 NN learners and examining how the different combining strategies influence ensemble performance, the proposed ASA algorithm is applied to choose the best 20 learners and combine them using the optimal combining strategy. In this experiment, the maximum number of iterations is set to 1000, the initial temperature (Ti) is set to 90, and the stalling limit (SL) is set to 140 (learners × 7). The stalling limit stops the optimization process if the objective function does not decrease for the specified number of iterations (i.e. 140). This value was chosen so that enough iterations are given for the temperature to decay to nearly zero, at which point the optimization is too greedy to accept new solutions. The stalling-limit setting results in an efficient and fast optimization process. Another important parameter to set for ASA optimization is the reannealing interval (RI).

In order to find the best RI value, several optimization experiments with different RI values were conducted. Table 6 shows the average validation performance with RI varying from 25 to 200 in steps of 25. When RI is 100, the validation error of the liquid flow ensemble model is the lowest, while the gas flow ensemble model has its lowest validation error at RI=150. Fig. 8 and Fig. 9 show the validation errors versus the number of iterations at RI=150 for the liquid and gas flow models respectively, and Fig. 10 and Fig. 11 show the validation errors at RI=100. The values in these figures represent a single fold of the 10-fold cross-validation. It is observed that when RI=100, reannealing occurs several times and, in the case of the liquid flow model, leads to a better minimum, whereas the gas flow model reached its best minimum at RI=150, since there was adequate time for the optimization to find the best minimum without reannealing.


Table 6. Results of ASA optimization at different RI values. Results are the average performance on the validation sets of 10-fold cross-validation.

Reannealing    | Liquid flow                    | Gas flow
interval (RI)  | MAPE  RMSE    SD      Max      | MAPE  RMSE    SD      Max
25             | 4.21  0.0560  0.0114  0.0761   | 2.13  0.0323  0.0104  0.0499
50             | 3.95  0.0539  0.0106  0.0720   | 2.13  0.0325  0.0105  0.0499
75             | 3.95  0.0539  0.0106  0.0719   | 2.06  0.0314  0.0110  0.0488
100            | 3.92  0.0539  0.0105  0.0718   | 2.00  0.0309  0.0113  0.0488
125            | 3.96  0.0539  0.0105  0.0718   | 1.91  0.0296  0.0112  0.0476
150            | 3.94  0.0539  0.0105  0.0721   | 1.78  0.0279  0.0109  0.0461
175            | 4.00  0.0539  0.0105  0.0718   | 1.78  0.0279  0.0109  0.0462
200            | 3.94  0.0538  0.0105  0.0718   | 1.79  0.0279  0.0108  0.0460

Fig. 11. Temperature decay and validation RMSE of the best and current solutions versus the number of iterations for gas flow model. RI=100, Ti=90.

Table 7. Average ensemble performance before and after ASA optimization on the testing sets of 100 runs.

Model       |            | MAPE | RMSE   | SD     | Max
Liquid flow | Before ASA | 5.30 | 0.0706 | 0.0137 | 0.0803
            | After ASA  | 4.73 | 0.0585 | 0.0046 | 0.0617
Gas flow    | Before ASA | 2.48 | 0.0465 | 0.0024 | 0.0496
            | After ASA  | 2.35 | 0.0442 | 0.0036 | 0.0509

Table 8. Number of times each combining strategy was selected by ASA optimization in 100 runs.

Model       | SA | WA | meta-NN
Liquid flow | 0  | 1  | 99
Gas flow    | 0  | 5  | 95

Therefore, RI was chosen as 100 and 150 for the liquid and gas flow models, respectively. After choosing the best optimization parameters, the optimization was carried out on the 50 learners. The optimized ensemble contains 20 learners and the optimal combining strategy. Fig. 12 and Fig. 13 show the trend of the VFM estimations versus the measured flow rates of the test samples in 6 folds; the VFM predictions follow the actual gas and liquid flow rates closely. Table 7 compares the average performance before and after ASA optimization. The best results achieved before optimization were selected for the comparison: the meta-NN learner for the liquid flow model and WA for the gas flow model. The reported results represent the average of 10 runs of 10-fold cross-validation. It is observed that both the complexity (number of learners) and the accuracy of the models improved after ASA optimization, showing that ASA optimization was successful in choosing the optimal subset of learners to include in the ensemble.

The combining strategy selected by ASA optimization was then analyzed over 100 experimental runs. Table 8 shows the number of times each strategy was selected to combine the base learners. Meta-NN was chosen most of the time in both the liquid and gas flow models, followed by WA, while SA was never chosen.

It is also important to examine the diversity before and after optimization.

Table 9. Percentage of selecting the training algorithms and the number of hidden layers for the ensemble learners before and after ASA optimization.

Parameter               | Choice                  | Liquid flow model   | Gas flow model
                        |                         | Before  | After     | Before  | After
Training algorithm      | LM with early stopping  | 34.4%   | 39.5%     | 34.4%   | 34.5%
                        | Regularized SCG         | 33.8%   | 24.0%     | 34.2%   | 24.0%
                        | Bayesian regulation     | 31.8%   | 36.5%     | 31.4%   | 41.5%
Number of hidden layers | One                     | 50.2%   | 45.5%     | 53.8%   | 58.5%
                        | Two                     | 49.8%   | 54.5%     | 46.2%   | 41.5%

Table 9 shows the average percentage of each training algorithm and each number of hidden layers composing the learners before and after ASA optimization; the values are the average of 100 runs. It can be seen that the three training algorithms were used almost equally to train the original 50 NNs. After optimization, however, NNs trained by Levenberg Marquardt or Bayesian regulation were selected more often, while NNs trained by regularized scaled conjugate gradient were less preferred for inclusion in the final ensemble. This preference appears in both the liquid and gas flow models. On the other hand, NNs with one hidden layer were chosen more often in the gas flow model, while NNs with two hidden layers were chosen more often in the liquid flow model. This is attributed to the low liquid volume fraction in these production wells, which makes the liquid flow rates harder to predict; consequently, more complex learners are required to extract the flow patterns.

4.5. Performance comparison to bagging and stacking

In this section, the performance of the proposed ensemble method is compared to bagging and stacking. 50 NNs with 9 hidden neurons were generated for the liquid flow model, and another 50 NNs with 7 hidden neurons were generated for the gas flow model; the numbers of neurons follow the results of the best individual neural networks presented in Section 4.1, and all other parameters are the same as in Section 4.1. The 50 generated learners were trained using bootstrapping for the bagging models, according to Algorithm 1, and using cross-validation for the stacking approach. The developed learners are compared to the proposed method before and after optimization, and the average results of 100 runs are reported in Table 10.


Fig. 12. Optimized liquid flow model predictions versus actual measurements.

Fig. 13. Optimized gas flow model predictions versus actual measurements.

Table 10. Results of the proposed algorithm before ASA optimization (ensemble-all using SA, WA, and meta-NN) and after ASA optimization, compared to bagging and stacking. The numbers in parentheses indicate the number of learners in the ensemble. The results are the average of 100 runs.

Model       | Metric | Bagging(50) | Stacking(50) | Ensemble all(50), SA | Ensemble all(50), WA | Ensemble all(50), meta-NN | ASA(20)
Liquid flow | MAPE   | 7.12        | 5.56         | 6.95                 | 5.54                 | 5.30                      | 4.73
            | RMSE   | 0.0909      | 0.0751       | 0.0749               | 0.0730               | 0.0706                    | 0.0585
            | SD     | 0.0136      | 0.0108       | 0.0019               | 0.0124               | 0.0137                    | 0.0046
            | Max    | 0.1004      | 0.0872       | 0.0765               | 0.0818               | 0.0803                    | 0.0617
Gas flow    | MAPE   | 3.91        | 3.35         | 3.16                 | 2.48                 | 2.59                      | 2.35
            | RMSE   | 0.0548      | 0.0527       | 0.0506               | 0.0465               | 0.0467                    | 0.0442
            | SD     | 0.0031      | 0.0036       | 0.0020               | 0.0024               | 0.0039                    | 0.0036
            | Max    | 0.0573      | 0.0569       | 0.0544               | 0.0496               | 0.0511                    | 0.0509

From Table 10, the developed ensemble models, both before and after ASA optimization, achieve superior performance compared to the conventional bagging and stacking ensemble techniques in terms of generalization capability, consistency, and complexity (number of base learners after ASA optimization). The diverse learners generated by the proposed method outperform both bagging and stacking even when the learners are combined using SA. The proposed diversity appears to reduce the generalization error even though no pruning is performed. This can be observed by comparing bagging to aggregating the diverse learners using SA: the combining strategy, number of learners, and induction algorithm are the same, but the proposed diversity generation outperforms standard bootstrapping. Similarly, standard stacking and ensemble-all using meta-NN can be compared, since they both use the same number of learners and combining strategy (a level-1 learner); ensemble-all, where the proposed diversity is used to generate the learners, outperforms standard stacking, where the learners share the same topology and training algorithm.


Table 11 Cumulative flow relative absolute errors at the final test sample. VFM model

Simple average (50 NNs) Weighted average (50 NNs) Meta-NN (50 NNs) ASA-optimized (20 NNs) Bagging (50 NNs) Stacking (50 NNs)

Fig. 14. Cumulative deviation for liquid flow VFM models.

Fig. 15. Cumulative deviation for gas flow VFM models.

5. VFM Cumulative deviation and cumulative flow After developing the VFM model and validating its performance using the test data. It is imperative to also check its accuracy using cumulative deviation and cumulative flow indicators. To plot cumulative deviation, the relative error percents between actual and VFM estimates are calculated for all samples in testing folds. Then, the number of samples with relative errors below certain deviation criteria (5%, 10%, 15%, etc.) are counted then divided by the total number of samples. The cumulative deviation percentage is plotted in y-axis against the deviation criteria in x-axis. Cumulative flow can be computed by summing the actual and estimated flow rate samples sequentially up to the final sample, then computing the relative error percentage at the final sample. Cumulative flow assists to identify if the VFM model has a systematic offset or drift. Fig. 14 and Fig. 15 present the resulted cumulative deviation plots for the liquid and gas flow models. It can be seen that ASAoptimized ensemble model has the lowest deviation, all the test samples are within 15% deviation for both gas and flow models. The proposed diverse ensemble without ASA optimization and using SA, WA, and meta-NN combining strategies came next with maximum deviation of 25% and 20% for liquid and gas flow models respectively. Bagging and stacking managed to estimate all the gas flow rates with maximum deviations of 25% and 30% respectively, however, only 97.5% of the points were estimated within 30% deviation for liquid flow rates. Furthermore, when looking at the percentage of samples within 10% deviation, the proposed ensemble methods have similar performance as bagging and stacking in gas flow models (around 95% of the samples), but they surpassed bagging and stacking in fluid flow models, 95%, 87%, and 81% of the

Table 11
Cumulative flow relative errors (%) at the final test sample.

VFM model                     Liquid flow    Gas flow
Simple average (50 NNs)           0.44           0.23
Weighted average (50 NNs)         0.32          −0.18
Meta-NN (50 NNs)                 −0.32           0.19
ASA-optimized (20 NNs)            0.15           0.08
Bagging (50 NNs)                  0.65           0.39
Stacking (50 NNs)                 0.53          −0.41

The final performance indicator to examine is the cumulative flow. The actual and predicted flow rates of the test samples are accumulated to investigate any possible drift or systematic bias. Table 11 presents the relative errors of the cumulative flow (at the last test sample) for all VFM models. All values are below 1%, indicating good flow rate tracking over time without any noticeable inclination in a particular direction. The ASA-optimized ensemble models have the smallest relative errors of 0.15% and 0.08% for liquid and gas flow respectively. All these indicators reveal the accuracy of the developed VFM system and its ability to estimate multiphase flow rates reliably, which is expected to improve data reconciliation and validation and enhance reservoir recovery and production optimization. The presented work demonstrates the advantage of ensemble models over individual models and marks a step towards increasing process efficiency and achieving fully integrated operations.

The proposed VFM is also expected to scale efficiently to larger fields with many wells. Although the training time may increase with the number of wells, many oil fields group their wells in clusters (or platforms), which enables deploying a VFM per cluster. The update rate when the VFM is retuned is expected to be adequate, since the required update rate is usually in minutes and sometimes hours. Furthermore, many aspects of the proposed VFM, such as learner generation and tuning, can be optimized and parallelized owing to the parallel learner-generation characteristic of the proposed ensemble.

6. Conclusion

This article presented a methodology to develop a VFM expert system that is able to estimate phase flow rates accurately from limited well test data in fields with common metering infrastructure. Unlike currently proposed data-driven VFM systems, which are based on individual learners, the proposed VFM is based on ensemble learning and is able to learn from limited data via several diverse base learners. The proposed VFM achieved excellent predictive performance, yet it is cheaper to develop and maintain than currently proposed model-driven VFM systems. The proposed ensemble method establishes diversity among NN learners through training data bootstrapping, parameter and architectural manipulations, and altering the learning trajectory using different training algorithms. Early stopping and L2 regularization are used to terminate the training process efficiently and avoid overfitting, while combinatorial ASA optimization is used to select a subset of the generated learners and a combining strategy to compose the final ensemble. The developed VFM has shown superior performance with average errors of 4.7% and 2.4% for liquid and gas flow rates respectively. Its performance was investigated further using cumulative deviation and cumulative flow plots, as recommended by the international standards NFOGM and API MPMS Chapter 20.3, and was


also compared to standard bagging and stacking approaches. The results show substantial improvements in performance and generalization ability in the context of the VFM application. While ensemble methods may incur additional computational cost, this is justified by the performance improvements achieved and the low sampling rate required in VFM applications. It is hoped that the proposed method enables rapid implementation of VFMs to provide continuous multiphase flow rate estimates, augment and back up physical MPFMs, improve data reconciliation and validation, and assist decision-making in production optimization and reservoir management.

The presented work demonstrates the applicability of data-driven expert systems and the excellent performance of ensemble methods in oil and gas industry applications. The proposed system can be extended to other soft sensor applications to provide real-time and robust estimates of difficult-to-measure or expensive process variables from available or cheaper measurements. Future work is directed towards deployment of the developed VFM in the field and incorporating an auto-tuning strategy based on just-in-time learning methodologies such as the moving window method. Several further aspects require additional investigation: (1) the impact of choosing a different pruning method, such as a simpler heuristic or an ordering-based greedy algorithm, on the final ensemble performance; (2) whether a multivariate model that exploits possible interactions between the outputs would outperform simpler univariate models; (3) the potential of a heterogeneous ensemble that combines different types of learners (different learning algorithms) and possibly some correlations or physical models; and (4) the potential of the proposed ensemble method in other soft sensing and expert system applications and its ability to outperform individual models.
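As a complement to the summary above, the following is a simplified, illustrative sketch of the combinatorial selection step. Plain simulated annealing is used here in place of the adaptive variant (ASA) described in the paper, the objective is the validation RMSE of a simple-average ensemble, and all names (learners, X_val, y_val) are hypothetical.

```python
import numpy as np


def sa_select_subset(learners, X_val, y_val, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Select a subset of learners with plain simulated annealing by minimising
    the validation RMSE of the simple-average ensemble. The paper's ASA variant
    additionally adapts its own annealing parameters during the search."""
    rng = np.random.default_rng(seed)
    preds = np.column_stack([m.predict(X_val) for m in learners])
    y_val = np.asarray(y_val, dtype=float)

    def cost(mask):
        if not mask.any():
            return np.inf                               # empty ensembles are not allowed
        return float(np.sqrt(np.mean((preds[:, mask].mean(axis=1) - y_val) ** 2)))

    mask = rng.random(len(learners)) < 0.5              # random initial subset
    best_mask, best_cost = mask.copy(), cost(mask)
    current_cost, temp = best_cost, t0
    for _ in range(n_iter):
        candidate = mask.copy()
        candidate[rng.integers(len(learners))] ^= True  # flip one learner in or out
        cand_cost = cost(candidate)
        # accept improvements always, worse moves with a temperature-dependent probability
        if cand_cost < current_cost or rng.random() < np.exp((current_cost - cand_cost) / temp):
            mask, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best_mask, best_cost = mask.copy(), current_cost
        temp *= cooling                                  # geometric cooling schedule
    return [m for m, keep in zip(learners, best_mask) if keep]
```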

Acknowledgment

The authors acknowledge the financial support through the Graduate Assistance (GA) scheme and the data and resources provided by Universiti Teknologi PETRONAS.

References

Ahmadi, M. A., Ebadi, M., Shokrollahi, A., & Majidi, S. M. J. (2013). Evolving artificial neural network and imperialist competitive algorithm for prediction oil flow rate of the reservoir. Applied Soft Computing, 13(2), 1085–1098.
AL-Qutami, T. A., Ibrahim, R., Ismail, I., & Ishak, M. A. (2017). Radial basis function network to predict gas flow rate in multiphase flow. In Proceedings of the 9th International Conference on Machine Learning and Computing (pp. 141–146). ACM.
Amin, A., et al. (2015). Evaluation of commercially available virtual flow meters (VFMs). In Offshore Technology Conference.
Babelli, I. M. (2002). In search of an ideal multiphase flow meter for the oil industry. Arabian Journal for Science and Engineering, 27(2), 113–126.
Bailey, R., Shirzadi, S., Ziegel, E., et al. (2013). Data mining and predictive analytics transforms data to barrels. In SPE Digital Energy Conference.
Bello, O., Ade-Jacob, S., Yuan, K., et al. (2014). Development of hybrid intelligent system for virtual flow metering in production wells. In SPE Intelligent Energy Conference & Exhibition. Society of Petroleum Engineers.
Berneti, S. M., & Shahbazian, M. (2011). An imperialist competitive algorithm artificial neural network method to predict oil flow rate of the wells. International Journal of Computer Applications, 26(10), 47–50.
Biau, G. (2012). Analysis of a random forests model. Journal of Machine Learning Research, 13(Apr), 1063–1095.
Bottou, L. (2012). Stochastic gradient descent tricks. In G. Montavon, G. B. Orr, & K.-R. Müller (Eds.), Neural networks: Tricks of the trade (2nd ed., pp. 421–436). Berlin, Heidelberg: Springer. doi:10.1007/978-3-642-35289-8_25.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Burden, F., & Winkler, D. (2009). Bayesian regularization of neural networks. Artificial Neural Networks: Methods and Applications, 23–42.
Coelho, G. P., & Von Zuben, F. J. (2006). The influence of the pool of candidates on the performance of selection and combination techniques in ensembles. In The 2006 IEEE International Joint Conference on Neural Network Proceedings (pp. 5132–5139). IEEE.

Corneliussen, S., Couput, J.-P., Dahl, E., Dykesteen, E., Frøysa, K.-E., Malde, E., et al. (2005). Handbook of multiphase flow metering. Norwegian Society for Oil and Gas Measurement (NFOGM), Revision 2.
Dai, Q., & Han, X. (2016). An efficient ordering-based ensemble pruning algorithm via dynamic programming. Applied Intelligence, 44(4), 816–830. doi:10.1007/s10489-015-0729-z.
Dan Foresee, F., & Hagan, M. (1997). Gauss-Newton approximation to Bayesian learning. In International Conference on Neural Networks: 3 (pp. 1930–1935).
Dowsland, K. A., & Thompson, J. M. (2012). Simulated annealing. In Handbook of natural computing (pp. 1623–1655). Springer.
Falcone, G., Hewitt, G., & Alimonti, C. (2009). Multiphase flow metering: Principles and applications (Vol. 54). Elsevier.
Giles, R. C. S. L. L. (2001). Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping. In Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference (pp. 402–408). MIT Press.
Gryzlov, A. N. (2011). Model-based estimation of multi-phase flows in horizontal wells. TU Delft, Delft University of Technology.
Guan, D., Yuan, W., Lee, Y.-K., Najeebullah, K., & Rasel, M. K. (2014). A review of ensemble learning based feature selection. IETE Technical Review, 31(3), 190–198.
Hagan, M. T., Demuth, H. B., Beale, M. H., & De Jesús, O. (1996). Neural network design (Vol. 20). Boston: PWS Publishing Company.
Haldipur, P., Metcalf, G. D., et al. (2008). Virtual metering technology field experience examples. In Offshore Technology Conference.
Hasanvand, M., & Berneti, S. (2015). Predicting oil flow rate due to multiphase flow meter by using an artificial neural network. Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, 37(8), 840–845.
Hastie, T. J., Tibshirani, R. J., & Friedman, J. H. (2011). The elements of statistical learning: Data mining, inference, and prediction. Springer.
Haykin, S. S. (2009). Neural networks and learning machines (3rd ed.). Upper Saddle River, NJ: Pearson.
Hunter, D., Yu, H., Pukish III, M. S., Kolbusz, J., & Wilamowski, B. M. (2012). Selection of proper neural network sizes and architectures: A comparative study. IEEE Transactions on Industrial Informatics, 8(2), 228–240.
Ingber, L., Petraglia, A., Petraglia, M. R., Machado, M. A. S., et al. (2012). Adaptive simulated annealing. In Stochastic global optimization and its applications with fuzzy adaptive simulated annealing (pp. 33–62). Springer.
Kadlec, P., Gabrys, B., & Strandt, S. (2009). Data-driven soft sensors in the process industry. Computers & Chemical Engineering, 33(4), 795–814.
Krogh, A., & Vedelsby, J. (1995). Neural network ensembles, cross validation, and active learning. In Advances in Neural Information Processing Systems (pp. 231–238).
Liu, Y., & Yao, X. (1999). Ensemble learning via negative correlation. Neural Networks, 12(10), 1399–1404.
Liu, Z., Dai, Q., & Liu, N. (2014). Ensemble selection by GRASP. Applied Intelligence, 41(1), 128–144. doi:10.1007/s10489-013-0510-0.
Lu, Z., Wu, X., Zhu, X., & Bongard, J. (2010). Ensemble pruning via individual contribution ordering. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 871–880). ACM.
Ma, Y. (2012). Ensemble machine learning: Methods and applications. Springer New York.
Moghaddasi, M., Lotfi, I., & Moghaddasi, M. (2015). Comparison of correlations for predicting oil flow rate passing through chokes. Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, 37(12), 1340–1345.
Møller, M. F. (1993). A scaled conjugate gradient algorithm for fast supervised learning. Neural Networks, 6(4), 525–533.
MPMS, A. (2013). 20.3: Measurement of multiphase flow. American Petroleum Institute.
Nguyen, D., & Widrow, B. (1990). Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In 1990 IJCNN International Joint Conference on Neural Networks (pp. 21–26). IEEE.
Ren, Y., Zhang, L., & Suganthan, P. (2016). Ensemble classification and regression: Recent developments, applications and future directions [review article]. IEEE Computational Intelligence Magazine, 11(1), 41–53.
Seyedkashi, S. H., Moslemi Naeini, H., Liaghat, G., Mosavi Mashadi, M., Shojaee G., K., Mirzaali, M., & Moon, Y. H. (2012). Experimental and numerical investigation of an adaptive simulated annealing technique in optimization of warm tube hydroforming. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, 226(11), 1869–1879.
Shaban, H., & Tavoularis, S. (2014). Measurement of gas and liquid flow rates in two-phase pipe flows by the application of machine learning techniques to differential pressure signals. International Journal of Multiphase Flow, 67, 106–117.
Soares, S., Antunes, C. H., & Araújo, R. (2013). Comparison of a genetic algorithm and simulated annealing for automatic neural network ensemble development. Neurocomputing, 121, 498–511.
Stone, P., et al. (2007). Introducing predictive analytics: Opportunities. In Digital Energy Conference and Exhibition. Society of Petroleum Engineers.
Tamon, C., & Xiang, J. (2000). On the boosting pruning problem. In Machine Learning: ECML 2000 (pp. 404–412).
Thorn, R., Johansen, G., & Hjertaker, B. (2012). Three-phase flow measurement in the petroleum industry. Measurement Science and Technology, 24(1), 012003.
Tsoumakas, G., Partalas, I., & Vlahavas, I. (2008). A taxonomy and short review of ensemble selection. In ECAI 2008, Workshop on Supervised and Unsupervised Ensemble Methods and Their Applications (pp. 41–46).
Tsoumakas, G., Partalas, I., & Vlahavas, I. (2009). An ensemble pruning primer. In Applications of Supervised and Unsupervised Ensemble Methods (pp. 1–13).

Varyan, R., Haug, R., Fonnes, D., et al. (2015). Investigation on the suitability of virtual flow metering system as an alternative to the conventional physical flow meter. In SPE/IATMI Asia Pacific Oil & Gas Conference and Exhibition. Society of Petroleum Engineers.
Varyan, R., et al. (2016). Cost saving of implementing virtual flow metering at various fields and engineering phases - a case study. In Offshore Technology Conference Asia.
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.


Xu, L., Zhou, W., Li, X., & Tang, S. (2011). Wet gas metering using a revised venturi meter and soft-computing approximation techniques. IEEE Transactions on Instrumentation and Measurement, 60(3), 947–956.
Zangl, G., Hermann, R., Schweiger, C., et al. (2014). Comparison of methods for stochastic multiphase flow rate estimation. In SPE Annual Technical Conference and Exhibition. Society of Petroleum Engineers.
Zhou, Z.-H. (2012). Ensemble methods: Foundations and algorithms. CRC Press.