Journal of Petroleum Science and Engineering 137 (2016) 87–96
Integrating a robust model for predicting surfactant–polymer flooding performance

Arash Kamari a, Farhad Gharagheizi a, Amin Shokrollahi b, Milad Arabloo b,*, Amir H. Mohammadi a,c,d,**

a Thermodynamics Research Unit, School of Engineering, University of KwaZulu-Natal, Howard College Campus, King George V Avenue, Durban 4041, South Africa
b Young Researchers and Elites Club, North Tehran Branch, Islamic Azad University, Tehran, Iran
c Institut de Recherche en Génie Chimique et Pétrolier (IRGCP), Paris Cedex, France
d Département de Génie des Mines, de la Métallurgie et des Matériaux, Faculté des Sciences et de Génie, Université Laval, Québec (QC), G1V 0A6, Canada
Article info

Article history: Received 27 July 2013; received in revised form 31 August 2015; accepted 28 October 2015; available online 9 November 2015

Keywords: Recovery factor; Net present value; Enhanced oil recovery; Least square support vector machine; Data assignment

Abstract

Adding surfactant and polymer to the injection water improves oil recovery during a water flood. Surfactant–polymer (SP) flooding is more effective when economic considerations are taken into account alongside technical ones. This communication introduces two reliable models for evaluating the performance of SP flooding from both technical and economic standpoints. To this end, the least square support vector machine (LSSVM) methodology is applied to determine both the recovery factor (RF) and the net present value (NPV) of SP flooding. The results show acceptable agreement between the values estimated by the LSSVM approach and the actual RF and NPV data. Moreover, to model RF and NPV comprehensively, different assignments of data to the training, validation and testing phases are analysed. The results show that the percentage of data assigned to the training set must be balanced and reasonable in order to avoid over-fitting while still achieving an accurate and tested prediction. Finally, a sensitivity analysis is conducted to show the degree of importance of each input parameter for RF and NPV. The results demonstrate the positive and negative impacts of those variables on both RF and NPV. © 2015 Elsevier B.V. All rights reserved.
* Corresponding author. Tel.: +98 9171405706.
** Corresponding author at: Institut de Recherche en Génie Chimique et Pétrolier (IRGCP), Paris Cedex, France.
E-mail addresses: [email protected] (M. Arabloo), [email protected] (A.H. Mohammadi).
http://dx.doi.org/10.1016/j.petrol.2015.10.034
0920-4105/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Scale deposition in pipelines and the strong emulsification of the produced fluids are two important drawbacks of alkaline–surfactant–polymer (ASP) flooding technology (Hongyan et al., 2009; Zhang et al., 2007). To overcome these disadvantages, the alkali-free surfactant–polymer flooding technique has been proposed (Hongyan et al., 2009). Chemical enhanced oil recovery (EOR) techniques, in particular surfactant–polymer (SP) flooding, have demonstrated their key role in decreasing the residual oil saturation both in experimental tests and in field developments. SP flooding decreases the mobility ratio and the interfacial tension (IFT) between the water and oil phases. In other words, SP flooding increases production rates by injecting a surface-active agent (surfactant), acting through two important mechanisms: reduction of the IFT and reduction of the mobility ratio (Lake, 1989).

To forecast the results of EOR techniques, representative methods have been used as accurate and reliable tools (Crane et al., 1963; Giordano, 1987; Koval, 1963; Patton et al., 1971; Paul et al., 1984; Paul et al., 1982). Paul et al. (1982) developed the chemical flood predictive model to identify the best reservoirs for SP flooding. Based on fractional flow theory, the method estimates the production rates and the ultimate oil recovery efficiency versus time. Lake et al. (1981) introduced an efficient method for estimating the performance of a large-scale SP flooding project; their predictive method includes the sequential use of a finite-difference simulation tool. Wang et al. (1981) proposed a simulator for evaluating micellar/polymer flooding, in which a streamline generator is used to set up the flow and balance the concentration.
Nomenclature

ANN     artificial neural network
SVM     support vector machine
LSSVM   least square support vector machine
SA      simulated annealing
CSA     coupled simulated annealing
RBF     radial basis function
IFT     interfacial tension
EOR     enhanced oil recovery
RF      recovery factor
NPV     net present value
SP      surfactant–polymer
ASP     alkaline–surfactant–polymer
MSE     mean square error
R²      correlation coefficient
AARD    average absolute relative deviation, %
RMSE    root mean square error
SDE     standard deviation error
γ       relative weight of the summation of the regression errors
σ²      squared bandwidth
r       relevancy factor

It is well known that EOR methods are strongly influenced by economic policies (Alvarado and Manrique, 2010), so an economic analysis can improve their profitability (Kamari et al., 2014d). It is therefore necessary to predict the net present value (NPV) obtained with EOR methods. With regard to risk analysis, Costa et al. (2008) developed a predictive method and combined it with an economic analysis to improve production prediction in a Brazilian reservoir. Wyatt et al. (2008) presented the economic effects of important variables, such as the injected chemical cost and the recovery factor, on the performance of ASP flooding. They also compared ASP flooding with micellar flooding and alkaline–polymer flooding, and found that the rate of return and the payout time are associated with the process rate and/or the pore volume injected per year. Wu et al. (1996) focused on economic conditions in the USA by studying a light-oil, onshore sandstone reservoir. They used the UTCHEM compositional simulator and validated the results against field data for an SP flooding project.

To evaluate an EOR method technically and economically, the recovery factor (RF) is typically calculated with a conventional simulator and the NPV with an Excel spreadsheet or value and risk management software (Karambeigi et al., 2011). These methods are generally costly and time-consuming once the uncertainty of the various parameters is included (Ghorbani, 2008; Qingjun et al., 2004; Silva et al., 2007). In recent decades, intelligent modelling of EOR methods has gained much interest owing to its fast prediction capability, reliability and applicability. Shafiei et al. (2013) developed a novel EOR screening model based on artificial neural networks (ANN) for evaluating the performance of steam flooding in naturally fractured heavy oil carbonate reservoirs, using the recovery factor and the cumulative steam–oil ratio as output parameters. The results indicated that the model could be employed successfully for screening steam flooding in such reservoirs. In another study, Hou et al. (2009) established quantitative characterization models of oil increase and water-cut variation in polymer flooding, applying an automatic solution technique based on a genetic algorithm (GA). Hou et al. (2009) also developed a quantitative model for predicting the performance of polymer flooding on the basis of numerical simulation; in that simulation study, the impact of the influential factors was studied by coupling the support vector machine (SVM) with an orthogonal design approach.

The least square support vector machine (LSSVM) (Suykens and Vandewalle, 1999) is an improved version of the SVM methodology. In this version, a set of linear equations is solved using support vectors (SVs) instead of a quadratic programming problem, which simplifies the solution of the SVM. The LSSVM has so far been applied successfully in various oil and gas applications (Arabloo et al., 2013; Esfahani et al., 2015; Farasat et al., 2013; Fazavi et al., 2014; Kamari et al., 2015a; Kamari et al., 2015c; Kamari et al., 2014a; Kamari et al., 2013; Kamari et al., 2014b; Kamari et al., 2015d; Rafiee-Taghanaki et al., 2013; Shokrollahi et al., 2013). Nevertheless, the LSSVM technique has not yet been used to forecast the technical and economic performance of EOR methods, in particular the NPV and RF of surfactant–polymer flooding.

In the present study, two representative models are proposed for estimating SP flooding performance (RF and NPV) for a sandstone oil reservoir, based on the LSSVM modelling approach. The models were built and tested using a comprehensive data set collected from the literature. Moreover, to model RF and NPV comprehensively, different assignments of data to the training, validation and testing phases are analysed. In addition, a sensitivity analysis is conducted to show the degree of importance of each input parameter for RF and NPV. Finally, the leverage technique is used to detect possibly doubtful RF and NPV data.
2. Investigational databank

The applicability, reliability and accuracy of predictive/representative methods normally depend on the comprehensiveness and validity of the databank used to develop them (Gharagheizi et al., 2008; Mohammadi and Richon, 2008; Rafiee-Taghanaki et al., 2013; Scalabrin et al., 2006). Therefore, the most important parameters affecting RF and NPV should be selected. The data used in this study were presented by Prasanphanich (2009) and gathered by Karambeigi et al. (2011); SP flooding had been modelled with the UTCHEM simulator and NPV had been calculated with an Excel spreadsheet. The dataset includes seven input parameters: surfactant slug size, surfactant concentration in the surfactant slug, polymer concentration in the surfactant slug, polymer drive size, polymer concentration in the polymer drive, ratio of vertical to horizontal permeability (Kv/Kh), and salinity of the polymer drive. RF and NPV are the output parameters. The importance of these variables for accurate prediction of RF and NPV has previously been confirmed by Karambeigi et al. (2011). For the calculation of NPV, Prasanphanich (2009) reported the economic assumptions summarized in Table 1.

Normally, the main problem of predictive economic models is the change of economic conditions over the years. To account for this, an inflation rate is added to the models as a fixed percentage. In this study, the inflation rate is set to 3%, and a 3% escalation is applied to the prices of oil and chemicals and to the operating costs. The reservoir studied has an average porosity of 20.16%, a mean water saturation of 57.58%, a mean pay thickness of 147.52 ft, and an area of 80.21 acres. Detailed information about the control variables used in the UTCHEM simulations and the characteristics of the simulated reservoir can be found elsewhere (Prasanphanich, 2009). The minimum, average and maximum values of each input and output variable are summarized in Table 2.

Table 1 Summary of the economic assumptions for calculating NPV (Prasanphanich, 2009).

Economic input variable                           Value
Initial capital costs
  Work over cost                                  $0
  Development drilling cost                       $0
  Facilities and equipment                        $500,000
  Leasehold cost                                  $0
  Intangible drilling cost                        $0
Operating costs
  Waterflood operating cost ($/month)             $10,000
  Chemical slug injection cost ($/bbl)            $0.10
  Polymer drive injection cost ($/bbl)            $0.10
  Produced water cost ($/bbl)                     $0.10
  Oil treatment cost ($/bbl)                      $0.10
  Overhead cost (% of direct operating cost)      10%
Commodity prices
  Oil price ($/bbl)                               $50.00
  Alkali price ($/lb)                             $0.00
  Surfactant price ($/lb)                         $2.00
  Polymer price ($/lb)                            $1.00
Taxation
  Royalty                                         12.50%
  Severance & Ad valorem tax rate ($ per STB)     0.046
  Effective income tax rate                       38.25%
  EOR tax credit rate                             0.00%
General
  Inflation rate                                  3.00%
  Oil price escalation                            3.00%
  Chemical price escalation                       3.00%
  Operating cost escalation                       3.00%
  Real discount rate                              10.00%
  Real reinvestment rate                          10.00%
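As a rough illustration of how the inflation, escalation and discount-rate assumptions of Table 1 interact, the following Python sketch discounts a series of yearly net cash flows. It is not the spreadsheet used by Prasanphanich (2009): the cash-flow values, the yearly time step, the helper name `npv`, and the conversion of the real discount rate to a nominal rate through (1 + real rate)(1 + inflation) − 1 are all assumptions made for illustration, and royalty and taxes are ignored.

```python
def npv(cash_flows, rate):
    """Discount yearly cash flows (year 0 first) at a constant rate."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

inflation = 0.03   # Table 1: inflation rate, also used as the escalation rate
real_rate = 0.10   # Table 1: real discount rate

# Hypothetical net cash flows in today's dollars ($MM); year 0 is a capital outlay.
real_cash_flows = [-0.5, 1.2, 1.0, 0.8, 0.6]

# Assumption: every cash-flow item escalates at the inflation rate, and the
# nominal discount rate combines the real rate and inflation multiplicatively.
nominal_rate = (1.0 + real_rate) * (1.0 + inflation) - 1.0
nominal_cash_flows = [cf * (1.0 + inflation) ** t
                      for t, cf in enumerate(real_cash_flows)]

npv_real = npv(real_cash_flows, real_rate)
npv_nominal = npv(nominal_cash_flows, nominal_rate)
print(f"NPV in real terms:    {npv_real:.4f} $MM")
print(f"NPV in nominal terms: {npv_nominal:.4f} $MM")
```

Under these assumptions the two results coincide, because a uniform escalation equal to the inflation rate cancels against the inflation component of the nominal discount rate. Any item that escalates at a different rate would break this equivalence.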
Table 2 The ranges of data utilized for the performance prediction of SP flooding method; data from Karambeigi et al. (2011) and Prasanphanich (2009).

Parameter                                   Unit            Type     Min.     Avg.        Max.
Surfactant slug size                        PV              Input    0.097    0.1772277   0.259
Surfactant concentration                    Vol. fraction   Input    0.005    0.0177475   0.030
Polymer concentration in surfactant slug    wt%             Input    0.100    0.1766287   0.250
Polymer drive size                          PV              Input    0.324    0.4817475   0.648
Polymer concentration in polymer drive      wt%             Input    0.100    0.1481584   0.200
Kv/Kh ratio                                 -               Input    0.010    0.1285149   0.250
Salinity of polymer drive                   Meq/ml          Input    0.300    0.3485990   0.400
RF                                          %               Output   14.820   39.668020   56.990
NPV                                         $ MM            Output   1.064    4.4512248   8.101

3. Model development

3.1. The LSSVM strategy

In the machine-learning community, the SVM computational strategy is regarded as a reliable way to analyse data, solve regression and classification problems, and recognize patterns (Eslamimanesh et al., 2012a; Suykens and Vandewalle, 1999). The SVM approach was initially introduced to solve classification problems in particular, using hyperplanes to define the decision boundaries between data belonging to different classes (Suykens and Vandewalle, 1999). In the SVM algorithm, the primary function f(x) is defined as (Farasat et al., 2013; Shokrollahi et al., 2013; Suykens et al., 2002b):

$$f(x) = w^{T}\varphi(x) + b \tag{1}$$

where $w^{T}$, $\varphi(x)$, $x$ and $b$ are the transposed output layer vector, the feature map, the input vector of dimension n, and the bias, respectively. To obtain w and b, the following cost function is minimized (Suykens et al., 2002b):

$$\text{Cost function} = \frac{1}{2} w^{T} w + c \sum_{k=1}^{N} \left(\xi_k + \xi_k^{*}\right) \tag{2}$$

subject to the constraints:

$$\begin{cases} y_k - w^{T}\varphi(x_k) - b \le \varepsilon + \xi_k, & k = 1, 2, \ldots, N \\ w^{T}\varphi(x_k) + b - y_k \le \varepsilon + \xi_k^{*}, & k = 1, 2, \ldots, N \\ \xi_k, \xi_k^{*} \ge 0, & k = 1, 2, \ldots, N \end{cases} \tag{3}$$

where $x_k$ and $y_k$ are the kth input and the kth output data, respectively, $\varepsilon$ is the prescribed accuracy of the function approximation, and $\xi_k$ and $\xi_k^{*}$ are slack variables. If a low value of $\varepsilon$ is chosen in order to build a precise model, some data points may fall outside the $\varepsilon$ accuracy band; the slack variables are therefore needed to define the permitted margin of inaccuracy. The parameter c in Eq. (2) is the adjustable parameter of the SVM algorithm that controls the penalty on deviations from the desired $\varepsilon$. To minimize the cost function, the Lagrangian is used (Hemmati-Sarapardeh et al., 2014; Suykens et al., 2002b):

$$L(a, a^{*}) = -\frac{1}{2}\sum_{k,l=1}^{N}\left(a_k - a_k^{*}\right)\left(a_l - a_l^{*}\right)K(x_k, x_l) - \varepsilon\sum_{k=1}^{N}\left(a_k + a_k^{*}\right) + \sum_{k=1}^{N} y_k\left(a_k - a_k^{*}\right) \tag{4}$$

$$\sum_{k=1}^{N}\left(a_k - a_k^{*}\right) = 0, \quad a_k, a_k^{*} \in [0, c] \tag{4a}$$

$$K(x_k, x_l) = \varphi(x_k)^{T}\varphi(x_l), \quad k, l = 1, 2, \ldots, N \tag{4b}$$
where $a_k$ and $a_k^{*}$ are Lagrange multipliers. The final form of the SVM function is (Hemmati-Sarapardeh et al., 2014):

$$f(x) = \sum_{k=1}^{N}\left(a_k - a_k^{*}\right)K(x, x_k) + b \tag{5}$$

To determine $a_k$, $a_k^{*}$ and $b$, a quadratic programming problem must be solved. To avoid this, Suykens and Vandewalle (Pelckmans et al., 2002; Suykens and Vandewalle, 1999) modified the SVM algorithm with a least-squares formulation (LSSVM), reformulating the SVM problem as (Arabloo et al., 2013; Hemmati-Sarapardeh et al., 2014; Rafiee-Taghanaki et al., 2013; Suykens et al., 2002b):

$$\text{Cost function} = \frac{1}{2} w^{T} w + \frac{1}{2}\gamma\sum_{k=1}^{N} e_k^{2} \tag{6}$$

subject to the following constraints (for k = 1, …, N):

$$y_k = w^{T}\varphi(x_k) + b + e_k \tag{7}$$

where $\gamma$ and $e_k$ are the adjustable parameter of the LSSVM approach and the error variable, respectively. The following Lagrangian is used to solve the problem (Hemmati-Sarapardeh et al., 2014):

$$L(w, b, e, a) = \frac{1}{2} w^{T} w + \frac{1}{2}\gamma\sum_{k=1}^{N} e_k^{2} - \sum_{k=1}^{N} a_k\left(w^{T}\varphi(x_k) + b + e_k - y_k\right) \tag{8}$$

To solve the problem, the derivatives of Eq. (8) are set equal to zero, which gives:

$$\begin{cases} \dfrac{\partial L}{\partial w} = 0 \Rightarrow w = \sum_{k=1}^{N} a_k\varphi(x_k) \\ \dfrac{\partial L}{\partial b} = 0 \Rightarrow \sum_{k=1}^{N} a_k = 0 \\ \dfrac{\partial L}{\partial e_k} = 0 \Rightarrow a_k = \gamma e_k, \quad k = 1, 2, \ldots, N \\ \dfrac{\partial L}{\partial a_k} = 0 \Rightarrow w^{T}\varphi(x_k) + b + e_k - y_k = 0, \quad k = 1, 2, \ldots, N \end{cases} \tag{9}$$

Eq. (9) indicates that there are 2N + 2 equations and 2N + 2 unknown parameters ($a_k$, $e_k$, $w$, and $b$). Hence, the parameters of the LSSVM are obtained by solving the system of equations in Eq. (9) (Suykens et al., 2002b). As mentioned earlier, $\gamma$ is one of the adjustable parameters of the LSSVM algorithm. Because both LSSVM and SVM are kernel-based methods, the parameters of the kernel function must also be treated as tuning parameters. The RBF kernel function is formulated as follows (Farasat et al., 2013; Fazavi et al., 2014; Tatar et al., 2013):

$$K(x, x_k) = \exp\left(-\frac{\lVert x_k - x\rVert^{2}}{\sigma^{2}}\right) \tag{10}$$

where $\sigma^{2}$ is the squared bandwidth of the kernel, which is also an adjustable parameter. Consequently, $\sigma^{2}$ and $\gamma$ are the two adjustable parameters of the LSSVM methodology with the RBF kernel function, and they should be tuned with a reliable optimization technique (Shokrollahi et al., 2013; Suykens et al., 2002b). In the LSSVM approach developed in this study, the mean square error (MSE) is used as follows (Arabloo et al., 2013; Farasat et al., 2013; Shokrollahi et al., 2013):

$$\mathrm{MSE} = \frac{\sum_{i=1}^{n}\left(Z_{\mathrm{rep./pred},i} - Z_{\mathrm{exp},i}\right)^{2}}{n} \tag{11}$$
where Z is the RF or NPV; the subscripts rep./pred and exp denote the values estimated by the LSSVM model developed in this study and the actual RF or NPV data, respectively; and n is the number of samples from the initial population. In the present work, the original LSSVM approach introduced by Suykens and Vandewalle (1999) is employed to determine RF and NPV through a performance evaluation analysis.

3.2. Normalizing the data

Small values of the input parameters may be dominated by larger values during the training phase of model development with the LSSVM approach. To overcome this problem, the available data points should be normalized so that the LSSVM approach can achieve a good prediction. Therefore, all available data points of the input/output variables are normalized as follows:
$$X' = \frac{X - \mu}{\eta} \tag{12}$$
where $X'$ denotes the normalized value, $X$ the initial (actual) value, $\mu$ the mean, and $\eta$ the standard deviation. With this method, each input/output variable is normalized so that the mean and standard deviation of the normalized variable are 0 and 1, respectively (Karambeigi et al., 2011). Normalizing the data has no impact on the final results, because all normalized data points are transformed back to their original values at the end.

3.3. Optimizing the model parameters

Simulated annealing (SA) (Atiqullah and Rao, 1993; Fabian, 1997; Vasan and Raju, 2009) has previously been used as an optimization strategy to avoid local optima. Its basic idea is to allow moves that lead to solutions of worse quality than the current solution, so that the search can escape local optima. Coupled simulated annealing (CSA), an improved version of SA, has been proposed to avoid the problem of local optima more readily. The principles of the CSA methodology were reported by Suykens et al. (2001), who applied coupling among local optimization processes to improve gradient optimization and avoid local optima in non-convex problems. Additionally, Xavier-de-Souza et al. (2010) used CSA as a reliable optimization technique to improve the accuracy of the final solution of their problem. Coupled optimization techniques such as CSA can be more effective if the communication required by the coupling strategy is kept to a minimum (Koch, 2005). The acceptance probability function A with coupling term ρ is defined as (Chamkalani et al., 2014):
$$A_{\Theta}(\rho, x_i \to y_i) = \frac{\exp\left(\dfrac{-E(y_i)}{T_k^{ac}}\right)}{\exp\left(\dfrac{-E(y_i)}{T_k^{ac}}\right) + \rho} \tag{13}$$
with $A_{\Theta}(\rho, x_i \to y_i)$ the acceptance probability for every $x_i \in \Theta$, $y_i \in \Upsilon$ and $\Theta \subset \Upsilon$. Here $\Upsilon$ is the set of all possible states and $\Theta = \{x_i\}_{i=1}^{q}$ is the set of current states of the q minimizers. Moreover, the variance $\sigma^{2}$ of A is as follows (Chamkalani et al., 2014):

$$\sigma^{2} = \frac{1}{q}\sum_{\forall x_i \in \Theta} A_{\Theta}^{2} - \frac{1}{q^{2}} \tag{14}$$

The coupling term ρ is given by (Chamkalani et al., 2014):

$$\rho = \sum_{x_j \in \Theta}\exp\left(\frac{-E(x_j)}{T_k^{ac}}\right) \tag{15}$$
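The acceptance rule of Eqs. (13) and (15) can be made concrete with a small numerical sketch in Python. This is only an illustration under stated assumptions, not the optimizer used in the study: the function names are hypothetical, the energies stand in for an objective such as a validation MSE, the acceptance temperature is fixed at 1, and the coupling term is taken as the sum of exp(−E(x_j)/T) over the current states, as is usual in the CSA literature.

```python
import math

def coupling_term(current_energies, temperature):
    """Eq. (15): sum of exp(-E(x_j)/T) over the current states x_j."""
    return sum(math.exp(-e / temperature) for e in current_energies)

def acceptance_probability(probe_energy, current_energies, temperature):
    """Eq. (13): probability of accepting the probe state y_i."""
    weight = math.exp(-probe_energy / temperature)
    return weight / (weight + coupling_term(current_energies, temperature))

# Hypothetical energies (for example validation errors) of q = 3 current states.
current = [0.9, 1.2, 1.5]

# A probe with a lower energy is accepted with a higher probability.
for probe in (0.5, 1.0, 2.0):
    p = acceptance_probability(probe, current, temperature=1.0)
    print(f"E(y) = {probe:.1f} -> acceptance probability = {p:.3f}")
```

Because the coupling term depends on the energies of all q current states, the chance of accepting a probe for one minimizer is influenced by how well the other minimizers are doing, which is the coupling that distinguishes CSA from independent SA runs.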
3.4. Computational procedure

To develop the models, seven input variables related to SP flooding oil recovery were selected, with RF and NPV as the two output variables, as mentioned above. The main databank was divided into three sub-datasets, the "Training" set, the "Validation" set and the "Test" set, in order to develop and check the models constructed with the LSSVM approach. The training set is used to build the main structure of the model, and the validation and test (prediction) sets are used to assess the accuracy, capability and reliability of the resulting model (Arabloo et al., 2013; Farasat et al., 2013; Shokrollahi et al., 2013). The data are allocated to the sets randomly. The K-means clustering technique (Kamari et al., 2014c) is used in the present study to examine the relationships among the variables and determine whether they are relevant to one another. K-means clustering divides n observations into k clusters, with each observation belonging to the cluster with the nearest mean. In other words, K-means clustering helps to assign and divide the datasets more appropriately. The dataset may, however, have to be grouped by a different method before K-means clustering is applied. To this end, the K-fold cross-validation method is applied in this study. With k = 10, the dataset is partitioned into 10 subsets D1, D2, …, D10. One subset is held out as the test/validation set, and the model is trained on the remaining 9 subsets together. In the first run, D1 is the test/validation set and {D2, D3, …, D10} is the training set; in the second run, D2 is the test/validation set and {D1, D3, D4, …, D10} is the training set. The procedure is repeated 10 times to find the best data division, and the testing and validation accuracy is finally averaged over the 10 runs.

In this study, the four most accurate data divisions (i.e., 60:20:20, 70:15:15, 80:10:10 and 90:5:5) were selected to study and predict the performance of chemical flooding. The results for all subsets (i.e., training, validation and testing) are illustrated in Figs. 1 and 2. Fig. 1 indicates that the results for the data divisions 70:15:15, 80:10:10 and 90:5:5 are approximately the same for the estimation of RF. Fig. 2 shows the same for the data divisions 60:15:15, 80:10:10 and 90:5:5.

Fig. 1. Influence of various data assignments on the prediction of RF using LSSVM approach.

Fig. 2. Influence of various data assignments on the prediction of NPV using LSSVM approach.

The percentage of data assigned to the training set must therefore be balanced and reasonable in order to avoid over-fitting while still achieving an accurate and tested prediction. If the percentage of data assigned to the training set is high, over-fitting may occur in the predictive model. If it is low, a strong and reliable predictive model cannot be developed. The results obtained in this study show that the model developed with the 80:10:10 division is the most appropriate and reliable of all, because of its balanced accuracy in the testing and validation phases. The other data divisions are accurate in only one phase (training, testing or validation), which means that the models developed with these data assignments have not been trained, tested or validated properly. Consequently, the best data assignment for the available dataset is 80:10:10, which is selected for the prediction of the recovery factor and net present value of chemical flooding.
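The procedure of Sections 3.1 to 3.4 can be tied together in a short NumPy sketch: normalize the data (Eq. (12)), split them randomly 80:10:10, solve the linear system implied by Eq. (9) with the RBF kernel of Eq. (10), and report errors per subset. This is a minimal sketch under stated assumptions, not the implementation used in the study: the data are synthetic stand-ins (202 samples, 7 inputs), the function names are hypothetical, the values of γ and σ² are picked by hand rather than tuned with CSA, and the K-means and K-fold steps are omitted.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """Eq. (10): K(x, x_k) = exp(-||x_k - x||^2 / sigma^2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / sigma2)

def lssvm_fit(X, y, gamma, sigma2):
    """Solve the linear system implied by Eq. (9) for multipliers a and bias b."""
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0                                   # sum_k a_k = 0
    M[1:, 0] = 1.0                                   # bias term b
    M[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma   # e_k = a_k / gamma
    solution = np.linalg.solve(M, np.concatenate(([0.0], y)))
    return solution[1:], solution[0]

def lssvm_predict(X_new, X_train, a, b, sigma2):
    """f(x) = sum_k a_k K(x, x_k) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ a + b

# Synthetic stand-in data (NOT the databank of the paper).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(202, 7))
y = 40.0 + 10.0 * X[:, 0] - 5.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.1, 202)

# Eq. (12): z-score normalization of inputs and output.
mu_x, eta_x = X.mean(axis=0), X.std(axis=0)
mu_y, eta_y = y.mean(), y.std()
Xn, yn = (X - mu_x) / eta_x, (y - mu_y) / eta_y

# Random 80:10:10 assignment (162 / 20 / 20 points).
idx = rng.permutation(len(y))
train, val, test = idx[:162], idx[162:182], idx[182:]

gamma, sigma2 = 100.0, 20.0   # hypothetical values, chosen by hand for this sketch
a, b = lssvm_fit(Xn[train], yn[train], gamma, sigma2)

def report(name, rows):
    """Print RMSE and %AARD in original units for one subset."""
    pred = lssvm_predict(Xn[rows], Xn[train], a, b, sigma2) * eta_y + mu_y
    rmse = np.sqrt(np.mean((y[rows] - pred) ** 2))
    aard = 100.0 * np.mean(np.abs(pred - y[rows]) / y[rows])
    print(f"{name}: RMSE = {rmse:.3f}, AARD = {aard:.2f}%")
    return rmse

rmse_train = report("training", train)
rmse_val = report("validation", val)
rmse_test = report("test", test)
```

Comparing the three printed errors mirrors the check described above: a training error far below the validation and test errors would point to over-fitting, whereas similar values across the subsets suggest a balanced model.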
4. Results and discussion

To evaluate the performance of SP flooding through accurate determination of RF and NPV, two predictive models are proposed using the LSSVM approach. The adjustable parameters of the LSSVM algorithm, γ and σ², must be optimized to obtain capable models; they were therefore tuned with the CSA optimization approach. For the RF model, γ and σ² are 603.6993 and 8408207.8465, respectively; for the NPV model, they are 34.6992 and 4863.5731, respectively. Fig. 3 shows the flowchart of the LSSVM design and development procedure proposed in this study. Furthermore, Appendix A describes the step-by-step calculation procedure of the LSSVM approach for the prediction of the target variables.

Fig. 3. Flowchart of procedure of LSSVM design and development proposed in the present work.

To assess the accuracy of the developed models, a comprehensive error analysis has been conducted in this study. The statistical parameters used are the R-squared (R²), the average absolute relative deviation (AARD), the standard deviation error (STD), and the root mean square error (RMSE). Furthermore, crossplots (parity diagrams) and relative error distribution curves are presented as a graphical error analysis. Table 3 summarizes the results obtained for the determination of RF using the LSSVM approach. For the RF model in the testing stage, the calculated values of R² and AARD are 0.988 and 1.9%, respectively. The RF values predicted by the LSSVM model are compared graphically in Figs. 4 and 5. Fig. 4 is the scatter plot of the data estimated by the LSSVM model against the actual literature data. As can be seen from the figure, there is low scatter around the 45° line. This clearly shows that the results obtained with the LSSVM model for the determination of RF are acceptable. Moreover, Fig. 5 shows the relative error distribution curve for the RF data estimated with the LSSVM model. The distribution plot shows that the variation of the relative error around the zero line is low, so it can be concluded that the estimated recovery factor for SP flooding is reliable.

Table 3 The results of statistical error analysis performed in this study for the developed RF model.

Statistical parameter                           Value
Training set
  R² (a)                                        0.994
  Average absolute relative deviation (b)       1.4
  Standard deviation error (c)                  0.74
  Root mean square error (d)                    0.73
  N (e)                                         162
Validation set
  R² (a)                                        0.977
  Average absolute relative deviation (b)       1.8
  Standard deviation error (c)                  0.88
  Root mean square error (d)                    0.86
  N (e)                                         20
Test set
  R² (a)                                        0.988
  Average absolute relative deviation (b)       1.9
  Standard deviation error (c)                  0.93
  Root mean square error (d)                    0.94
  N (e)                                         20
Total
  R² (a)                                        0.993
  Average absolute relative deviation (b)       1.5
  Standard deviation error (c)                  0.77
  Root mean square error (d)                    0.77
  N (e)                                         202

(a) $R^{2} = 1 - \dfrac{\sum_{i=1}^{N}\left(X_{\mathrm{exp},i} - X_{\mathrm{rep./pred},i}\right)^{2}}{\sum_{i=1}^{N}\left(X_{\mathrm{rep./pred},i} - \bar{X}\right)^{2}}$
(b) $\%\mathrm{AARD} = \dfrac{100}{N}\sum_{i=1}^{N}\dfrac{\left|X_{\mathrm{rep./pred},i} - X_{\mathrm{exp},i}\right|}{X_{\mathrm{exp},i}}$
(c) $\mathrm{STD} = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(X_{\mathrm{rep./pred},i} - \mathrm{average}\left(X_{\mathrm{rep./pred}}\right)\right)^{2}}$
(d) $\mathrm{RMSE} = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left(X_{\mathrm{exp},i} - X_{\mathrm{rep./pred},i}\right)^{2}}$
(e) Number of data points.

Fig. 4. Graphical error analysis; scatter diagram related to the RF model.

Fig. 5. Graphical error analysis; relative error distribution plot related to the RF model.

Table 4 lists the results obtained with the LSSVM methodology in predicting the NPV data. Additionally, Figs. 6 and 7 show the crossplot and the relative error distribution plot for the NPV variable, respectively. From Table 4 and Figs. 6 and 7, it can be seen that the model proposed in this study for the determination of NPV performs very well in the training, testing and validation phases. In other words, there is acceptable agreement between the values determined by the LSSVM approach and the actual literature data of RF and NPV. Moreover, the results obtained in this study indicate that the CSA–LSSVM method is capable of efficiently estimating the performance of other EOR methods, in addition to SP flooding.

Table 4 The results of statistical error analysis performed in this study for the developed NPV model.

Statistical parameter                           Value
Training set
  R²                                            0.994
  Average absolute relative deviation           2.1
  Standard deviation error                      0.11
  Root mean square error                        0.11
  N                                             162
Validation set
  R²                                            0.992
  Average absolute relative deviation           2.7
  Standard deviation error                      0.15
  Root mean square error                        0.16
  N                                             20
Test set
  R²                                            0.982
  Average absolute relative deviation           3.1
  Standard deviation error                      0.18
  Root mean square error                        0.18
  N                                             20
Total
  R²                                            0.993
  Average absolute relative deviation           2.3
  Standard deviation error                      0.13
  Root mean square error                        0.13
  N                                             202

Fig. 6. Graphical error analysis; scatter diagram related to the NPV model.

Fig. 7. Graphical error analysis; relative error distribution plot related to the NPV model.

It is worth mentioning that the capability and practicality of any intelligent model increase with the size of the dataset, because a large dataset covers a wider range of the variables than a small one. Furthermore, small datasets may lead to over-fitting if the number of adjustable parameters of the model is high. The size of the dataset is not a serious problem for the development of the LSSVM model, because the LSSVM approach has only two adjustable parameters (γ and σ²); consequently, over-fitting is not expected with this number of adjustable parameters. A more detailed explanation of the impact of the number of adjustable parameters and the size of the dataset on the capability of the LSSVM approach is given by Kamari et al. (2015b). However, despite its attractive mathematical properties, the LSSVM approach has some potential disadvantages: (1) every data point of the database contributes to the developed model, and the relative importance of a data point is given by its support value; and (2) the use of a sum squared error cost function without regularization might lead to less robust predictions (Suykens et al., 2002a).

To show the degree of importance of each input parameter for both RF and NPV, a sensitivity analysis is conducted in this study. The relevancy factor (r) approach (Chen et al., 2014) is employed to evaluate the effect of the input variables on RF and NPV. The r value with its sign (directionality) gives a clearer and more intuitive picture of the overall impact and was therefore used in this study; the absolute value of r alone does not show whether an input variable influences RF or NPV positively or negatively. The r values are calculated as follows (Hosseinzadeh and Hemmati-Sarapardeh, 2014):
r (Inpk , μg ) =
∑i = 1 (Inpk, i − Inpk )(μi − μ ) n
n
∑i = 1 (Inpk, i − Inpk )2∑i = 1 (μi − μ )2
(16)
Fig. 8. Results of sensitivity analysis conducted in this study.
where $\mathrm{Inp}_{k,i}$ is the ith value of the kth input variable, $\overline{\mathrm{Inp}}_k$ denotes the average value of the kth input variable, $\mu_i$ indicates the ith value of the recovery factor or net present value predicted by the developed LSSVM by Eq. (1), $\bar{\mu}$ is the average value of the recovery factor or net present value predicted by the developed LSSVM, and n is the number of data points. Fig. 8 shows the results of the sensitivity analysis conducted in this study using the relevancy factor approach. The figure shows graphically the degree of importance (r) of each input variable for the prediction of RF and NPV. For instance, surfactant concentration has a negative effect on NPV and a positive impact on RF. This shows that a high surfactant concentration could increase RF but decrease NPV, because surfactant is an expensive material. It should be mentioned that a comprehensive economic investigation is required to analyse in detail the impact of the different variables on NPV.
Detecting outlier data points is an important step in constructing a representative model, because it identifies the individual datum (or groups of data) which may differ from the bulk of the databank (Mohammadi et al., 2012a; Mohammadi et al., 2012b; Rousseeuw and Leroy, 2005). Consequently, it is valuable to assess the RF and NPV data estimated by the LSSVM models proposed in this study. Therefore, the Leverage approach (Goodall, 1993; Gramatica, 2007) has been used in this study for detecting the outlier data in the RF and NPV databank. The mathematical description of the Leverage approach, together with some definitions, is presented in detail in Appendix B (Mohammadi et al., 2012a, 2012b). Figs. 9 and 10 illustrate the Williams plots obtained using the Leverage approach for the results of the LSSVM models for RF and NPV, respectively. The location of the majority of data points in the ranges 0 ≤ H ≤ 0.13366 and −3 ≤ R ≤ 3 for RF and NPV confirms that both models proposed in this study for the determination of RF and NPV are statistically acceptable. The outlier detection analysis demonstrates that only 4 data points are identified as outliers for the RF model (Fig. 9). Additionally, only 1 data point is identified as an outlier for the LSSVM model developed to estimate NPV (Fig. 10).
Fig. 10. The sketched Williams plot illustrating the identification of outlier data point of NPV.
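As a minimal sketch of how the relevancy factor of Eq. (16) can be evaluated, the following Python snippet computes r between one input variable and the model output. The function name relevancy_factor and all sample values are hypothetical and are not taken from the databank of this study; they are chosen only so that one output rises and the other falls with the input.

```python
import math


def relevancy_factor(inputs, outputs):
    """Relevancy factor r of Eq. (16) between one input variable and the output."""
    if len(inputs) != len(outputs) or len(inputs) < 2:
        raise ValueError("need two equally sized samples with at least two points")
    n = len(inputs)
    mean_in = sum(inputs) / n
    mean_out = sum(outputs) / n
    # Numerator: sum of products of the deviations from the two means.
    cov = sum((x - mean_in) * (y - mean_out) for x, y in zip(inputs, outputs))
    # Denominator: square root of the product of the two sums of squares.
    ss_in = sum((x - mean_in) ** 2 for x in inputs)
    ss_out = sum((y - mean_out) ** 2 for y in outputs)
    denom = math.sqrt(ss_in * ss_out)
    if denom == 0.0:
        raise ValueError("r is undefined when either sample is constant")
    return cov / denom


# Hypothetical toy values (assumption, for illustration only).
surfactant_conc = [0.005, 0.010, 0.015, 0.020, 0.025]
predicted_rf = [30.0, 34.0, 37.0, 41.0, 44.0]   # rises with concentration
predicted_npv = [12.0, 11.0, 9.5, 8.0, 7.0]     # falls with concentration

r_rf = relevancy_factor(surfactant_conc, predicted_rf)
r_npv = relevancy_factor(surfactant_conc, predicted_npv)
print(f"r(surfactant concentration, RF)  = {r_rf:+.3f}")
print(f"r(surfactant concentration, NPV) = {r_npv:+.3f}")
```

In this toy case the sign of r separates the positive influence on RF from the negative influence on NPV, while the magnitude of r reflects how strong each relationship is.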
5. Conclusion
A least square support vector machine methodology, tuned by a coupled simulated annealing optimization technique, was used in this study to determine the recovery factor and net present value during surfactant–polymer flooding, as a performance evaluation in a sandstone reservoir. To this end, literature-reported data points were utilized to present two reliable models for the determination of recovery factor and net present value during surfactant–polymer flooding. Both models were successfully employed to evaluate the performance of the SP flooding method and to predict both technical and economic factors, namely RF and NPV. Moreover, the results obtained in this study reveal that the proposed models could be used for the simulation of surfactant–polymer flooding and even other EOR processes. The results of the data assignment analysis show that the percentage of data assigned to the training set must be balanced and reasonable in order to avoid over-fitting and to achieve an accurate and tested prediction. Furthermore, a sensitivity analysis was conducted to show the degree of importance of each input parameter on RF and NPV. The results demonstrate the positive and negative impacts of those variables on both RF and NPV. Finally, the Leverage approach was utilized to identify the outlier data points in the databank: 4 data points for the RF model and only 1 data point for the NPV model were recognized as outliers.
Fig. 9. The sketched Williams plot illustrating the identification of outlier data points of RF.
Appendix A. LSSVM procedure for calculation of the target variables
To use the models developed in this study, a mathematical algorithm was coded to determine the NPV and RF data. First, the original LSSVM toolbox for MATLAB should be installed; then the directory of the LSSVM toolbox should be set as the main directory in the MATLAB environment. The following example provides step-by-step instructions for using the proposed models.
Table A.1
The sample set for calculation of NPV and RF.
Input variable                                   Value
Surfactant slug size                             0.259
Surfactant concentration                         0.0175
Polymer concentration in surfactant slug         0.25
Polymer drive size                               0.648
Polymer concentration in polymer drive           0.1
Kv/Kh ratio                                      0.25
Salinity of polymer drive                        0.3
Example: calculation of NPV and RF using the data summarized in Table A.1. NPV and RF are calculated by entering the following code in the command window:
    clc; clear;
    Data = [0.259 0.0175 0.25 0.648 0.1 0.25 0.3];  % Input vector (same order as Table A.1)
    %% Prediction of the LS-SVM models based on the RBF kernel function
    %% Calculation of NPV
    load('NPV.mat')  % expected to provide trainX, trainY, type, gam, sig2, alpha and b
    NPV_calc = simlssvm({trainX,trainY,type,gam,sig2,'RBF_kernel','preprocess'},{alpha,b},Data)  % Calculated NPV
    %% Calculation of RF
    load('RF.mat')   % overwrites the variables above with those of the RF model
    RF_calc = simlssvm({trainX,trainY,type,gam,sig2,'RBF_kernel','preprocess'},{alpha,b},Data)  % Calculated RF
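The simlssvm call above evaluates a trained model at a new input vector. Assuming the standard LS-SVM regression form y(x) = Σ α_i K(x, x_i) + b with an RBF kernel, that evaluation can be sketched in Python as follows. The kernel normalisation exp(−‖x − x_i‖²/σ²), the omission of the 'preprocess' scaling step, the function names, and all numerical values are assumptions made for illustration; the toolbox may scale the kernel width and the data differently, so this sketch is not a substitute for the stored NPV.mat and RF.mat models.

```python
import math


def rbf_kernel(x, z, sig2):
    """RBF kernel exp(-||x - z||^2 / sig2); the normalisation is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / sig2)


def lssvm_predict(x_new, train_x, alpha, b, sig2):
    """Evaluate y(x) = sum_i alpha_i * K(x, x_i) + b for one input vector."""
    if len(train_x) != len(alpha):
        raise ValueError("one support value (alpha) is needed per training point")
    return sum(a * rbf_kernel(x_new, xi, sig2) for a, xi in zip(alpha, train_x)) + b


# Hypothetical two-point model (assumption): not the trained NPV or RF model.
toy_train_x = [[0.0, 0.0], [1.0, 0.0]]
toy_alpha = [0.5, -0.25]   # support values, one per training point
toy_b = 2.0                # bias term
toy_sig2 = 1.0             # squared kernel width

y_toy = lssvm_predict([0.0, 0.0], toy_train_x, toy_alpha, toy_b, toy_sig2)
print(f"toy prediction = {y_toy:.5f}")
```

Every training point contributes a term to the sum, which reflects the remark in the main text that each data point of the database contributes to the developed model through its support value.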
Appendix B. Leverage methodology
The Leverage technique is composed of a statistical analysis comprising the residual errors and the Hat matrix, which consider the actual values of the data and the estimated values of RF and NPV (Eslamimanesh et al., 2012b; Mohammadi et al., 2012a, 2012b). Using a proper mathematical model is the main criterion for applying the Leverage algorithm. The Hat matrix embedded in the Leverage technique is defined as follows (Eslamimanesh et al., 2012b; Gharagheizi et al., 2012; Goodall, 1993; Gramatica, 2007; Mohammadi et al., 2012a; Rousseeuw and Leroy, 2005):
$$ H = X (X^{t} X)^{-1} X^{t} \tag{A.1} $$
where t stands for the transpose and X refers to a matrix containing k columns and N rows. The Hat values, which represent the viable region of the case under study, are the diagonal elements of the H matrix. Moreover, outliers are normally detected on the basis of the H values obtained from Eq. (A.1). The H indices and the standardized residual values (R) are displayed together in the Williams plot. In general, the warning leverage (H*) is set to 3p/n, where p is the number of model coefficients plus one and n is the number of training data points. A cut-off value of 3 means that data points are accepted within ±3 standard deviations of the mean value. If the main part of the data lies in the intervals H ∈ [0, H*] and R ∈ [−3, 3], the proposed technique works well statistically in the defined domain in terms of predictive performance. It is important to note that acceptable high leverage refers to the condition where H is equal to or greater than H* and R is between −3 and 3. Data points with R < −3 or R > 3 are recognized as suspected data, known as poor high leverage. The presence of outliers in computation and analysis may cause considerable error in the model output, leading to false decisions.
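The Leverage procedure described above can be sketched in Python as follows: the diagonal of the Hat matrix of Eq. (A.1) is computed, the warning leverage H* = 3p/n is evaluated, and each point is classified from its H and R values. This is a minimal illustration under stated assumptions: the design matrix, the standardized residuals, the helper names, and the choice of one model coefficient are all hypothetical and are not the data or the implementation used in this study.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^t X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def hat_diagonal(x):
    """Diagonal of H = X (X^t X)^-1 X^t: one leverage value per row of X."""
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    leverages = []
    for row in x:
        z = solve_linear(xtx, row)  # z = (X^t X)^-1 x_i
        leverages.append(sum(r * v for r, v in zip(row, z)))
    return leverages


def warning_leverage(n_coefficients, n_points):
    """H* = 3p/n with p = number of model coefficients plus one."""
    return 3.0 * (n_coefficients + 1) / n_points


def classify(h, r, h_star, r_cut=3.0):
    """Label one point of the Williams plot from its leverage and residual."""
    if abs(r) > r_cut:
        return "suspected"
    if h >= h_star:
        return "high leverage"
    return "valid"


# Hypothetical design matrix (assumption): intercept column plus one input.
inputs = [float(v) for v in range(11)] + [40.0]
X = [[1.0, v] for v in inputs]
# Hypothetical standardized residuals (assumption), one per data point.
residuals = [0.4, -0.8, 1.1, -0.3, 0.2, 3.6, -1.5, 0.9, -0.6, 0.1, -1.0, 0.5]

leverages = hat_diagonal(X)
h_star = warning_leverage(n_coefficients=1, n_points=len(X))
labels = [classify(h, r, h_star) for h, r in zip(leverages, residuals)]
for h, r, label in zip(leverages, residuals, labels):
    print(f"H = {h:.3f}  R = {r:+.1f}  {label}")
```

The linear system is solved directly instead of forming the inverse of X^t X explicitly, which keeps the sketch short; for a databank of realistic size a numerical library would normally be preferred.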
Appendix C. Supplementary material
Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.petrol.2015.10.034.
References
Alvarado, V., Manrique, E., 2010. Enhanced oil recovery: an update review. Energies 3 (9), 1529–1575.
Arabloo, M., Shokrollahi, A., Gharagheizi, F., Mohammadi, A.H., 2013. Toward a predictive model for estimating dew point pressure in gas condensate systems. Fuel Process. Technol. 116, 317–324.
Atiqullah, M.M., Rao, S., 1993. Reliability optimization of communication networks using simulated annealing. Microelectron. Reliab. 33 (9), 1303–1319.
Chamkalani, A., Chamkalani, R., Mohammadi, A.H., 2014. Hybrid of two heuristic optimizations with LSSVM to predict refractive index as asphaltene stability identifier. J. Dispers. Sci. Technol. 35, 1041–1050.
Chen, G., et al., 2014. The genetic algorithm based back propagation neural network for MMP prediction in CO2-EOR process. Fuel 126, 202–212.
Costa, A., Schiozer, D., Moczydlower, P., Bedrikovetsky, P., 2008. Use of representative models to improve the decision making process of chemical flooding in a mature field. Paper SPE 115442 presented at the 2008 SPE Russian Oil & Gas Technical Conference and Exhibition, Moscow, Russia, 28–30 October 2008.
Crane, F., Kendall, H., Gardner, G., 1963. Some experiments on the flow of miscible fluids of unequal density through porous media. Old SPE J. 3 (4), 277–280.
Esfahani, S., Baselizadeh, S., Hemmati-Sarapardeh, A., 2015. On determination of natural gas density: least square support vector machine modeling approach. J. Nat. Gas Sci. Eng. 22, 348–358.
Eslamimanesh, A., et al., 2012a. Phase equilibrium modeling of clathrate hydrates of methane, carbon dioxide, nitrogen, and hydrogen + water soluble organic promoters using Support Vector Machine algorithm. Fluid Phase Equilibria 316, 34–45.
Eslamimanesh, A., Gharagheizi, F., Mohammadi, A.H., Richon, D., 2012b. A statistical method for evaluation of the experimental phase equilibrium data of simple clathrate hydrates. Chem. Eng. Sci. 80, 402–408.
Fabian, V., 1997. Simulated annealing simulated. Comput. Math. Appl. 33 (1), 81–94.
Farasat, A., Shokrollahi, A., Arabloo, M., Gharagheizi, F., Mohammadi, A.H., 2013. Toward an intelligent approach for determination of saturation pressure of crude oil. Fuel Process. Technol. 115, 201–214.
Fazavi, M., Hosseini, S.M., Arabloo, M., Shokrollahi, A., Amani, M., 2014. Applying a smart technique for accurate determination of flowing oil/water pressure gradient in horizontal pipelines. J. Dispers. Sci. Technol. 35, 882–888.
Gharagheizi, F., Alamdari, R.F., Angaji, M.T., 2008. A new neural network group contribution method for estimation of flash point temperature of pure components. Energy Fuels 22 (3), 1628–1635.
Gharagheizi, F., et al., 2012. Evaluation of thermal conductivity of gases at atmospheric pressure through a corresponding states method. Ind. Eng. Chem. Res. 51 (9), 3844–3849.
Ghorbani, D., 2008. Development of methodology for optimization and design of chemical flooding. ProQuest.
Giordano, R., 1987. Estimating field-scale micellar/polymer performance. In: Proceedings of the SPE Annual Technical Conference and Exhibition.
Goodall, C.R., 1993. 13 Computation using the QR decomposition. Handb. Stat. 9, 467–508.
Gramatica, P., 2007. Principles of QSAR models validation: internal and external. QSAR Comb. Sci. 26 (5), 694–701.
Hemmati-Sarapardeh, A., et al., 2014. Reservoir oil viscosity determination using a rigorous approach. Fuel 116, 39–48.
Hongyan, W., Xulong, C., Jichao, Z., Aimei, Z., 2009. Development and application of dilute surfactant–polymer flooding system for Shengli oilfield. J. Pet. Sci. Eng. 65 (1), 45–50.
Hosseinzadeh, M., Hemmati-Sarapardeh, A., 2014. Toward a predictive model for estimating viscosity of ternary mixtures containing ionic liquids. J. Mol. Liq. 200, 340–348.
Hou, J., Li, Z.-q, Cao, X.-l, Song, X.-w, 2009. Integrating genetic algorithm and support vector machine for polymer flooding production performance prediction. J. Pet. Sci. Eng. 68 (1), 29–39.
Kamari, A., Arabloo, M., Shokrollahi, A., Gharagheizi, F., Mohammadi, A.H., 2015a. Rapid method to estimate the minimum miscibility pressure (MMP) in live reservoir oil systems during CO2 flooding. Fuel 153, 310–319.
Kamari, A., Bahadori, A., Mohammadi, A.H., 2015b. On the determination of crude oil salt content: application of robust modeling approaches. J. Taiwan Inst. Chem. Eng.
Kamari, A., Bahadori, A., Mohammadi, A.H., Zendehboudi, S., 2015c. New tools predict monoethylene glycol injection rate for natural gas hydrate inhibition. J. Loss Prev. Process Ind. 33, 222–231.
Kamari, A., Gharagheizi, F., Bahadori, A., Mohammadi, A.H., 2014a. Determination of the equilibrated calcium carbonate (calcite) scaling in aqueous phase using a reliable approach. J. Taiwan Inst. Chem. Eng. 45, 1307–1313.
Kamari, A., Hemmati-Sarapardeh, A., Mirabbasi, S.-M., Nikookar, M., Mohammadi, A.H., 2013. Prediction of sour gas compressibility factor using an intelligent approach. Fuel Process. Technol. 116, 209–216.
Kamari, A., Mohammadi, A., Bahadori, A., Zendehboudi, S., 2014b. A reliable model for estimating the wax deposition rate during crude oil production and processing. Pet. Sci. Technol. 32 (23), 2837–2844.
Kamari, A., Mohammadi, A.H., Bahadori, A., Zendehboudi, S., 2014c. Prediction of air specific heat ratios at elevated pressures using a novel modeling approach. Chem. Eng. Technol. 37 (12), 2047–2055.
Kamari, A., Nikookar, M., Sahranavard, L., Mohammadi, A.H., 2014d. Efficient screening of enhanced oil recovery methods and predictive economic analysis. Neural Comput. Appl. 25, 815–824.
Kamari, A., Safirii, A., Mohammadi, A.H., 2015d. Compositional model for estimating asphaltene precipitation conditions in live reservoir oil systems. J. Dispers. Sci. Technol. 36, 301–309.
Karambeigi, M., Zabihi, R., Hekmat, Z., 2011. Neuro-simulation modeling of chemical flooding. J. Pet. Sci. Eng. 78 (2), 208–219.
Koch, G., 2005. Discovering Multi-core: Extending the Benefits of Moore's law. Technology, p. 1.
Koval, E., 1963. A method for predicting the performance of unstable miscible displacement in heterogeneous media. Old SPE J. 3 (2), 145–154.
Lake, L., Johnston, J., Stegemeier, G., 1981. Simulation and performance prediction of a large-scale surfactant/polymer project. Old SPE J. 21 (6), 731–739.
Lake, L.W., 1989. Enhanced Oil Recovery.
Mohammadi, A.H., Eslamimanesh, A., Gharagheizi, F., Richon, D., 2012a. A novel method for evaluation of asphaltene precipitation titration data. Chem. Eng. Sci. 78, 181–185.
Mohammadi, A.H., Gharagheizi, F., Eslamimanesh, A., Richon, D., 2012b. Evaluation of experimental data for wax and diamondoids solubility in gaseous systems. Chem. Eng. Sci. 81, 1–7.
Mohammadi, A.H., Richon, D., 2008. A mathematical model based on artificial neural network technique for estimating liquid water hydrate equilibrium of water hydrocarbon system. Ind. Eng. Chem. Res. 47 (14), 4966–4970.
Patton, J., Coats, K., Colegrove, G., 1971. Prediction of polymer flood performance. Old SPE J. 11 (1), 72–84.
Paul, G., Lake, L., Gould, T., 1984. A simplified predictive model for CO2 miscible flooding. In: Proceedings of the SPE Annual Technical Conference and Exhibition.
Paul, G., Lake, L., Pope, G., Young, G., 1982. A simplified predictive model for micellar-polymer flooding. SPE California Regional Meeting.
Pelckmans, K., et al., 2002. LS-SVMlab: a Matlab/C toolbox for least squares support vector machines. Tutorial. KULeuven-ESAT, Leuven.
Prasanphanich, J., 2009. Gas reserves estimation by Monte Carlo simulation and chemical flooding optimization using experimental design and response surface methodology (master's thesis). The University of Texas at Austin.
Qingjun, Y., et al., 2004. Integrating soft computing and hard computing for production performance prediction of low permeability reservoir. In: Proceedings of the SPE Asia Pacific Conference on Integrated Modelling for Asset Management.
Rafiee-Taghanaki, S., Arabloo, M., Chamkalani, A., Amani, M., Zargari, M.H., Adelzadeh, M.R., 2013. Implementation of SVM framework to estimate PVT properties of reservoir oil. Fluid Phase Equilibria 346, 25–32.
Rousseeuw, P.J., Leroy, A.M., 2005. Robust Regression and Outlier Detection. Wiley, New York, p. 589.
Scalabrin, G., Marchi, P., Bettio, L., Richon, D., 2006. Enhancement of the extended corresponding states techniques for thermodynamic modeling. II. Mixtures. Int. J. Refrig. 29 (7), 1195–1207.
Shafiei, A., Dusseault, M.B., Zendehboudi, S., Chatzis, I., 2013. A new screening tool for evaluation of steamflooding performance in naturally fractured carbonate reservoirs. Fuel 108, 502–514.
Shokrollahi, A., Arabloo, M., Gharagheizi, F., Mohammadi, A.H., 2013. Intelligent model for prediction of CO2–reservoir oil minimum miscibility pressure. Fuel 112, 375–384.
Silva, P.C., Maschio, C., Schiozer, D.J., 2007. Use of Neuro-Simulation techniques as proxies to reservoir simulator: application in production history matching. J. Pet. Sci. Eng. 57 (3), 273–280.
Suykens, J.A., De Brabanter, J., Lukas, L., Vandewalle, J., 2002a. Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing 48 (1), 85–105.
Suykens, J.A., Vandewalle, J., 1999. Least squares support vector machine classifiers. Neural Process. Lett. 9 (3), 293–300.
Suykens, J.A., Vandewalle, J., De Moor, B., 2001. Intelligence and cooperative search by coupled local minimizers. Int. J. Bifurc. Chaos 11 (08), 2133–2144.
Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J., 2002b. Least Squares Support Vector Machines. World Scientific Publishing Company, Singapore.
Tatar, A., Shokrollahi, A., Mesbah, M., Rashid, S., Arabloo, M., Bahadori, A., 2013. Implementing Radial Basis Function Networks for modeling CO2-reservoir oil minimum miscibility pressure. Nat. Gas Sci. Eng. 15, 82–89.
Vasan, A., Raju, K.S., 2009. Comparative analysis of simulated annealing, simulated quenching and genetic algorithms for optimal reservoir operation. Appl. Soft Comput. 9 (1), 274–281.
Wang, B., Lake, L., Pope, G., 1981. Development and application of a streamline micellar/polymer simulator. In: Proceedings of the SPE Annual Technical Conference and Exhibition.
Wu, W., Vaskas, A., Delshad, M., Pope, G., Sepehrnoori, K., 1996. Design and optimization of low-cost chemical flooding. In: Proceedings of the SPE/DOE Improved Oil Recovery Symposium.
Wyatt, K., Pitts, M., Surkalo, H., 2008. Economics of field proven chemical flooding technologies. In: Proceedings of the SPE/DOE Symposium on Improved Oil Recovery.
Xavier-de-Souza, S., Suykens, J.A., Vandewalle, J., Bollé, D., 2010. Coupled simulated annealing. Syst. Man Cybern. Part B: Cybern. IEEE Trans. 40 (2), 320–335.
Zhang, L.-H, Xiao, H., Zhang, H.-T, Xu, L.-J, Zhang, D., 2007. Optimal design of a novel oil–water separator for raw oil produced from ASP flooding. J. Pet. Sci. Eng. 59 (3), 213–218.