Improving the prediction of petroleum reservoir characterization with a stacked generalization ensemble model of support vector machines

Fatai Anifowose a,∗, Jane Labadin a, Abdulazeez Abdulraheem b

a Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
b Department of Petroleum Engineering, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

Applied Soft Computing 26 (2015) 483–496

Article history: Received 21 March 2013; Received in revised form 12 September 2014; Accepted 14 October 2014; Available online 23 October 2014.

Keywords: Stacked generalization ensemble; Support vector machines; Regularization parameter; Porosity; Permeability

∗ Corresponding author. Present address: The Research Institute, Center for Petroleum & Minerals, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia. Tel.: +966 138604383; mobile: +966 532649740. E-mail addresses: [email protected] (F. Anifowose), ljane@fit.unimas.my (J. Labadin), [email protected] (A. Abdulraheem).

Abstract

The ensemble learning paradigm has proved to be relevant to solving the most challenging industrial problems. Despite its successful application, especially in bioinformatics, the petroleum industry has not benefited enough from the promises of this machine learning technology. The petroleum industry, with its persistent quest for high-performance predictive models, is in great need of this new learning methodology. A marginal improvement in the prediction indices of petroleum reservoir properties could have a huge positive impact on the success of exploration, drilling and the overall reservoir management portfolio. Support vector machines (SVM) is one of the promising machine learning tools that have performed excellently in most prediction problems. However, its performance is a function of the prudent choice of its tuning parameters, most especially the regularization parameter, C. Reports have shown that this parameter has a significant impact on the performance of SVM. Understandably, no specific value has been recommended for it. This paper proposes a stacked generalization ensemble model of SVM that incorporates different expert opinions on the optimal values of this parameter in the prediction of porosity and permeability of petroleum reservoirs using datasets from diverse geological formations. The performance of the proposed SVM ensemble was compared to that of the conventional SVM technique, another SVM implemented with the bagging method, and the Random Forest technique. The results showed that the proposed ensemble model, in most cases, outperformed the others with the highest correlation coefficient and the lowest root mean-squared and mean absolute errors. The study indicated that there is a great potential for ensemble learning in petroleum reservoir characterization to improve the accuracy of reservoir property predictions for more successful exploration and increased production of petroleum resources. The results also confirmed that ensemble models perform better than the conventional SVM implementation.

1. Introduction

The ensemble learning paradigm is the most recent Computational Intelligence (CI) approach to combining a "mixture of experts". It has proved to be relevant in solving the most challenging industrial problems, and its superior performance over the conventional method of learning with individual techniques has been confirmed when applied to classification and regression problems. The ensemble learning paradigm is an advancement in supervised machine learning technology. While the latter searches for the best hypothesis among all possible hypotheses that describe the solution to a problem, the former combines the best hypotheses of different

instances of the base learner and its associated hypotheses. The ensemble learning paradigm has gained much ground with classification problems in many fields. However, it is still a new technology whose great benefit is still waiting to be tapped in the petroleum industry. The ensemble learning methodology is a close emulation of the human socio-cultural behavior of seeking several people’s opinions before making any important decision [1]. With the reports of the successful application of ensemble modeling over their individual base learners in other areas [2–6], the petroleum industry is in dire need of this new modeling approach in the petroleum reservoir characterization business. A lot of data is being generated and acquired in the petroleum industry due to the proliferation of various sensor-based logging tools such as Wireline, Open-Hole, Logging-While-Drilling, Measurement-While-Drilling, and seismic measurements of increasing dimensions. Due to the high dimensionality that may be involved in the data acquired through these systems, the ensemble methodology is most ideal for extracting useful knowledge out of them without compromising expert opinions and model performance. For those outside the facilities that may not have access to these voluminous data, the ensemble methodology is still the ideal technique to manage the little data that may be available to them. The ensemble learning methodology is ideal for handling both cases of too much data and too little data [7]. Ensemble models have the capability to combine different architectures of their


base models, diverse data sampling methodologies, different recommended best input features, and various optimized parameters obtained from different experts in the optimization of estimates and predictions of petroleum reservoir properties.

CI techniques have been applied in the petroleum industry over the years, especially in reservoir characterization, but at a pace that does not match the rate of advancement and the dynamics of the technology. Interestingly, researchers in the petroleum industry have moved progressively from the use of empirical correlations and linear regression models to the use of CI and machine learning techniques [8]. However, the application of CI in the petroleum industry has mainly been limited to Artificial Neural Networks (ANN) and Fuzzy Logic [9–12], with very little work in the area of hybrid CI modeling [13–18] and almost nothing yet in the application of ensemble models. Reservoir characterization, an essential process in the petroleum industry for estimating various properties of petroleum reservoirs, needs the ensemble learning methodology to improve the accuracy of predictions that are important for the qualitative and quantitative evaluation of petroleum reserves, and to further increase the consequent success of exploration, drilling and production activities. A marginal increase in the prediction accuracy of these petroleum properties is capable of improving the efficiency of exploration, drilling and production of petroleum resources with less time and effort.

In this study, we propose an ensemble model of support vector machines (SVM) based on the diversity exhibited by the regularization parameter. The regularization parameter is an important parameter used to tune and optimize the SVM model during the training process. The major motivation for this study is the continued quest for better predictions of reservoir properties and the various reports, in other fields of application, of the superior performance of ensemble techniques over their individual base learners. This paper is aimed at achieving the following objectives:

• To review the application of ensemble techniques, especially in petroleum reservoir characterization.
• To establish a premise for the typical need of ensemble techniques in petroleum engineering.
• To investigate the applicability of the stacked generalization ensemble model of support vector machines based on different expert opinions on the regularization parameter.
• To investigate the possible outperformance of the proposed SVM ensemble model over the conventional bagging method and the Random Forest technique using representative reservoir porosity and permeability datasets.
• To demonstrate the superiority of the bagging-based SVM ensemble model over the Random Forest technique.
• To confirm whether the performance of the ensemble technique is better than or as good as the best from among its individual base models.

This is the first study to address the applicability of the ensemble learning paradigm in the prediction of petroleum reservoir properties, especially porosity and permeability. By demonstrating the superior performance of the proposed regularization parameter-driven SVM ensemble learning model over that of the conventional bagging method, this study is expected to boost the interest of petroleum engineering researchers in this new learning paradigm, as it promises to open windows of possibilities in its application in petroleum reservoir characterization.
The rest of this paper is organized as follows: Section 2 presents a rigorous review of literature on the ensemble methodology, its application in reservoir characterization, overview of the SVM technique, and the effect of the regularization parameter on its performance. Section 3 discusses the architecture of the three ensemble models implemented in this study. Section 4 describes the methodology employed in this study from data description through the evaluation criteria to the details of the implementation of the ensemble models. Results are presented and discussed in Section 5 while conclusions on the results are drawn in Section 6.

2. Literature survey

2.1. Ensemble learning methodology

The ensemble learning methodology combines multiple "expert opinions" to solve a particular problem [19]. Each opinion is represented in each instance of the base learners that make up the ensemble model. In regression tasks, each instance attempts to search for the best hypothesis that solves the problem. In ensemble learning, the best hypotheses identified by the base learners are combined using any of the various combination rules to evolve the best overall solution offered by the ensemble model. This methodology of combining the opinions of different "experts" to obtain an overall "ensemble" decision is deeply rooted in human culture, such as in the classical age of ancient China and Greece, and was formalized during the Enlightenment with the Condorcet Jury Theorem, which proved that the judgment of a committee is superior to those

of individuals, provided the individuals have reasonable competence [1]. This also explains why most human activities are usually implemented using the committee system. The ensemble methodology was originally applied on classification and clustering problems which include bio-informatics [20], object detections [21,22], gene expressions [1], protein synthesis [3] and later extended and applied to time series prediction problems [23,24]. With the ensemble methodology, the selection of the overall best hypothesis helps to improve the performance of a model and reduces the likelihood of an unfortunate selection of a poor model. This resembles the way humans solve problems. Having a committee of experts reduces the risk of taking a wrong decision on a problem. The ensemble methodology makes the selection of such candidate models (representing the best hypotheses) more confident, less risky and unbiased. A generalized flowchart for ensemble techniques is shown in Fig. 1. The ensemble methodology starts with the conventional identification of training and testing data subsets. A finite number of base learners is then established. Each base learner is then constructed with the desired diversity such as using different input features, random sampling of the input data or different values of a tuning parameter. The individual results produced by the base learners are then combined using a relevant combination algorithm to evolve a single ensemble solution. As a methodology for classification and clustering problems, it was successfully implemented in the Adaptive Boosting (AdaBoost) technique. It was later extended for regression problems in the form of Bootstrap Aggregate method, abbreviated as bagging, and implemented in the Random Forest technique [25]. Bagging involves training each of the ensemble base learners on a subset that is randomly drawn from the training data with replacement while giving each data sample equal weight [26]. The major motivation for the ensemble learning paradigm is the statistically sound argument that the paradigm is part of human daily lives: We ask the opinions of several experts before making a decision: we seek the opinions of several doctors before accepting a medical procedure; we read user reviews before choosing a web hosting service provider; we evaluate reports of references before taking new employees; manuscripts are reviewed by experts before accepting or rejecting them; etc. In each case, the primary objective is to minimize the error associated with the final decision that will be made by combining the individual decisions [7].
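As a minimal illustration of this generic flow (and not of any implementation used in this study), the sketch below trains a few diverse base regressors and combines their hypotheses with a simple mean rule; scikit-learn, the synthetic make_regression data and all parameter values are assumptions introduced here purely for illustration:

# Illustrative sketch of the generic ensemble flow: N diverse base learners,
# each producing a hypothesis H_n, combined by a simple rule (here: the mean).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Diversity here comes from different kernels; it could equally come from
# different input features, data samples, or tuning parameter values.
base_learners = [SVR(kernel=k, C=100.0) for k in ("linear", "rbf", "poly")]
hypotheses = [m.fit(X_train, y_train).predict(X_test) for m in base_learners]

combined = np.mean(hypotheses, axis=0)                               # combination rule
best = max(base_learners, key=lambda m: m.score(X_test, y_test))     # or select the best H_n

Any of the combination rules mentioned above could replace the mean in the last lines.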

2.2. Ensemble learning in reservoir characterization

Petroleum reservoir characterization is the process of estimating and predicting various reservoir properties for use in full-scale reservoir models for the determination of the quality and quantity of a petroleum reservoir. Some of the reservoir properties that are of interest to Petroleum Engineers include porosity, permeability, water saturation, pressure, volume, temperature, oil and gas ratio, bubble point pressure, dew point pressure, well-bore stability, diagenesis and lithofacies. Out of these, porosity and permeability are the most important as they jointly serve as key indicators of reservoir quality. The accuracy of almost all other properties depends on the accuracy of these two properties. Porosity is the percentage of pores in core samples that are usually extracted from a petroleum reservoir during a coring process. The process involves using specialized devices to take cylindrical samples of rocks at intervals of about one foot for laboratory measurements. The higher the percentage of pores in a rock sample, the more will be its ability to hold hydrocarbons, water and gas. Permeability is a measure of how the individual pores in the core samples are interconnected. No direct relationship has been universally established between these two properties, especially in carbonate geological formations. Hence, if a rock sample is very porous, it may not necessarily be of high permeability [27].

Before the CI technology was embraced in the petroleum industry, these properties used to be calculated from various empirical correlations, followed by the use of linear regression techniques. Presently, the concept of Hybrid Computational Intelligence (HCI) is gaining more popularity in the petroleum engineering domain. As the quest for increased performance of predictive models in reservoir characterization continues to soar, the ensemble methodology offers a great potential for developing better performing and more robust predictive models. Despite the reasonable number of successful applications of CI and HCI techniques in reservoir characterization [8,28–30], the great opportunities of robust model development offered by the ensemble learning technique have not been adequately utilized.

2.3. Related work on ensemble learning

Most of the applications of ensemble methodology are found in classification and clustering tasks. Kim et al. [31] showed that even though SVM has been reported to provide good generalization performance, often the classification results are far from the theoretical expectations. Based on this premise, they proposed an ensemble model of SVM and tested it on IRIS classification, handwritten recognition and fraud detection datasets. The reported results showed that the proposed ensemble SVM outperformed the single SVM in terms of classification accuracy. Another ensemble model of SVM was proposed by Chen et al. [21] to detect the occurrence of road accidents. They reported that the ensemble model outperformed the single SVM model. Sun and Li [6] reported a significantly superior performance of an SVM ensemble over an individual SVM model in the prediction of financial distress. In the bio-informatics, Peng [32] and later Chen and Zhao [33] presented an ensemble of SVM and ANN classifiers respectively for the classification of microarray gene data. They both reported that their respective ensemble techniques performed better than using the single SVM and ANN techniques. Nanni and Lumini [3] proposed an ensemble of SVM classifiers for the prediction of bacterial virulent proteins using features that were extracted directly from the amino acid sequence of a given protein rather than those from the evolutionary information of a given protein as it is usually done in the literature. Despite their deviation from the well known feature source, they showed that the ensemble model performed better than a single SVM model used on the conventional feature extraction method. Similarly, positive conclusions were given by Valentini et al. [34] and Caragea et al. [20] about their SVM ensemble models for cancer recognition and Glycosylation site prediction respectively. Other interesting areas of application of ensembles include hydrogeology [5], time series forecasting [23], customer churn prediction [35], control systems [36], Soil Science [37], detection of concept drift [38], and short-term load forecasting [39]. Other base learners used apart from ANN and SVM include Neuro-Fuzzy Inference System [40], Bayesian Inference [41], Fuzzy Inference Systems [42], Decision Trees [43], and Extreme Learning Machines [44]. Despite the ample reports of the successful application of the ensemble learning paradigm in literature, the benefits offered by the ensemble learning paradigm has not been harnessed in the prediction of porosity, permeability, and other petroleum reservoir properties. This paper is expected to serve as a motivating factor for its appreciation, acceptance and continued application of this new methodology in the petroleum industry.

Fig. 1. A generalized flowchart for ensemble techniques.

2.4. Support vector machines and the regularization parameter

SVM is a set of related supervised machine learning methods used for classification and regression. It belongs to the family of Generalized Linear Classifiers. It can also be considered as a special case of Tikhonov Regularization as it maps input vectors to a higher dimensional space where a maximal separating hyperplane is constructed [45]. A conceptual framework of how SVM works is shown in Fig. 2. Input datasets, especially those belonging to the non-separable case, are mapped to a higher dimensional hyperplane where classification and regression becomes easier. The hyperplane is then optimized to evolve a solution to a problem. The generalization capability of SVM is ensured by special properties of the optimal hyperplane that maximizes the distance to training examples in the high dimensional feature space.

Fig. 2. Mapping data to a high dimensional plane in SVM.

SVM was initially introduced for the purpose of classification. Vapnik et al. [46] later developed a new ε-insensitive loss function technique that is based on statistical learning theory and adheres to the principle of structural risk minimization, seeking to minimize an upper bound of the generalization error. The technique that emerged out of this modification is known as Support Vector Regression (SVR). It depends only on a subset of the training data, because the cost function for building the model ignores any training data close to the model prediction (within a threshold ε). It has been shown to exhibit excellent performance in prediction tasks. Some of the kernel functions used in SVM are:

Linear:
$$k(x, x') = x \ast x' \tag{1}$$

RBF:
$$k(x, x') = e^{-\mathit{param} \ast \left\|x - x'\right\|^{2}} \tag{2}$$

Polynomial:
$$k(x, x') = \left(x \ast x' + 1\right)^{\mathit{param}} \tag{3}$$

where param is the kernel parameter in the SVM feature space [47]. More details about the theoretical basis of SVM can be found in Burges [45,47], while cases of successful applications can be found in Abe [13,48,49].

Among the SVM parameters that must be set appropriately for optimal performance is the regularization parameter, C. It is also known in the literature as the penalty factor [50] and the penalty parameter [51]. The regularization parameter is one of the important tuning parameters of SVM whose value can have a great effect on the performance of the SVM model. It needs to be chosen diligently to avoid the problems of overfitting and underfitting, as it helps to maintain a balance between training error and model complexity. The performance of the model is so sensitive to this parameter that its value needs to be chosen carefully. Kecman [51] considered it one of the two most important learning parameters that can be utilized in constructing SVMs for regression tasks. A lot has been written about its sensitivity and importance, but no fixed value is known to be universally recommended for it. It turns out, just like for other machine learning-based techniques, to be controlled by the nature of the available data. Alpaydin [50], while proving that the regularization parameter is critical to the performance of the SVM model, recommended that a "proper" value be chosen for it. Due to the difficulty in doing this, he only cautioned that choosing a number that is too large may result in a high penalty for non-separable points, which will lead to overfitting. Conversely, choosing a number that is too small may result in the model not having enough strength to generalize on new cases, which will lead to underfitting.
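The under-/over-fitting trade-off described above is easy to reproduce with any standard SVR implementation. The sketch below is a stand-in that uses scikit-learn's ε-SVR (not the LS-SVM code employed later in this paper) on synthetic placeholder data, sweeping C from a very small to a very large value; all values shown are illustrative assumptions:

# Illustrative sketch: sensitivity of an SVR model to the regularization parameter C.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

for C in (0.1, 10.0, 1000.0, 100000.0):   # from a very small to a very large penalty
    model = SVR(kernel="rbf", C=C, epsilon=0.1).fit(X_tr, y_tr)
    print(f"C={C:>9.1f}  train R2={model.score(X_tr, y_tr):.3f}  "
          f"test R2={model.score(X_te, y_te):.3f}")
# A very small C tends to underfit (low train and test R2); a very large C can
# overfit (train R2 far above test R2). The "best" C is data-dependent, as the text argues.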

Joachims [52] posited that "a small value for C will increase the number of training errors, while a large C will lead to a behavior similar to that of a hard-margin SVM", but did not give any recommendation. Shawe-Taylor and Cristianini [53] suggested that the value of this parameter should be varied "through a wide range of values", again without giving any specific recommended value. In order to avoid any wrong assumption or giving a false recommendation, Kecman [51] and Cherkassky and Mullier [54] simply advised users to "carefully select the best value". But how users should carefully select the best value for this parameter remains an unanswered question. The aforementioned arguments are the various and diverse expert opinions on the regularization parameter, C, ranging from it being a critical factor to being an important parameter whose value needs to be chosen carefully [50,51]. Though these expert propositions are enough to establish the existence of the required diversity on the regularization parameter, we still went ahead to present an experimental proof of the diversity (to be discussed in Section 4.3). Given the importance of this parameter to the performance of SVM, the most reasonable way to optimize it is to incorporate all the expert views in an ensemble model and let the model itself combine the results of the propositions to proffer the best solution to the problem. This serves the purpose of being focused on the solution rather than the techniques. The major justifications for proposing this novel ensemble algorithm include the diversity in the opinions of experts on the best value to assign to this parameter and the effect of this parameter on the performance of SVM (as further discussed in Section 4.3). Diversity is a major requirement for ensemble learning [55,56].

3. Architecture of the SVM ensemble models

3.1. Our proposed SVM ensemble model

For the proposed ensemble model, we implemented a stacked generalization [7] of the SVM technique. The reason for choosing this architecture is simply that it has not been applied to any petroleum engineering problem. This is a case of an existing method with a new application. A conceptual framework of the proposed algorithm is shown in Fig. 3. Given the perceived diversity in the effect of the regularization parameter on the performance of SVM, we set up 10 instances of SVM as base learners, viz. Model 1, Model 2, . . ., Model 10. Each learner uses a different value of C, viz. C1, C2, . . ., C10, and the commonly used values for the other tuning parameters. The ensemble model was first trained using bootstrapped samples of the training data combined with a value of Cn. This created the Tier-1 models, whose outputs were then used to train a Tier-2 model (meta-model).

Fig. 3. Conceptual framework of our proposed ensemble model.

In essence, the Tier-2 model combined the outputs of the Tier-1 models. Hence, there was no need for a separate combination algorithm, as applies in the conventional bagging method. The underlying idea of this architecture was to ensure that the Tier-1 models properly learned the examples in the training data. This is justified by a scenario in which a particular model incorrectly learned a certain region of the input data feature space and hence consistently failed to correctly predict the instances coming from that data region. The Tier-2 model will be able to learn this behavior and, along with the learned behaviors of the other models, can correct such an improper training process. The proposed ensemble technique worked according to the following procedure:

Algorithm 1.
1. Set up the SVM model with the other fixed parameters (epsilon, lambda, kernel and step size).
2. Do for n = 1 to 10 // there are 10 instances of C to create the Tier-1 models
3.   Set Cn to an assigned value
4.   Randomly divide data into training and testing
5.   Use the training data to train the SVM model, Sn, using Cn
6.   Use the test data to predict the target (porosity and permeability)
7.   Keep the output of the above as Hypothesis, Hn
8. Continue
9. All hypotheses, Hn, where n = 1, 2, . . ., 10, become input to the Tier-2 model
10. The Tier-2 model is trained with the combined hypotheses.
11. The Tier-2 model outputs the final decision.
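A compact sketch of Algorithm 1 is given below. It substitutes scikit-learn's ε-SVR for the LS-SVM implementation actually used in this study, and the synthetic data, kernel settings and random seeds are placeholders; only the two-tier structure (ten Tier-1 learners with diverse C values, here the ten values later listed in Table 5 whose mean is 4200, feeding a Tier-2 meta-SVM) follows the text:

# Sketch of the proposed stacked generalization SVM ensemble (Algorithm 1).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.utils import resample

C_VALUES = [500, 900, 1500, 2800, 3500, 4500, 5600, 6800, 7700, 8200]  # "expert opinions" on C

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Tier-1: one SVR per C value, each trained on a bootstrap sample of the training data.
tier1 = []
for n, C in enumerate(C_VALUES):
    Xb, yb = resample(X_tr, y_tr, replace=True, random_state=n)
    tier1.append(SVR(kernel="poly", degree=2, C=C, epsilon=0.1).fit(Xb, yb))

# Tier-1 outputs (hypotheses) become the input features of the Tier-2 (meta) model.
H_train = np.column_stack([m.predict(X_tr) for m in tier1])
H_test = np.column_stack([m.predict(X_te) for m in tier1])

tier2 = SVR(kernel="poly", degree=2, C=float(np.mean(C_VALUES)), epsilon=0.1)
tier2.fit(H_train, y_tr)                      # meta-model learns from the combined hypotheses
print("Ensemble test R2:", tier2.score(H_test, y_te))

Note that the Tier-2 model sees only the Tier-1 hypotheses, never the original predictors, following the data arrangement described in Section 4.4.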

3.2. Conventional ensemble method

The conventional ensemble method for regression tasks is Bagging [25], implemented in the Random Forest technique [57], an ensemble of Classification And Regression Trees (CART). Its counterpart for classification is called Boosting and was implemented in the AdaBoost technique. In the bagging method, the contribution of each base learner in the ensemble model is given an equal weight. To improve model variance, bagging trains each model in the ensemble using a subset that is randomly drawn from the training set with replacement. The results from the base learners are then averaged over all the base learners to obtain the overall result of the ensemble model.

The main concept of using the bagging method to increase the prediction accuracy of ensembles is similar to reducing high-variance noise using a moving average filter that averages each sample of the data over all available samples. The noise component is averaged out while the information content of the entire data is unaffected by the averaging operation [58]. When the prediction errors made on the data samples are averaged out, the error of the overall output is reduced. Prediction errors are composed of two controllable components: the accuracy of the model (bias) and the precision of the model when trained on different training sets (variance). Therefore, since averaging has a smoothing (variance-reducing) effect, the goal of bagging-based ensemble systems is to create several classifiers with relatively fixed (or similar) bias and then use the averaging combination rule on the individual outputs to reduce the variance [58]. This is the statistical justification for the bagging method. The SVM ensemble with the bagging methodology was implemented using the following procedure:

Algorithm 2.
1. Start SVM with all parameters set as optimal as possible.
2. Set N to the number of desired iterations.
3. Set T to the desired percentage of data for bootstrapped training data.
5. Do for n = 1 to N
6.   Randomly extract T% of the data for training
7.   Use the training data to train the SVM model, Sn
8.   Use the remaining (100 − T)% test data to predict the target variables
9.   Keep the result of the above as Hypothesis, Hn
11. Continue
12. Compute the average of all hypotheses using the Mean( ) rule:

$$H_{\mathrm{final}}(x) = \frac{1}{N}\sum_{n=1}^{N} H_{n}(x)$$
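A minimal sketch of Algorithm 2 follows, again with scikit-learn's ε-SVR standing in for the LS-SVM code; the number of estimators, the bootstrap scheme and the SVR settings are illustrative assumptions:

# Sketch of the bagging-based SVM ensemble (Algorithm 2).
import numpy as np
from sklearn.svm import SVR

def svm_bagging(X_train, y_train, X_test, n_estimators=10, seed=0):
    """X_train, y_train, X_test: NumPy arrays. Returns the averaged prediction."""
    rng = np.random.RandomState(seed)
    predictions = []
    for _ in range(n_estimators):
        # Bootstrap: draw len(X_train) samples with replacement, equal weights.
        idx = rng.choice(len(X_train), size=len(X_train), replace=True)
        model = SVR(kernel="poly", degree=2, C=1000.0, epsilon=0.1)
        model.fit(X_train[idx], y_train[idx])
        predictions.append(model.predict(X_test))       # hypothesis H_n
    # Mean( ) combination rule: H_final(x) = (1/N) * sum_n H_n(x)
    return np.mean(predictions, axis=0)

scikit-learn's BaggingRegressor wrapped around an SVR estimator provides an equivalent off-the-shelf alternative to this hand-rolled loop.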

3.3. The Random Forest technique

Random Forest is an ensemble learning-based technique that consists of a bagging of un-pruned Decision Tree learners [25] with a randomized selection of input data samples and predictors. The algorithm [59] is based on the bagging technique developed by Breiman [25] and the randomized feature selection by Ho [60,61]. Random Forest begins with building a Tree and then grows more Trees using a bootstrap subsample of the data until the minimum node is reached, in order to avoid the overfitting that comes with a larger number of Trees. More details about Decision Trees can be found in Sherrod [62], and application cases can be found in Park et al. [63] and Leibovici et al. [64]. Random Forest has been shown to be effective and accurate [65], but with reports of possible overfitting [66–68], hence it is liable to perform poorly in prediction tasks. The algorithm of Random Forest is presented [59] as follows:

Algorithm 3.
1. Starting with a tree:
   a. Set N = number of training cases.
   b. Set M = number of features.
   c. Select a subset m of input variables such that m ≪ M.
   d. Do for n = 1 to N
      i. Train this tree with a bootstrap sample of the training data.
      ii. Use the rest of the cases to estimate the prediction error of the tree.
      iii. Replace the bootstrap sample.
      Continue
   e. Calculate the best split based on these m variables in the training set.
2. The above procedure is iterated over all trees in the ensemble.
3. Calculate the mean of the performance of all trees. This represents the performance of the Random Forest technique.
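For comparison, the Random Forest baseline can be reproduced with any standard implementation. The sketch below uses scikit-learn's RandomForestRegressor, whose bootstrap sampling and per-split random feature selection correspond to steps 1(c)-1(d) above; the hyper-parameter values are illustrative and are not those of the customized MATLAB code used later in this study:

# Sketch of the Random Forest baseline (bagged, feature-randomized regression trees).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestRegressor(
    n_estimators=200,      # number of trees in the ensemble
    max_features="sqrt",   # m << M predictors considered at each split
    oob_score=True,        # out-of-bag samples estimate the prediction error
    random_state=0,
)
rf.fit(X_tr, y_tr)
print("OOB R2:", rf.oob_score_, " Test R2:", rf.score(X_te, y_te))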

4. Research methodology

4.1. Description of data

For the design, testing and validation of our proposed ensemble model, three porosity datasets and three permeability datasets were used. The porosity datasets were obtained from a petroleum reservoir in the Northern Marion Platform of North America (Site 1) while the permeability datasets were obtained from a reservoir in the Middle East (Site 2). The datasets from Site 1 have six predictor variables for porosity, while the datasets from Site 2 have eight predictor variables for permeability. These are shown in Tables 1 and 2.

Table 1
Predictor variables for Site 1 well logs for porosity.

    Predictor variables for porosity
1   Core
2   Top Interval
3   Grain density
4   Grain volume
5   Length
6   Diameter

Table 2
Predictor variables for Site 2 well logs for permeability.

    Predictor variables for permeability   Full meaning
1   GR                                     Gamma ray log
2   PHIE                                   Porosity log
3   RHOB                                   Density log
4   SWT                                    Water saturation
5   RT                                     Deep resistivity
6   MSFL                                   Micro-spherically focused log
7   NPHI                                   Neutron porosity log
8   CALI                                   Caliper log

4.2. Evaluation criteria

In order to effectively evaluate the performance of the base learners ahead of the ensemble models, we used the three commonly used measures of model performance, viz. the correlation coefficient (R-Square), the root mean-squared error (RMSE) and the mean absolute error (MAE), to evaluate the individual base learners.

The R-Square is a statistical measure of how strong a relationship is between n pairs of two variables, x and y. It is expressed as:

$$\text{R-Square} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} \tag{4}$$

The RMSE is a measure of the spread of the actual x values around the average of the predicted y values. It is expressed as:

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(x_{i} - y_{i}\right)^{2}}{n}} \tag{5}$$

The MAE is the average of the absolute errors of the predicted y values relative to the actual x values. It is given by:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_{i} - y_{i}\right| \tag{6}$$

The R-Square, RMSE and MAE were used to evaluate the performance of the proposed ensemble model and the conventional SVM technique. For the conventional bagging ensemble model, we used the Mean(R-Square), Mean(RMSE) and Mean(MAE) to obtain the overall performance (presented in Algorithm 2). Random Forest has its Mean( ) rule for its model combination already embedded in the algorithm (presented in Algorithm 3). Our comparative analysis of the three ensemble models was based on these sets of evaluation criteria.
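The three criteria in Eqs. (4)-(6) can be computed directly from the actual values x and the predicted values y; the helper below is a NumPy-only sketch (note that Eq. (4), as written, is the Pearson correlation coefficient, which the paper also refers to as CC):

# Evaluation criteria of Eqs. (4)-(6): R-Square, RMSE and MAE.
import numpy as np

def evaluate(x, y):
    """x: actual values, y: predicted values (1-D arrays of equal length n)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    r = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / np.sqrt(
        (n * np.sum(x**2) - np.sum(x) ** 2) * (n * np.sum(y**2) - np.sum(y) ** 2)
    )                                           # Eq. (4)
    rmse = np.sqrt(np.mean((x - y) ** 2))       # Eq. (5)
    mae = np.mean(np.abs(x - y))                # Eq. (6)
    return r, rmse, mae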

4.3. Diversity measures

The major requirement for the implementation of the ensemble learning paradigm is the existence of diversity in the system [7,56,58]. Due to the importance of diversity [69,70], we used a number of measures to ensure that our proposed ensemble is valid. Most of the diversity measures that have been proposed for ensemble learning in the literature were mainly for classification [71–73]. Since our work is on regression, we considered those proposed by Dutta [74]: correlation coefficient, covariance, chi-square, disagreement measure, and mutual information entropy. However, we observed that the first three measures are related to each other. Also, the disagreement measure is exclusively for classification ensembles, similar to that proposed by Kuncheva and Whitaker [75]. Hence, in order to avoid redundancy, we selected the diversity correlation coefficient (DCC) and the mutual information entropy (MIE).

The DCC is the degree of closeness between any two of the base learner outputs, Y^m and Y^n, such that:

$$\rho = \frac{\sum_{i=1}^{N}\left(y_{i}^{m} - \bar{Y}^{m}\right)\left(y_{i}^{n} - \bar{Y}^{n}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_{i}^{m} - \bar{Y}^{m}\right)^{2}\sum_{i=1}^{N}\left(y_{i}^{n} - \bar{Y}^{n}\right)^{2}}} \tag{7}$$

where Y^m and Y^n represent the continuous-valued outputs of the models R_m and R_n. Y^m and Y^n are N-dimensional vectors, with $y^{m} = (y_{1}^{m}, y_{2}^{m}, \ldots, y_{N}^{m})$ and $y^{n} = (y_{1}^{n}, y_{2}^{n}, \ldots, y_{N}^{n})$. The diversity of two predictors is inversely proportional to the correlation between them. Hence, two predictors with a low DCC between them (high diversity) are preferred over those with high values. In this study, we defined high correlation to be greater than 0.7.

The MIE diversity measure is defined in terms of Eq. (7) as:

$$I(Y^{m}, Y^{n}) = -\frac{1}{2}\log\left(1 - \rho^{2}\right) \tag{8}$$

where ρ, Y^m and Y^n remain as previously defined in Eq. (7).

We also showed diversity by using graphical visualizations. We plotted the effect of different values of the regularization parameter in terms of R-Square, RMSE and MAE to determine their degree of diversity. Those graphs that show a high degree of roughness or non-smoothness would be an indication of high diversity. We were able to show pre-implementation and in situ diversity in Section 4.4, and post-implementation diversity in Section 5.
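Both diversity measures follow directly from Eqs. (7) and (8); a sketch for a single pair of base learner outputs is shown below. The base of the logarithm in Eq. (8) is not stated in the text; base 10 is assumed here, which is consistent with the magnitude of most of the entropy values later reported in Table 6:

# Diversity measures of Eqs. (7) and (8) for a pair of base learner outputs.
import numpy as np

def diversity_pair(y_m, y_n):
    """DCC (Eq. 7) and MIE (Eq. 8) between two prediction vectors of length N."""
    y_m, y_n = np.asarray(y_m, float), np.asarray(y_n, float)
    rho = np.sum((y_m - y_m.mean()) * (y_n - y_n.mean())) / np.sqrt(
        np.sum((y_m - y_m.mean()) ** 2) * np.sum((y_n - y_n.mean()) ** 2)
    )                                           # Eq. (7)
    mie = -0.5 * np.log10(1.0 - rho**2)         # Eq. (8); base-10 log assumed here
    return rho, mie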

4.4. Implementing the ensemble models

The first step was to extract the various "expert opinions" on the value of C. Since none of the previous studies used our datasets, we could not use the same C values, as we pointed out in Section 2.4. Hence, we derived a "digital" version of these opinions from our datasets. Taking into consideration all the divergent "expert views" in Section 2.4, we performed a sample run of a simple conventional SVM model using all the parameters that worked well in our previous studies with the same datasets and their 70:30 ratio stratification strategy [17,30,76], while varying the value of C between the extremes of 0 and 10,000. Following the common practice in petroleum engineering CI modeling, each dataset was divided into 70% training and 30% testing subsets using a randomized stratification approach [27,77,78]. With this, 70% of each dataset was used for training while the remaining 30% was used for testing. The training subset represents the cored section of the oil and gas reservoir with complete log-core data, while the testing subset represents the uncored section with only the log data available and the core values to be predicted. The division of the datasets is shown in Table 3.

Table 3
Division of datasets into training and testing.

                  Porosity                 Permeability
Wells             1      2      3          1      2      3
Data size         415    285    23         355    477    387
Training (70%)    291    200    16         249    334    271
Testing (30%)     124    85     7          106    143    116

The optimal settings for the other tuning parameters used to determine the possible optimal values for C are:

• Error allowance, lambda = 1e−7
• Penalty for overfitting, epsilon = 0.1
• Type of kernel = Polynomial (Eq. (3))
• Kernel step size = 0.2
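Under these fixed settings, the sample run described above amounts to a 70:30 random split followed by a sweep of C between the stated extremes. The sketch below reproduces that procedure with scikit-learn's ε-SVR on synthetic placeholder data (the grid of C values, the polynomial degree and the data itself are illustrative assumptions, not the authors' well logs or LS-SVM code):

# Sketch of the pre-implementation diversity test: sweep C and record train/test R-Square.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

X, y = make_regression(n_samples=415, n_features=6, noise=12.0, random_state=0)  # placeholder logs
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)   # 70:30 split

for C in np.linspace(100, 10000, 25):      # sweep between the extremes used in the paper
    model = SVR(kernel="poly", degree=2, epsilon=0.1, C=C).fit(X_tr, y_tr)
    gap = model.score(X_tr, y_tr) - model.score(X_te, y_te)   # large gap suggests overfitting
    print(f"C={C:8.0f}  train R2={model.score(X_tr, y_tr):.3f}  "
          f"test R2={model.score(X_te, y_te):.3f}  gap={gap:.3f}")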

The results of this expert opinion extraction process, showing the performance of the SVM model with respect to the C values, are shown in Figs. 4–9 for all the datasets respectively. The figures showed that there is a lot of diversity in choosing different values for the regularization parameter, as the SVM model behaved differently with each C value with respect to each dataset. Some points clearly showed the occurrence of overfitting. With this sample run, we were able to establish and confirm the existence of pre-implementation diversity. With our observation of this pre-implementation diversity of the effect of the regularization parameter in our sample run, the assertion of Cherkassky and Ma [79] that this parameter has negligible effect on the generalization performance needs to be further re-considered.

Fig. 4. R-Square of different values of C with porosity Well 1.
Fig. 5. R-Square of different values of C with porosity Well 2.
Fig. 6. R-Square of different values of C with porosity Well 3.
Fig. 7. R-Square of different values of C with permeability Well 1.
Fig. 8. R-Square of different values of C with permeability Well 2.
Fig. 9. R-Square of different values of C with permeability Well 3.

From the pre-implementation diversity test results, the values of C corresponding to the points of optimal performance of the SVM sample run were extracted using the criterion of least overfitting – points with the least separation between the training and testing lines while maintaining optimality in generalization performance. Table 4 shows the extracted values of C meeting the stated criterion. The extracted values were then collapsed to the 10 shown in Table 5 to ensure simplicity of implementation [80–82] of the proposed ensemble model.

Table 4
Possible optimal values of C extracted for each dataset.

Data reference    Possible optimal C
Data 1 Well 1     900, 2800, 5600, 6800
Data 1 Well 2     1500, 4500, 5000
Data 1 Well 3     1200, 3500, 5200, 7100, 8100
Data 2 Well 1     3600, 5100, 7600
Data 2 Well 2     500, 3500, 5900, 7000, 7700, 8200
Data 2 Well 3     800, 1800, 3700, 6500, 7600

Table 5
Collapsed possible optimal C values used for the ensemble.

500, 900, 1500, 2800, 3500, 4500, 5600, 6800, 7700, 8200

We finally applied the six datasets, in turn, on our proposed SVM ensemble with the identified values of C serving as our "expert opinions". We used a customized version of the original MATLAB-based Least-Square SVM code available in [83]. The choice of this version of SVM is due to reports of its excellent performance in the literature [84,85]. After implementing the ensemble models as detailed in the algorithms, we carried out the in situ diversity test using the measures defined in Section 4.3. The base learners were paired consecutively and the diversity measures were applied on their predicted outputs. The results of this test are shown in Table 6. Other pairs were also considered but all gave results that are very close to those displayed. From the results, both the DCC and MIE of the paired base learners are low. This is another confirmation of the existence of diversity in our proposed ensemble model.

Table 6
Results of correlation and entropy diversity tests for all datasets (base learners paired consecutively).

Diversity measure   Dataset   1–2      3–4      5–6      7–8      9–10
Correlation         Por 1     0.193    −0.075   0.346    0.176    0.069
                    Por 2     0.249    0.145    0.091    0.236    0.189
                    Por 3     −0.629   0.173    0.202    0.350    0.077
                    Perm 1    0.616    0.661    0.673    0.676    0.461
                    Perm 2    0.308    0.284    0.438    0.181    0.369
                    Perm 3    0.686    0.088    0.699    0.676    0.379
Entropy             Por 1     0.008    0.001    0.028    0.007    0.001
                    Por 2     0.014    0.005    0.002    0.012    0.008
                    Por 3     0.109    0.154    0.009    0.028    0.150
                    Perm 1    0.104    0.125    0.131    0.192    0.052
                    Perm 2    0.022    0.018    0.046    0.007    0.032
                    Perm 3    0.209    0.230    0.146    0.218    0.172

We used the average of all the C values (4200) for the Tier-2 meta-model. The outputs of the Tier-1 models were combined using the Tier-2 model. With this, the training outputs of the Tier-1 models became the training input data for the Tier-2 model, with the original target values remaining unchanged. The testing outputs of the Tier-1 models also became the testing input data for the Tier-2 model, with the original testing targets kept for evaluating the generalization capability of the ensemble model. We then used the R-Square, RMSE and MAE criteria to obtain the overall performance of the ensemble model for the purpose of comparison.

For the conventional bagging technique, we also created 10 instances of SVM and gave a bootstrap sample of each dataset to each instance for training. The remaining samples were used for testing the trained instances. The same LS-SVM code [83], customized with other user-defined functions, was used to implement the bagging algorithm. The performance of this model was also measured using the R-Square, RMSE and MAE of the individual results to obtain the average for the overall performance of the ensemble model.

For the Random Forest, we customized the algorithm found in MATLAB CENTRAL [86] using other functions and toolboxes from the NETLAB repository [87]. Following the usual pattern, the performance of the algorithm with each dataset was measured using the R-Square, RMSE and MAE of the individual results. Since this is the fundamental ensemble algorithm, the Mean( ) combination rule is internally applied on the output of the algorithm, which automatically represents the overall performance of the ensemble Tree model. So, we did not make any modification to the combination segment of the code.

After the implementation of the algorithms, we performed the post-implementation diversity test. This was done by visualizing the performance of each base learner of our proposed ensemble model with respect to the evaluation criteria. The graphical visualizations are shown in Figs. 10–15 for all the datasets. From the R-Square (CC), RMSE and MAE performance plots, the performance of the Tier-1 models was uniquely but consistently different from each other. The general irregularity and non-smoothness of the plot lines is a confirmation of the heterogeneity in the performance of the Tier-1 models, hence confirming the existence of high diversity. The result of these three levels of diversity test (pre-, in situ, and post-implementation) is an indication that SVM with the diverse regularization parameter is an ideal candidate for ensemble modeling.

Fig. 10. Comparison of R-Square, RMSE and MAE of the SVM base learners for data 1 Well 1.
Fig. 11. Comparison of R-Square, RMSE and MAE of the SVM base learners for data 1 Well 2.
Fig. 12. Comparison of R-Square, RMSE and MAE of the SVM base learners for data 1 Well 3.
Fig. 13. Comparison of R-Square, RMSE and MAE of the SVM base learners for data 2 Well 1.
Fig. 14. Comparison of R-Square, RMSE and MAE of the SVM base learners for data 2 Well 2.
Fig. 15. Comparison of R-Square, RMSE and MAE of the SVM base learners for data 2 Well 3.

5. Modeling results and discussion

When the performance of our proposed ensemble model was compared with those of the conventional SVM model, the SVM with the bagging method and the Random Forest technique, the results obtained are shown in Figs. 16–20. In terms of the R-Square criterion, Fig. 16 shows that our proposed stacked generalization ensemble model outperformed the others with the highest correlation in most of the six cases. In particular, our proposed model proved to be superior to the conventional SVM and Random Forest techniques on all the datasets. However, the proposed model outperformed the SVM bagging method in four cases (Data 1 Well 1, Data 1 Well 2, Data 2 Well 1 and Data 2 Well 3) while exhibiting stiff competition in two cases (Data 1 Well 3 and Data 2 Well 2). Between SVM bagging and Random Forest, the former proved to be better in five cases out of the six. The reason for the lower R-Square of SVM bagging than that of Random Forest on Site 1 Well 2 could not be explained at the moment. We treated it as a lone and rare case that needs to be further investigated in our future work with more and different datasets.

With the RMSE criterion, the same trend of the superior performance of our proposed model over the others is shown in Fig. 17 for the porosity datasets and Fig. 18 for permeability. Our proposed ensemble model gave the least RMSE for all the datasets. It was interesting, however, to observe that the better performance of Random Forest in terms of R-Square over SVM bagging on Site 1 Well 2 in Fig. 16 was nullified by the lower RMSE of SVM bagging than Random Forest (Fig. 17). That does not, however, preclude the necessity of conducting more studies with more and different sets of data to further investigate the comparative performance of the two techniques.

In terms of MAE, our proposed ensemble model showed superiority over the others for all datasets (Figs. 19 and 20). SVM bagging has lower errors than the Random Forest technique in five out of the six cases, while there was stiff competition between SVM bagging and the proposed ensemble model on all datasets except Data 1 Well 1. In the overall evaluation, we posit that our proposed model performed the best, with few cases of stiff competition with the SVM bagging method. Hence, both SVM bagging and the stacked generalization ensemble model proved to be better models for successful and improved petroleum reservoir modeling and prediction than the conventional SVM model and the Random Forest technique. This further confirmed the reports of the robustness and scalability of SVM in handling data of different sizes and dimensionality [4,17,45,47,48,51,84] and the limitation of Decision Trees, which form the basis of the Random Forest technique, in handling data of small size and high dimensionality [66–68].

Fig. 16. R-Square performance of the 3 techniques (all datasets).
Fig. 17. RMSE comparative performance of the 3 techniques (porosity datasets).
Fig. 18. RMSE comparative performance of the 3 techniques (permeability datasets).
Fig. 19. MAE comparative performance of the 3 techniques (porosity datasets).

Fig. 20. MAE comparative performance of the 3 techniques (permeability datasets).

6. Conclusion

In this study, we proposed a novel application of the stacked generalization ensemble model of SVM based on a diverse regularization parameter. Different views of experts on the importance and sensitivity of this parameter of SVM, giving credence to the need for great caution in setting its value, were presented. Using three levels of diversity test (pre-implementation, in situ, and post-implementation), we showed the importance of this parameter to the performance of the SVM model and used that to establish the existence of the required diversity of "expert opinions" for the implementation of our proposed ensemble technique. To prove the superior performance of our algorithm, we also implemented another SVM ensemble model that is based on the conventional bagging method. These two methods were compared with Random Forest, a traditional and fundamental ensemble technique based on the combination of several instances of Decision Trees, and with the conventional SVM technique. All four techniques were applied on six datasets of petroleum reservoirs obtained from different geological formations in the prediction of porosity and permeability. The results of the techniques were evaluated using the standard correlation coefficient, root mean square error and mean absolute error. Based on these metrics, we evaluated the comparative performance of the implemented techniques. A rigorous comparison of the three ensemble models with the conventional SVM technique led to the following conclusions:

• The existence of high diversity in the use of SVM with a diverse regularization parameter presented it as a good candidate for ensemble modeling.
• Our proposed stacked generalization ensemble model outperformed the other ensemble models, with few occurrences of stiff competition with SVM bagging.
• The observed best performance of our proposed model is based on its demonstration of superiority in all the cases of the evaluation criteria.
• On average, the SVM ensemble with the conventional bagging algorithm performed better than the Random Forest technique.
• We further confirmed in this study that the Random Forest technique is susceptible to overfitting with small datasets. This agrees with previous reports [66–68].
• The proposed ensemble model, like the conventional SVM, demonstrated the capability to handle both small and large datasets.

This study has successfully shown that there is great potential for the proposed ensemble learning paradigm in petroleum reservoir characterization, and the petroleum industry stands to gain immensely from the application of this new advancement in machine learning technology. It will be well appreciated in the petroleum industry, since a marginal improvement in the prediction of reservoir properties, especially porosity and permeability, will lead to a huge increase in successful exploration, drilling tasks and production of energy. With the success of this study, we have been motivated to implement our proposed ensemble algorithm in Decision Trees, using different numbers of tree splits as the basis for diversity in the face of diverse "expert opinions". The result of that will be compared with the existing bagging method implemented in Random Forest. The application of ensemble possibilities of other techniques in petroleum reservoir characterization and modeling will be investigated.

Acknowledgment

The authors would like to thank the Universiti Malaysia Sarawak and King Fahd University of Petroleum and Minerals for providing the resources used in the conduct of this study.

References

[1] M. Re, G. Valentini, Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction, in: JMLR: Workshop and Conference Proceedings, Mach. Learn. Syst. Biol. 8 (2010) 98–111.
[2] A. Chandra, X. Yao, Evolving hybrid ensembles of learning machines for better generalisation, Neurocomputing 69 (2006) 686–700.
[3] L. Nanni, A. Lumini, An ensemble of support vector machines for predicting virulent proteins, Expert Syst. Appl. 36 (2009) 7458–7462.
[4] M.L. Martin, D. Santos-Muñoz, F. Valero, A. Morata, Evaluation of an ensemble precipitation prediction system over the Western Mediterranean area, Atmos. Res. 98 (2010) 163–175.
[5] I. Zaier, C. Shu, T.B.M.J. Ouarda, O. Seidou, F. Chebana, Estimation of ice thickness on lakes using artificial neural network ensembles, J. Hydrol. 383 (3–4) (2010) 330–340.
[6] J. Sun, H. Li, Financial distress prediction using support vector machines: ensemble vs. individual, Appl. Soft Comput. 12 (8) (2012) 2254–2265.
[7] R. Polikar, Ensemble learning, Scholarpedia 4 (1) (2009) 2776.
[8] J.K. Ali, Neural networks: a new tool for the petroleum industry? in: Proceedings of the Society of Petroleum Engineers European Petroleum Computer Conference, Aberdeen, UK, 1994, pp. 217–231.
[9] L. Jong-Se, Reservoir properties determination using fuzzy logic and neural networks from well data in offshore Korea, J. Pet. Sci. Eng. 49 (2005) 182–192.
[10] A. Abdulraheem, E. Sabakhi, M. Ahmed, A. Vantala, I. Raharja, G. Korvin, Estimation of permeability from wireline logs in a middle eastern carbonate reservoir using fuzzy logic, in: Proceedings of the 15th SPE Middle East Oil and Gas Show and Conference, Bahrain, 11–14 March, 2007.
[11] D. Kaviani, T.D. Bui, J.L. Jensen, C.L. Hanks, The application of artificial intelligence neural networks with small data sets: an example for analysis of fracture spacing in the Lisbourne formation, northeastern Alaska, SPE J. Reserv. Eval. Eng. 11 (3) (2008) 598–605.
[12] A. Khoukhi, M. Oloso, A. Abdulraheem, M. El-Shafei, Optimized adaptive neural networks for viscosity and gas/oil ratio curves prediction, in: Proceedings of the IASTED International Conference, Banff, Canada, 2010, pp. 14–17.
[13] S. Abe, Fuzzy LP-SVMs for multiclass problems, in: Proceedings of the European Symposium on Artificial Neural Networks, 2004, pp. 429–434.
[14] S. Mohsen, A. Morteza, Y.V. Ali, Design of neural networks using genetic algorithm for the permeability estimation of the reservoir, J. Pet. Sci. Eng. 59 (2007) 97–105.
[15] M.B. Shahvar, R. Kharrat, R. Mahdavi, Incorporating fuzzy logic and artificial neural networks for building hydraulic unit-based model for permeability prediction of a heterogeneous carbonate reservoir, in: Proceedings of the International Petroleum Technology Conference, Doha, Qatar, 7–9 December, 2009.
[16] T. Weldu, S. Ghedan, O. Al-Farisi, Hybrid AI and conventional empirical model for improved prediction of log-derived permeability of heterogeneous carbonate reservoir, in: Proceedings of the Society of Petroleum Engineers Production and Operations Conference and Exhibition, Tunis, Tunisia, 8–10 June, 2010.
[17] F. Anifowose, J. Labadin, A. Abdulraheem, A hybrid of functional networks and support vector machine models for the prediction of petroleum reservoir properties, in: Proceedings of the 11th International Conference on Hybrid Intelligent Systems, IEEExplore, 2011, pp. 85–90.
[18] F. Anifowose, J. Labadin, A. Abdulraheem, Prediction of petroleum reservoir properties using different versions of adaptive neuro-fuzzy inference system hybrid models, Int. J. Comput. Inform. Syst. Ind. Manag. Appl. 5 (2013) 413–426.

[19] L. Rokach, Pattern Classification Using Ensemble Methods, World Scientific Publishing Co., 2009.
[20] C. Caragea, J. Sinapov, A. Silvescu, D. Dobbs, V. Honavar, Glycosylation site prediction using ensembles of support vector machine classifiers, BMC Bioinform. 8 (438) (2007), doi:10.1186/1471-2105-8-438.
[21] S. Chen, W. Wang, H. Zuylen, Construct support vector machine ensemble to detect traffic incident, Exp. Syst. Appl. 36 (2009) 10976–10986.
[22] J. Wu, J.M. Rehg, Object detection, in: C. Zhang, Y. Ma (Eds.), Ensemble Machine Learning: Methods and Applications, Springer Science+Business Media, LLC, 2012, http://dx.doi.org/10.1007/978-1-4419-9326-7_8.
[23] V. Landassuri-Moreno, J.A. Bullinaria, Neural network ensembles for time series forecasting, in: Proceedings of GECCO’09, Montréal, Québec, Canada, 2009, pp. 8–12.
[24] L. Yu, S. Wang, K.K. Lai, Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm, Energy Econ. 30 (2008) 2623–2635.
[25] L. Breiman, Bagging predictors, Mach. Learn. 24 (2) (1996) 123–140.
[26] G. Liang, X. Zhu, C. Zhang, An empirical study of bagging predictors for different learning algorithms, in: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco, California, USA, 2011, pp. 1802–1803.
[27] T. Helmy, F. Anifowose, K. Faisal, Hybrid computational models for the characterization of oil and gas reservoirs, Int. J. Exp. Syst. Appl. 37 (2010) 5353–5363.
[28] S. Mohaghegh, S. Ameri, Artificial neural network as a valuable tool for petroleum engineers, unsolicited paper for the Society of Petroleum Engineers, 1995, pp. 1–6.
[29] F. Anifowose, A. Abdulraheem, A functional networks-type-2 fuzzy logic hybrid model for the prediction of porosity and permeability of oil and gas reservoirs, in: Proceedings of the 2nd International Conference on Computational Intelligence, Modeling and Simulation, IEEEXplore, 2010, pp. 193–198.
[30] F. Anifowose, A. Abdulraheem, Fuzzy logic-driven and SVM-driven hybrid computational intelligence models applied to oil and gas reservoir characterization, J. Nat. Gas Sci. Eng. 3 (3) (2011) 505–517.
[31] H. Kim, S. Pang, H. Je, D. Kim, S.Y. Bang, Constructing support vector machine ensemble, Pattern Recognit. 36 (2003) 2757–2767.
[32] Y. Peng, A novel ensemble machine learning for robust microarray data classification, Comput. Biol. Med. 36 (2006) 553–573.
[33] Y. Chen, Y. Zhao, A novel ensemble of classifiers for microarray data classification, Appl. Soft Comput. 8 (2008) 1664–1669.
[34] G. Valentini, M. Muselli, F. Ruffino, Cancer recognition with bagged ensembles of support vector machines, Neurocomputing 56 (2004) 461–466.
[35] K.W.D. Bock, D.V. Poel, Ensembles of probability estimation trees for customer churn prediction, in: N. García-Pedrajas, et al. (Eds.), IEA/AIE 2010, Part II, LNAI 6097, Springer-Verlag, 2010, pp. 57–66.
[36] D. Pardoe, M. Ryoo, R. Miikkulainen, Evolving neural network ensembles for control problems, in: Proceedings of GECCO’05, Washington, DC, USA, 2005, pp. 25–29.
[37] L. Baker, D. Ellison, Optimisation of pedotransfer functions using an artificial neural network ensemble method, Geoderma 144 (2006) 212–224.
[38] L.L. Minku, X. Yao, DDD: a new ensemble approach for dealing with concept drift, IEEE Trans. Knowl. Data Eng. 24 (4) (2012) 619–633.
[39] M.D. Felice, X. Yao, Short-term load forecasting with neural network ensembles: a comparative study, IEEE Comput. Intell. Mag. 6 (3) (2011) 47–56.
[40] Z. Wang, V. Palade, Y. Xu, Neuro-fuzzy ensemble approach for microarray cancer gene expression data analysis, in: Proceedings of the 2006 International Symposium on Evolving Fuzzy Systems, IEEEXplore, 2006, pp. 241–246.
[41] A. Tsymbal, S. Puuronen, D.W. Patterson, Ensemble feature selection with the simple Bayesian classification, Inf. Fusion 4 (2003) 87–100.
[42] P.D. Castro, G.P. Coelho, M.F. Caetano, F.J.V. Zuben, in: C. Jacob, et al. (Eds.), ICARIS 2005, LNCS 3627, 2005, pp. 469–482.
[43] Z. Bray, P.O. Kristensson, Using ensembles of decision trees to automate repetitive tasks in web applications, in: Proceedings of EICS’10, Berlin, Germany, June 19–23, 2010.
[44] M. Heeswijk, Y. Miche, T. Lindh-Knuutila, P.A.J. Hilbers, T. Honkela, E. Oja, A. Lendasse, Adaptive ensemble models of extreme learning machines for time series prediction, in: C. Alippi, et al. (Eds.), ICANN 2009, Part II, LNCS 5769, 2009, pp. 305–314.
[45] C.J. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discov. 2 (1998) 121–167.
[46] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[47] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, 1st ed., Cambridge University Press, UK, 2000.
[48] J. Taboada, J.M. Matías, C. Ordóñez, P.J. García, Creating a quality map of a slate deposit using support vector machines, J. Comput. Appl. Math. 20 (4) (2007) 84–94.
[49] Y. Xing, X. Wu, Z. Xu, Multiclass least squares auto-correlation wavelet support vector machines, Int. J. Innov. Comput. Inf. Control Express Lett. 2 (4) (2008) 345–350.
[50] E. Alpaydin, Introduction to Machine Learning, 2nd ed., The MIT Press, 2010, pp. 224.
[51] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models, MIT Press, 2001, pp. 541.

[52] T. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms, The Springer International Series in Engineering and Computer Science, 2002, pp. 40.
[53] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.
[54] V.S. Cherkassky, P. Mulier, Learning from Data: Concepts, Theory and Methods, 2nd ed., John Wiley & Sons, 2007.
[55] G. Brown, J.L. Wyatt, P. Tino, Managing diversity in regression ensembles, J. Mach. Learn. Res. 6 (2005) 1621–1650.
[56] R. Polikar, Ensemble based systems in decision making, IEEE Circuits Syst. Mag. 3 (2006) 21–45.
[57] A. Liaw, M. Wiener, Classification and regression by random forest, R News 2 (3) (2002) 18–22.
[58] R. Polikar, Ensemble learning, in: C. Zhang, Y. Ma (Eds.), Ensemble Machine Learning: Methods and Applications 8, Springer Science+Business Media, LLC, 2012, pp. 1–34.
[59] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[60] T.K. Ho, Random decision forest, in: Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August, 1995, pp. 278–282.
[61] T.K. Ho, The random subspace method for constructing decision forests, IEEE Trans. Pattern Anal. Mach. Intell. 20 (8) (1998) 832–844.
[62] P. Sherrod, DTREG Predictive Modeling Software, 2008, 324 pp., available from: www.dtreg.com
[63] H. Park, S. Kwon, K. Hyuk-Chul, Complete Gini-index text (GIT) feature-selection algorithm for text classification, in: 2nd International Conference on Software Engineering and Data Mining, 2010, pp. 366–371.
[64] D.G. Leibovici, L. Bastin, M. Jackson, Higher-order co-occurrences for exploratory point pattern analysis and decision tree clustering on spatial data, Comput. Geosci. 37 (2011) 382–389.
[65] R. Caruana, N. Karampatziakis, A. Yessenalina, An empirical evaluation of supervised learning in high dimensions, in: Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, 2008.
[66] A. Altmann, L. Tolosi, O. Sander, T. Lengauer, Permutation importance: a corrected feature importance measure, Bioinformatics 26 (10) (2010) 1340–1347.
[67] H. Deng, G. Runger, E. Tuv, Bias of importance measures for multi-valued attributes and solutions, in: Proceedings of the 21st International Conference on Artificial Neural Networks (ICANN), 2011, pp. 293–300.
[68] L. Tolosi, T. Lengauer, Classification with correlated features: unreliability of feature ranking and solutions, Bioinformatics 27 (2011) 1986–1994.
[69] A. Tsymbal, M. Pechenizkiy, P. Cunningham, Diversity in search strategies for ensemble feature selection, Inf. Fusion 6 (1) (2005) 83–98 (special issue on diversity in multiple classifier systems).
[70] L.L. Minku, A.P. White, X. Yao, The impact of diversity on online ensemble learning in the presence of concept drift, IEEE Trans. Knowl. Data Eng. 22 (5) (2010) 730–742.
[71] E.K. Tang, P.N. Suganthan, X. Yao, An analysis of diversity measures, Mach. Learn. 65 (2006) 247–271.
[72] T. Lofstrom, U. Johansson, H. Bostrom, On the use of accuracy and diversity measures for evaluating and selecting ensembles of classifiers, in: Seventh International Conference on Machine Learning and Applications (ICMLA), 2008, pp. 127–132.
[73] S. Wang, X. Yao, Relationships between diversity of classification ensembles and single-class performance measures, IEEE Trans. Knowl. Data Eng. 25 (1) (2013) 206–219.
[74] H. Dutta, Measuring diversity in regression ensembles, in: B. Prasad, P. Lingras, A. Ram (Eds.), Proceedings of the 4th Indian International Conference on Artificial Intelligence (IICAI 2009), Tumkur, Karnataka, India, 16–18 December, 2009, pp. 2220–2236.
[75] L.I. Kuncheva, C.J. Whitaker, Ten measures of diversity in classifier ensembles: limits for two classifiers, in: A DERA/IEE Workshop on Intelligent Sensor Processing, 2001, pp. 10/1–10/10.
[76] F. Anifowose, A. Abdulraheem, Prediction of porosity and permeability of oil and gas reservoirs using hybrid computational intelligence models, in: Proceedings of the SPE North Africa Technical Conference and Exhibition (NATC 2010), Cairo, Egypt, 2010.
[77] E. Ip, I. Cadez, P. Smyth, Psychometric methods of latent variable modeling, in: N. Ye (Ed.), The Handbook of Data Mining, Lawrence Erlbaum Associates, 2003, p. 238.
[78] A. Khoukhi, S. Albukhitan, PVT properties prediction using hybrid genetic neuro-fuzzy systems, Int. J. Oil Gas Coal Technol. 4 (1) (2011) 47–63.
[79] V. Cherkassky, Y. Ma, Selection of meta-parameters for support vector regression, in: Proceedings of the ICANN, 2002, pp. 687–693.
[80] A.W.F. Edwards, Occam’s bonus, in: A. Zellner, H.A. Keuzenkamp, M. McAleer (Eds.), Simplicity, Inference and Modeling: Keeping it Sophisticatedly Simple, Cambridge University Press, UK, 2004, pp. 128–132.
[81] B. Hamming, What explains complexity?, in: A. Zellner, H.A. Keuzenkamp, M. McAleer (Eds.), Simplicity, Inference and Modeling: Keeping it Sophisticatedly Simple, Cambridge University Press, UK, 2004, pp. 120–127.
[82] H.A. Keuzenkamp, M. McAleer, A. Zellner, The enigma of simplicity, in: A. Zellner, H.A. Keuzenkamp, M. McAleer (Eds.), Simplicity, Inference and Modeling: Keeping it Sophisticatedly Simple, Cambridge University Press, UK, 2004, pp. 1–10.
[83] Least Squares SVM (LS-SVM), basic version available online at: http://www.esat.kuleuven.be/sista/lssvmlab/ (accessed 25.12.12).

[84] X. Peng, Y. Wang, A normal least squares support vector machine (NLS-SVM) and its learning algorithm, Neurocomputing 72 (2009) 3734–3741.
[85] C.A.L. Bailer-Jones, Statistical Methods: A Computer Course, available from: http://www.mpia-hd.mpg.de/~calj/statistical_methods_ss2011/lectures/05_regression.pdf (accessed 21.12.12).

[86] MATLAB Central, Random Forest, available at: http://www.mathworks.com/matlabcentral/fileexchange/31036-random-forest (accessed 04.12.12).
[87] Netlab Toolbox, Neural Computing Research Group, Information Engineering, Aston University, Birmingham, United Kingdom, 2012, available from: http://www.ncrg.aston.ac.uk/netlab (accessed 11.12.12).