Application of the Neural Decision Tree approach for prediction of petroleum production


Journal of Petroleum Science and Engineering 104 (2013) 11–16

Contents lists available at SciVerse ScienceDirect

Journal of Petroleum Science and Engineering journal homepage: www.elsevier.com/locate/petrol

X. Li, C.W. Chan*, H.H. Nguyen
Energy Informatics Laboratory, Faculty of Engineering and Applied Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2

Article info

Article history: Received 13 February 2012; Accepted 19 March 2013; Available online 2 April 2013

Abstract

Accurate predictions of oil production from wells are important for cost-effective operations in the petroleum industry. Such predictions can assist petroleum engineers in project design, facilities construction scheduling, economic forecasting, and environment management. However, it is difficult to obtain accurate predictions of oil production due to the complex subsurface conditions of reservoirs. Reservoir engineers typically employ curve fitting techniques for predicting primary production of wells based on existing production data. Instead of using this approach, this paper explores the application of artificial intelligence techniques for production prediction. The artificial neural network (ANN) approach is a mathematical modeling technique inspired by biological neural networks. The ANN consists of an interconnected group of artificial neurons, which process information via a learning phase between inputs and outputs so as to find patterns in data. A decision tree learning algorithm such as C4.5 usually considers one variable at a time and ignores interdependencies among input attributes, which reduces its model accuracy. On the other hand, an enhanced decision tree learning approach called Neural Decision Tree (NDT) takes interdependencies among input attributes into consideration and generates a decision tree for prediction of petroleum production. This paper presents a comparison of prediction results produced by three machine intelligence approaches: the C4.5 model, the NDT model, and the ANN model. The results show that the NDT model can significantly improve upon the classification accuracy of the C4.5 algorithm. When compared to the ANN approach, the NDT model has a lower classification rate in general but is better able to describe classes with a low number of instances. © 2013 Published by Elsevier B.V.

Keywords: artificial neural network; attribute dependency; data mining; decision tree; petroleum production prediction

* Corresponding author. Tel.: +1 306 585 5225. E-mail address: [email protected] (C.W. Chan).
0920-4105/$ - see front matter © 2013 Published by Elsevier B.V. http://dx.doi.org/10.1016/j.petrol.2013.03.018

1. Introduction

Estimations of future production and potential recoverable oil reserve of petroleum wells are important for cost-effective operations in the petroleum industry. However, the estimation task is difficult due to uncertain underground conditions. Reservoir engineers typically use time-consuming simulations as the basis for making predictions of oil production. Meanwhile, there is a huge amount of under-utilized data readily available from within companies and from public sources, which can be used for building a predictive model for oil production. In this study, the historical data sets of oil production are analyzed to classify production patterns based on geological variables.

Previous attempts at data analysis for predicting reservoir production often adopt the approaches of decline curve fitting and artificial neural networks. Most of the existing decline curve analysis techniques are based on empirical equations, including exponential, hyperbolic, and harmonic equations (Li and Horne, 2003). However, the problem with this technique is that it is difficult to identify which equation describes the production of a reservoir. In addition, a single curve often cannot describe the entire life of a reservoir, and curve fitting not only makes the matching process difficult but also results in unreliable predictions (El-Banbi and Wattenbarger, 1996). Newer attempts using artificial neural networks (Nguyen et al., 2004; Weiss et al., 2002) managed to fit the data more closely but suffered from a lack of interpretability.

Decision tree learning (DTL) is one of the most prevalent analysis methods used in classification problems. Many researchers have utilized DTL for applications in petroleum engineering with some level of success. Perez et al. (2005) used decision trees to classify the data for permeability predictions based on well logs. Jensen (1998) applied decision tree analysis to estimate the range of uncertainty in the reservoir production prognosis. Most DTL algorithms implement univariate attribute testing at each node, but this approach can encounter problems when the input attributes are interdependent (Lee and Yen, 2002; Lee et al., 2006; Lee et al., 2009; Yen and Lee, 2011).

For predicting petroleum production, a number of core analysis variables have been identified as significant factors, but there are interdependencies among them. For example, permeability, which is a measure of how easily oil can pass through a porous medium, is closely correlated with porosity, which is a measure of the void space in the reservoir rocks. Such interdependencies can obscure the true underlying relationship between the input and output variables and yield a misleading model. However, it is not possible to preprocess the data and eliminate the interdependencies because it is unknown what type of dependency exists between the two variables. This issue is dealt with in the NDT model (Lee and Yen, 2002; Lee et al., 2006; Lee et al., 2009; Yen and Lee, 2011), which uses ANNs to extract and eliminate the underlying attribute dependencies that are not directly observable by humans.

The objective of this study is twofold: first, the NDT model is evaluated as a data analysis tool and its applicability assessed for modeling petroleum production data; secondly, applications of the NDT and ANN approaches for prediction of petroleum production are compared in terms of classification accuracy.

This paper is organized as follows. Section 2 introduces the NDT framework, which includes its structure and processing procedure. Section 3 evaluates performance of the NDT approach using three different data sets obtained from the UCI Machine Learning Repository (Blake and Merz, 1987). The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that is widely used by the machine learning research community for empirical analysis of machine learning algorithms (Blake and Merz, 1987). Section 4 presents application of the NDT model for oil production prediction, and the results are compared to those obtained using the C4.5 algorithm. Section 5 explains our adoption of the ANN approach for this modeling task, and describes a comparison of the prediction results generated by the three methods of NDT, C4.5, and ANN. Section 6 presents the conclusion and suggests some directions for future work.

2. The Neural Decision Tree framework

The Neural Decision Tree (NDT) model proposed by Lee and Yen (2002) integrates a neural network and a decision tree into a single entity. It can be interpreted as a set of learning procedures based on the theoretical foundations of decision tree and neural network learning. The main idea of Lee and Yen (2002) is to use a neural network to eliminate interdependencies among input variables first and then feed the newly transformed input variables to a decision tree learning process for classification. Hence, the NDT algorithm begins by accepting training data as input to the neural network model. The generated output is then passed on for decision tree construction, and the resulting rules are the net output of the NDT model.

Specifically, the model proposed by Lee and Yen (2002) used back-propagation neural networks (Chauvin and Rumelhart, 1995) and the C4.5 decision tree learning algorithm (Quinlan, 1993). The feed-forward back-propagation model uses the gradient descent algorithm, where inputs are propagated from the input to the output layer without feedback, followed by the back-propagation of weight modifications to reduce error. In an NDT model, the number of hidden neurons is equal to the number of input neurons. The C4.5 algorithm is based on the concepts of class entropy and information gain, which measures how well an attribute classifies the dataset. Details on the back-propagation and C4.5 algorithms can be found in Chauvin and Rumelhart (1995) and Quinlan (1993).

The NDT framework can be applied to both numerical and categorical data, but the methods of handling the two types of data are different. Since our application involves only numerical attributes, only the numerical part of the framework is discussed here.
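The class entropy and information gain mentioned above can be illustrated with a short calculation. The following is only a minimal sketch: the function names and the four-record toy data are hypothetical, and the split test is a simple numeric threshold. C4.5 as described by Quinlan (1993) additionally normalizes the gain into a gain ratio, which this sketch omits.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting one numeric attribute at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions (empty partitions contribute nothing).
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Hypothetical toy data: one numeric attribute and a two-class label.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["low", "low", "high", "high"]

print(entropy(labels))
print(information_gain(values, labels, threshold=2.0))
print(information_gain(values, labels, threshold=1.0))
```

A univariate tree learner evaluates candidate thresholds like these one attribute at a time, which is why dependencies between attributes are invisible to it.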

A complete description of the NDT approach can be found in Lee and Yen (2002), Lee et al. (2006), Lee et al. (2009) and Yen and Lee (2011). For numerical attributes, Lee and Yen suggested that the first hidden layer of a feed-forward back-propagation neural network extracts input interdependencies automatically. The original input variables are combined through the link weights into the inputs of the hidden neurons; that is, they undergo a linear transformation to become a set of new independent input variables. The new set of variables is then fed to the C4.5 program so that the data can be classified and a decision tree and rules generated. A detailed example of how the method works can be found in Section 4.

Before applying the NDT framework to the dataset on petroleum production prediction, it was applied to three data sets obtained from the UCI Machine Learning Repository (Blake and Merz, 1987) so as to verify that the NDT method performs better than the C4.5 method. The verification process is presented in the next section.

3. Verification of the NDT approach using data from the UCI Machine Learning Repository

As a step towards measuring performance of the NDT model as compared to that of the C4.5 algorithm, the NDT model was implemented and applied to three numerical data sets from the UCI Machine Learning Repository (Blake and Merz, 1987). The data sets were chosen because they are similar to the oil production dataset in the following ways: (1) they involve only numerical values, (2) the size of the data sets is relatively small with a few hundred data points, and (3) the number of attributes in each set is about 10 or less. In each dataset, instances with missing attributes have been removed, and details on the data sets are shown in Table 1.

Table 1
Description of data sets from the UCI Repository.

| Data set | Number of instances after removal of ones with missing values | Number of input attributes |
|---|---|---|
| Glass | 214 | 10 |
| Balance-scale | 625 | 4 |
| Bupa | 345 | 7 |

The NDT model and the original C4.5 model were applied to the data sets separately. A 10-fold cross-validation was used in all the applications, wherein each data set was randomly partitioned, without replacement, into 10 samples. Each run of the cross-validation process used nine of these samples to create a classifier model, and the 10th sample was used as a holdout test set. The process was repeated 10 times, so that each instance was used exactly once for testing and nine times for training. The average classification accuracy was measured as an indication of the overall performance of the model. The results from modeling the data sets are shown in Table 2. Since the sampling used in cross-validation is randomly chosen for each run, only the results with the best-observed classification rates are included in the table. The classification accuracy is defined as the percentage of data tuples correctly classified by the model (Han and Kamber, 2006), and this is adopted as the basis for result comparison.

Table 2
Result summary for three data sets from the UCI Repository.

| Dataset | Measure | C4.5 without pruning | C4.5 with pruning | NDT without pruning | NDT with pruning |
|---|---|---|---|---|---|
| Glass | Tree size (# of nodes) | 30 | 30 | 28 | 28 |
| Glass | Number of rules | 59 | 59 | 55 | 55 |
| Glass | Test set size | 214 | 214 | 214 | 214 |
| Glass | Correct classification rate (%) | 66.35 | 65.88 | 68.70 | 68.22 |
| Glass | Mean absolute error | 0.1029 | 0.1062 | 0.0953 | 0.0971 |
| Glass | Mean squared error | 0.2936 | 0.2922 | 0.2826 | 0.284 |
| Balance-scale | Tree size (# of nodes) | 60 | 58 | 18 | 18 |
| Balance-scale | Number of rules | 119 | 115 | 35 | 35 |
| Balance-scale | Test set size | 625 | 625 | 625 | 625 |
| Balance-scale | Correct classification rate (%) | 79.2 | 76.64 | 94.88 | 94.56 |
| Balance-scale | Mean absolute error | 0.1595 | 0.188 | 0.0433 | 0.0466 |
| Balance-scale | Mean squared error | 0.3487 | 0.3688 | 0.1815 | 0.1858 |
| Bupa | Tree size (# of nodes) | 27 | 11 | 8 | 5 |
| Bupa | Number of rules | 53 | 21 | 15 | 9 |
| Bupa | Test set size | 345 | 345 | 345 | 345 |
| Bupa | Correct classification rate (%) | 68.98 | 68.11 | 71.01 | 70.14 |
| Bupa | Mean absolute error | 0.3662 | 0.3951 | 0.3749 | 0.3699 |
| Bupa | Mean squared error | 0.5089 | 0.4864 | 0.4545 | 0.4659 |

It can be seen from Table 2 that the NDT model shows an improvement in classification accuracy over the C4.5 algorithm in both the pruned and unpruned cases. The classification accuracy improved modestly for the “Glass” and “Bupa” data sets (by roughly 2 to 2.4 percentage points) but significantly for the “Balance-Scale” data set (from 79.2% and 76.64% for the C4.5 model to 94.88% and 94.56% for the NDT model for the unpruned and pruned trees, respectively). In addition, the NDT model consistently reduced the complexity of the decision tree, as shown by the reduced number of decision rules and the reduced tree size in all three cases. However, by comparing pruned versus unpruned results generated from the NDT model, it cannot be concluded that pruning for the NDT model significantly improves the performance in terms of predictive accuracy.

In summary, the application of the NDT model to the UCI data sets shows that the NDT model demonstrates a distinct improvement over the C4.5 algorithm in terms of reducing the complexity of the generated decision tree and increasing the prediction accuracy of the model. Hence, the utilization of link weights for transforming data is a fruitful approach because it enables the NDT model to perform better than the C4.5 algorithm. Since the structure of the decision tree generated by the NDT model is simpler, this suggests that the NDT model is likely to produce rules with greater predictive accuracy and less redundant information.
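The 10-fold cross-validation procedure used in this section can be sketched as follows. This is a minimal illustration rather than the tooling used in the study: the function names, the fixed random seed, the synthetic labels, and the majority-class stand-in classifier are all assumptions made so that the example is self-contained.

```python
import random
from collections import Counter


def k_fold_indices(n_records, k=10, seed=0):
    """Shuffle the record indices and deal them into k disjoint folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(records, labels, fit, predict, k=10, seed=0):
    """Average correct classification rate over k train/test rounds."""
    rates = []
    for held_out in k_fold_indices(len(records), k, seed):
        held = set(held_out)
        train = [i for i in range(len(records)) if i not in held]
        # Build the model on nine folds, then test it on the remaining fold.
        model = fit([records[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, records[i]) == labels[i] for i in held_out)
        rates.append(hits / len(held_out))
    return sum(rates) / len(rates)


# Stand-in classifier (hypothetical): always predicts the majority training class.
def fit_majority(records, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, record):
    return model


# Synthetic data with the same number of instances as the Glass data set.
records = list(range(214))
labels = ["a"] * 150 + ["b"] * 64
accuracy = cross_validate(records, labels, fit_majority, predict_majority)
print(f"mean correct classification rate: {accuracy:.3f}")
```

Any classifier exposing the same `fit` and `predict` pair could be substituted for the stand-in; every record lands in exactly one test fold and in nine training sets.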

4. Application of the NDT model for prediction of petroleum production

4.1. The dataset

The data set on oil production consists of 320 data records and was obtained from Saskatchewan Industry and Resources. Each data record consists of the production rate and core analysis and pressure data from the Weyburn Field, which is one of the largest reservoirs located north of the Williston Basin in southeastern Saskatchewan, Canada. Petroleum production of this reservoir is from the Mississippian age carbonates, at a depth of more than 1300 m. The field was discovered in 1954 and produced oil by primary recovery until February 1963, when water flooding was implemented. The maximum production occurred in 1965 at 46,000 barrels/day.

The dataset consists of the three numeric variables of permeability, porosity and first-shut-in-pressure, which have been identified as the available and most significant geoscientific factors that influence oil production. Cumulative production is the predicted or output variable. Production values were discretized using an entropy-based discretization method into the following six ranges:

Lowest: [2000–97,767), which includes 244 instances from the dataset.
Low: [97,767–199,633), which includes 52 instances from the dataset.
Low Medium: [199,633–269,082), which includes 13 instances from the dataset.
High Medium: [269,082–373,839), which includes 5 instances from the dataset.
High: [373,839–469,633), which includes 3 instances from the dataset.
Highest: [469,633–620,000], which includes 3 instances from the dataset.

An analysis of the data distribution reveals that the dataset is unbalanced. Since the majority of the instances fall into the Lowest and Low ranges, it was expected that most rules would cover these two production ranges.

4.2. Construction of the NDT model

First, a neural network was trained with the following parameters:

(1) Number of Hidden Layers: 1
(2) Input Nodes: 3
(3) Hidden Nodes: 3
(4) Learning Rate: 0.3
(5) Momentum: 0.2
(6) Activation function: Sigmoid Function

A 10-fold cross-validation was used. The training process was monitored by observing the training and testing set error curves, and training was terminated as soon as the test set error stopped decreasing. The results from the training process generated the link weights between the input and hidden layers, which specify the strength of the influence that the input attributes contribute to the nodes in the first hidden layer. The link weights are indicated in Fig. 1.

Fig. 1. Link weights between input layer and first hidden layer.

Then, the original numeric data set was transformed using Eqs. (1)–(3) into a new dataset. The new input variables have values equivalent to the inputs to the hidden layer of the trained neural network in Fig. 1.

$$H_1 = -0.37 + 4.24 \times \mathrm{Permeability} + 4.2 \times \mathrm{Porosity} + (-2.63) \times \mathrm{Pressure} \tag{1}$$

$$H_2 = -0.76 + 1.36 \times \mathrm{Permeability} + 1.24 \times \mathrm{Porosity} + (-2.07) \times \mathrm{Pressure} \tag{2}$$

$$H_3 = -0.82 + 1.6 \times \mathrm{Permeability} + 1.31 \times \mathrm{Porosity} + (-2.12) \times \mathrm{Pressure} \tag{3}$$

Fig. 2. Sample rule generated by NDT model.
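The transformation in Eqs. (1)–(3) amounts to a weighted sum per hidden node, which the sketch below applies to a single record. The coefficients are copied from the equations; the function name and the sample input values are hypothetical, and the inputs are assumed to be already scaled in whatever way the trained network expects, since the scaling is not stated here.

```python
# Coefficients from Eqs. (1)-(3): (bias, permeability, porosity, pressure).
HIDDEN_WEIGHTS = {
    "H1": (-0.37, 4.24, 4.20, -2.63),
    "H2": (-0.76, 1.36, 1.24, -2.07),
    "H3": (-0.82, 1.60, 1.31, -2.12),
}


def transform(permeability, porosity, pressure):
    """Map one record of original attributes to the new variables H1..H3."""
    inputs = (permeability, porosity, pressure)
    new_record = {}
    for name, coefficients in HIDDEN_WEIGHTS.items():
        bias, weights = coefficients[0], coefficients[1:]
        # Net input of the hidden node: bias plus weighted sum of the inputs.
        new_record[name] = bias + sum(w * x for w, x in zip(weights, inputs))
    return new_record


# Hypothetical, already-scaled input values used purely for illustration.
sample = transform(permeability=0.5, porosity=0.2, pressure=0.1)
print(sample)
```

Applying `transform` to every record yields the new dataset on which the decision tree is then built, so each test in the resulting tree refers to a combination of all three original attributes.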

Next, the decision tree stage of the NDT model was applied to the new dataset to generate decision rules; a sample rule generated is shown in Fig. 2. The variables in the rule's IF clause shown in Fig. 2 are the transformed variables, whereas the IF clause of a rule generated using the C4.5 model contains the original variables, as shown in Fig. 3.

Fig. 3. Sample rule generated by C4.5 Decision Tree model.

The final pruned NDT tree consists of 53 nodes and 17 leaves. Among the rules, 18 of them predict the Lowest class, 7 predict the Low class, and 2 predict the Low Medium class. As expected, most rules predict the Lowest class. None of the rules predicting the classes from High Medium to Highest survived the pruning process, possibly due to the lack of data in these classes. The rules predicting the classes of Low and Low Medium have low credibility because they are supported by only one to three examples from the data set. The accuracy rate of the pruned NDT model was 71.25%, which is rather low. Hence, expertise or knowledge on petroleum engineering is needed to analyze and validate the rules and results from the model.

4.3. Comparison with C4.5 model

A C4.5 model was also built based on the original dataset and the results of the model were compared to those produced by the NDT model. A comparison of the results from applications of the two techniques is shown in Table 3, where the errors are averaged from 10-fold cross-validations. The tree size and number of rules are taken from the runs with the best-observed classification rates.

Table 3
Comparison between the NDT and the C4.5 models for production prediction.

| Measures | C4.5 without pruning | C4.5 with pruning | NDT without pruning | NDT with pruning |
|---|---|---|---|---|
| Tree size (# of nodes) | 131 | 117 | 67 | 53 |
| Number of rules (# of leaves) | 52 | 45 | 24 | 17 |
| Correct classification rate (%) | 65.93 | 69.68 | 68.13 | 71.25 |
| Mean absolute error | 0.3406 | 0.3031 | 0.3186 | 0.2875 |
| Mean squared error | 7.8593 | 6.0156 | 6.5938 | 5.4375 |

As seen in Table 3, the NDT model reduces the tree size and number of rules by roughly half compared to the C4.5 algorithm. In addition, the classification accuracy increases from 69.68% for the C4.5 algorithm to 71.25% for the NDT model with pruning, and from 65.93% to 68.13% without pruning. Therefore, the NDT model demonstrates a higher correct classification rate, and improves upon the C4.5 model in both the pruned and unpruned cases. A decrease in the number of rules is an improvement because a smaller tree can be more easily validated by petroleum engineers. Hence, the advantages of the NDT model include the following: (1) it takes into account interdependencies among the conditional parameters, and (2) since it generates a comparatively smaller rule set with an acceptable level of classification accuracy, it provides better explanation capability. In other words, with a small sacrifice in terms of classification accuracy, the NDT model is able to provide some explicit heuristics for classification that support predicting oil production from a new well. Therefore, we conclude that compared to the decision tree generated by the C4.5 algorithm, the NDT model is able to provide comparable performance with more explicit explanation for predicting oil production from a new well. However, since the classification accuracy is not entirely satisfactory, the ANN approach was adopted for improving this aspect of the prediction performance.

Table 4
Comparison of the NDT, C4.5, and ANN models in terms of prediction accuracies.

| Measures | C4.5 without pruning | C4.5 with pruning | ANN | NDT without pruning | NDT with pruning |
|---|---|---|---|---|---|
| Correct classification rate (%) | 65.93 | 69.68 | 76.25 | 68.13 | 71.25 |
| Mean absolute error | 0.3406 | 0.3031 | 0.1282 | 0.3186 | 0.2875 |
| Mean squared error | 7.8593 | 6.0156 | 0.0652 | 6.5938 | 5.4375 |

5. Application of the ANN model for predicting petroleum production

The discussion so far has suggested that the NDT model gives better performance than the C4.5 algorithm. This section presents the application of the ANN approach for predicting petroleum production, and compares the ANN approach with the NDT and C4.5 models in terms of predictive accuracy.

5.1. Background of artificial neural networks

The ANN approach has been widely adopted for problem solving in the petroleum industry. Aminzadeh et al. (1999) applied the ANN technique for estimating oil reservoir parameters from remote seismic data. Huang and William (1977) developed an ANN model for predicting porosity and permeability from well logs. Wong and Taggart (1995) described an ANN model similar to that in Huang and William (1977), but they also included information on lithofacies as input. Gharbi et al. (1999) presented a universal neural-network-based model as an alternative method for predicting pressure–volume–temperature (PVT) properties. All of these ANN applications demonstrate high predictive accuracies.

This study adopted a back-propagation neural network, which is a simple but relatively efficient procedure for training neural networks in two phases (Chauvin and Rumelhart, 1995). During the first feed-forward phase, the output is calculated from an input pattern. In the second phase, the neural network initially computes changes to the weights in the final layer, then it reuses much of the same computation to compute changes to the weights in the previous layer, and ultimately goes back to the initial layer. The network makes small changes towards a solution, which is being improved with each step made, until no further improvements are possible. At each step, the procedure moves in the direction of the most rapid performance improvement by varying all the weights simultaneously in proportion to how much good is done by changes in individual weights.
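The two phases described above can be sketched for a network with one hidden layer of sigmoid units. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names are hypothetical, the training data is a toy logical-OR problem, and the learning rate and momentum defaults are borrowed from the parameter list in Section 4.2.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Feed-forward phase; the last weight in each row acts on a bias input of 1."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
    y = sigmoid(sum(w * v for w, v in zip(w2, h + [1.0])))
    return h, y


def train(samples, n_hidden=3, lr=0.3, momentum=0.2, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    dw1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    dw2 = [0.0] * (n_hidden + 1)
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            h, y = forward(x, w1, w2)
            total += 0.5 * (target - y) ** 2
            # Backward phase: output-layer error term first, then hidden layer.
            delta_out = (target - y) * y * (1.0 - y)
            delta_hid = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j, v in enumerate(h + [1.0]):
                dw2[j] = lr * delta_out * v + momentum * dw2[j]
                w2[j] += dw2[j]
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    dw1[j][i] = lr * delta_hid[j] * v + momentum * dw1[j][i]
                    w1[j][i] += dw1[j][i]
        history.append(total / len(samples))  # mean squared-error loss per epoch
    return w1, w2, history


# Toy data (hypothetical): the logical OR of two binary inputs.
OR_SAMPLES = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w1, w2, history = train(OR_SAMPLES)
print(f"first-epoch loss {history[0]:.4f}, last-epoch loss {history[-1]:.4f}")
```

The hidden-layer error terms reuse `delta_out` from the output layer, which is the reuse of computation that the description above refers to; the momentum term adds a fraction of the previous weight change to each update.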

5.2. Comparison of the C4.5, ANN and NDT models

The ANN modeling task has the objective of minimizing the classification error, and Weka's Neural Network Classifier (Witten and Frank, 2005) was used for implementation. The results did not vary significantly when the number of hidden units was between three and five. The comparison of the accuracies of the three techniques of C4.5, ANN, and NDT is shown in Table 4, in which the errors are averaged from 10-fold cross-validations.

It can be seen from Table 4 that in terms of prediction accuracy or correct classification rate, the ANN model has the highest rate of 76.25%, while the C4.5 and NDT models generate correct classification rates between 65.93% and 71.25%. Hence, the ANN model outperformed both the NDT and the C4.5 models in terms of prediction accuracy. However, based on our observation of the results generated by the ANN model, it classified every instance in the test sets into the range of Lowest, in which most of the data samples reside. Since the ANN model is a black box, this cannot be explained and it is not clear whether the ANN model can predict outside of the Lowest range. By contrast, the NDT model generated seven rules for predicting the Low range and two rules for predicting the Low Medium range. Hence, although the ANN approach shows a higher classification rate, its coverage of the data ranges is limited to only the Lowest one. Therefore, the data coverage of the ANN model is lower than that of the NDT approach.
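The link between the ANN's classification rate and the class distribution can be checked with a short calculation. Using the instance counts listed in Section 4.1, a classifier that assigns every record to the Lowest range scores 244 out of 320, which matches the 76.25% reported for the ANN in Table 4. The function name below is hypothetical, and the sketch only reproduces this arithmetic; it says nothing about how the ANN arrived at its predictions.

```python
# Class counts as listed in Section 4.1.
CLASS_COUNTS = {
    "Lowest": 244,
    "Low": 52,
    "Low Medium": 13,
    "High Medium": 5,
    "High": 3,
    "Highest": 3,
}


def majority_baseline(counts):
    """Return the most frequent class and the accuracy of always predicting it."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


label, rate = majority_baseline(CLASS_COUNTS)
print(label, f"{100 * rate:.2f}%")  # prints: Lowest 76.25%
```

On an unbalanced dataset such as this one, the correct classification rate alone therefore cannot distinguish a model that has learned the minority ranges from one that ignores them.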

6. Conclusion and future work

This paper presents an application of the Neural-Based Decision Tree (NDT) learning model for prediction of petroleum production. The experimental results generated from application of the three techniques to the petroleum data set, and of the NDT and C4.5 techniques to the three numerical data sets obtained from the UCI Repository, show that the NDT approach has the following advantages: (1) it can capture interdependencies among the input variables, (2) it provides a higher classification accuracy than the C4.5 method, and (3) it generates decision trees of lesser complexity than the C4.5 method. However, the classification accuracies of both the NDT and C4.5 methods are lower than that of the ANN approach. On the other hand, the NDT approach appears to be able to capture the classes with a low number of instances more effectively than the ANN approach.

The relatively lower classification rate of the NDT compared to the ANN approach suggests that there is still much room for improvement. This low predictive accuracy may be due to the fact that the data or input variables do not adequately cover the problem space. Future research will examine in greater detail the strengths and weaknesses of each approach, as well as other factors that influence oil production, such as rock formation.

Acknowledgments

The generous support of the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canada Research Chair Program is gratefully acknowledged.


References

Aminzadeh, F., Barhen, J., Toomarian, N.B., 1999. Estimation of reservoir parameter using a hybrid neural network. J. Pet. Sci. Eng. 24 (1), 49–56.
Blake, C.L., Merz, C.J., 1987. UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/~mlearn/MLRepository.html. University of California, Department of Information and Computer Science, Irvine, CA.
Chauvin, Y., Rumelhart, D.E., 1995. Backpropagation: Theory, Architectures, and Applications. Lawrence Erlbaum Associates (Chapter 1).
El-Banbi, A.H., Wattenbarger, R.A., 1996. Analysis of commingled tight gas reservoirs. In: SPE Annual Technical Conference and Exhibition, Denver, Colorado, USA, 6–9 October 1996 (SPE 36736).
Gharbi, R.B., Elsharkawy, A.M., Karkoub, M., 1999. Universal neural-network-based model for estimating the PVT properties of crude oil systems. Energy Fuels 13, 454–458.
Han, J., Kamber, M., 2006. Data Mining: Concepts and Techniques, 2nd ed. Morgan Kaufmann Publishers, San Francisco, USA.
Huang, Z., William, M.A., 1977. Determination of porosity and permeability in reservoir intervals by artificial neural network modeling, offshore Eastern Canada. Pet. Geosci. 3 (3), 245–258.
Jensen, T.B., 1998. Estimation of production forecast uncertainty for a mature production license. In: SPE Annual Technical Conference and Exhibitions, New Orleans, USA, SPE 49091, September.
Lee, Y.S., Yen, S.J., 2002. Neural-based approaches for improving the accuracy of decision trees. In: Proceedings of the Data Warehousing and Knowledge Discovery Conference, pp. 114–123.
Lee, Y.S., Yen, S.J., Wu, Y.C., 2006. Using neural network model to discover attribute dependency for improving the performance of classification. J. Inf. Electron. (JIE) 1 (1), 9–19.
Lee, Y.S., Yen, S.J., Lu, C.H., 2009. Using neural network model to discover attribute dependency for improving the performance of decision tree classification algorithm. In: Proceedings of 17th National Conference on Fuzzy Theory and Its Applications, pp. 1258–1263.
Li, K., Horne, R.N., 2003. A decline curve analysis model based on fluid flow mechanisms. In: SPE Western Regional/AAPG Pacific Section Joint Meeting held in Long Beach, CA, USA, May 19–24, 2003 (SPE 83470).
Nguyen, H.H., Chan, C.W., Wilson, M., 2004. Prediction of oil well production: a multiple neural-network approach. Intelligent Data Anal. J., 183–196.
Perez, H., Datta-Gupta, A., Misra, S., 2005. The role of electrofacies, lithofacies and hydraulic flow units in permeability predictions from well logs: a comparative analysis using classification trees. SPE Reservoir Eng. Eval. 8 (2), 143–155.
Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
Weiss, W.W., Balch, R.S., Stubbs, B.A., 2002. How artificial intelligence methods can forecast oil production. In: SPE/DOE Improved Oil Recovery Symposium, 13–17 April 2002, Tulsa, Oklahoma (75143-MS).
Witten, I.H., Frank, E., 2005. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Morgan Kaufmann, San Francisco.
Wong, P.M., Taggart, I.J., 1995. Use of neural network methods to predict porosity and permeability of a petroleum reservoir. AI Appl. 9 (2), 27–37.
Yen, S.J., Lee, Y.S., 2011. A neural network approach to discover attribute dependency for improving the performance of classification. Expert Syst. Appl. 38 (10), 12328–12338.