Journal of Petroleum Science and Engineering 176 (2019) 321–341
Contents lists available at ScienceDirect
Journal of Petroleum Science and Engineering journal homepage: www.elsevier.com/locate/petrol
Optimization of models for a rapid identification of lithology while drilling - A win-win strategy based on machine learning
Jian Sun a,∗, Qi Li a, Mingqiang Chen b, Long Ren b, Guihua Huang b, Chenyang Li b, Zixuan Zhang b
a China University of Petroleum, Beijing, 102249, China
b Xi'an Shiyou University, Xi'an, 710065, China
ARTICLE INFO
Keywords: Machine learning; Lithologic identification while drilling; Optimization; Correlation analysis
ABSTRACT
The identification of lithology from well log data is an important task in petroleum exploration and development. However, due to the complexity of the sedimentary environment and reservoir heterogeneity, the traditional lithology identification methods can not meet the needs of real-time and accurate prediction and identification with logging while drilling (LWD) equipment. The basic data of this paper are derived from conventional wireline logging (CWL) data and the LWD data in Yan'an Gas Field. The main research goal is to compare and analyse three popular machine learning algorithms, which are one-versus-rest support vector machines (OVR SVMs), one-versus-one support vector machines (OVO SVMs) and random forest (RF), and to optimize a more practical method in the field for LWD systems. To reduce the dimensions of the input data, the characteristic parameters of the training data are obtained by a correlation analysis of the logging data. The optimal parameter values of each algorithm are determined by grid search method and 10-fold cross-validation method. On this basis, the lithology predictions of the actual LWD data are carried out by using three classifiers. Considering the time consumption of the model training and the lithology identification accuracy of the model, the best lithology identification model while drilling is selected. The results show that the characteristic parameters of the training data after the correlation analysis are AC, CAL, GR, K, RD and SP logs. The overall classification and recognition performance of the RF classifier is better than that of the other two classifiers, and its accuracy is even greater than 90%. The evaluation matrix shows that the OVR SVMs and RF classifiers yield lower prediction errors than the OVO SVMs classifier in each single lithology identification, but the RF classifier spends much less time in the training process. 
Based on the comprehensive comparative analysis, it is considered that the RF classifier has the characteristics of a short training time and high recognition accuracy in practical production applications, so it is an ideal optimization classifier for lithology identification while drilling. The research results provide not only a theoretical basis for the drilling geosteering of oilfield development but also valuable information for future basic research.
1. Introduction
The development of oil and gas in unconventional reservoirs, such as low-permeability reservoirs, tight reservoirs, and shale reservoirs, has become popular in global oil and gas development. The development of oil and gas resources in these reservoirs is different from that in conventional reservoirs and often requires more time and higher economic costs. Improvements in technology and methodologies in any one area can have undeniable benefits (Sun et al., 2017). In recent years, LWD technology has been widely adopted in the drilling of horizontal wells in unconventional reservoirs. LWD data are generally used to characterize effective reservoirs and to guide drilling and geosteering work. These data are applied less often to the identification of lithology during drilling. In the case of LWD data, detailed lithology identification is possible, but it is difficult to achieve real-time lithological identification based on the conventional wireline logging (CWL) interpretation method, which is unable to identify detailed reservoir lithologies while drilling. Therefore, by combining machine learning algorithms and lithology classification models, LWD data can be used to analyse reservoir lithology in real time. Traditional lithology identification methods include cuttings logging, core drilling and logging data interpretation models. However, cuttings depend on the quality of the logging, and it is difficult to obtain a complete description of the logging profile in the oil field by drilling cores (SHE G et al., 2015; Salehi, S.M. and Honarvar, B., 2014; BAO Q et al., 2013). With the development of logging technology, abundant logging information has
∗ Corresponding author. E-mail address: [email protected] (J. Sun).
https://doi.org/10.1016/j.petrol.2019.01.006 Received 1 August 2018; Received in revised form 21 December 2018; Accepted 2 January 2019 Available online 22 January 2019 0920-4105/ © 2019 Elsevier B.V. All rights reserved.
Fig. 1. Support vector machine schematic diagram.
gradually become available to researchers. The comprehensive use of various logging data can quickly and accurately yield formation lithology information (ZHANG D Q et al., 2015). Currently, based on statistics and computer science, machine learning methods have been applied to the identification of reservoir lithology (Zhong Yihua and Li Rong, 2009; Li Rong and Zhong, 2009; Song et al., 2007; Liu H. et al., 2009; Xiongyan Li and Li, 2013; Yunxin Xie et al., 2018; Shaoqun Dong et al., 2016; Arsalan A. Othman and Richard Gloaguen, 2017). Due to complex geological conditions and sedimentary environments, the relationship between reservoir heterogeneity and the reservoir logging response characteristics is nonlinear, so linear logging response equations and empirical statistical formulas do not effectively characterize the reservoir and cannot meet actual production needs. The traditional lithology identification method is directly related to the experience of researchers and is associated with a certain degree of uncertainty. Therefore, the use of nonlinear information processing technology to determine the distribution characteristics of lithology can better meet the needs of oil and gas exploration and development when conventional linear and empirical logging interpretation technology is insufficient. Artificial neural network (ANN) and support vector machine (SVM) methods have been used in lithology identification. Although they play interpretive roles, there are still many associated problems. Notably, the ANN method provides poor results due to local optima, the curse of dimensionality, and small data sample issues (Ahmed Amara Konaté et al., 2015; Morteza Raeesi et al., 2012; Baouche Rafik et al., 2017; B. Shokooh Saljooghi and A. Hezarkhani, 2015); the SVM can overcome these shortcomings, but the traditional SVM algorithm can only be applied to two-category classification.
In practical applications of data mining, it is often necessary to solve multicategory classification problems. Therefore, OVR SVMs, OVO SVMs (Hsu, C.-W. and Lin, C.-J., 2002), and RF algorithms have emerged. These three methods can effectively avoid the problems of the above methods, especially the RF algorithm, which is composed of multiple decision trees. Compared with a single decision tree algorithm, this algorithm has a higher training accuracy and better classification effect and is less likely to overfit the data. However, for most statistical modelers, the RF algorithm is similar to a black box, because the internal operations of the model cannot be controlled, and only the different parameters and random seeds can be modified. For small data or low-dimension data (uncharacteristic data), the model may not produce a good classification result. A single classifier cannot be considered to be sufficient, and specific issues should be analysed. Therefore, in this paper, an OVR SVMs classifier, an OVO SVMs classifier and an RF classifier are constructed according to various characteristic data obtained from well logging. Various target values of the lithology are classified, and the results obtained by each classifier are analysed. The optimal classifier and corresponding parameters are selected to accurately identify the formation lithology while drilling.
2. Methodology
2.1. The SVM principle
The SVM is developed from the optimal classification surface in the case of linear separability. The core concept is that the optimal classification surface can correctly separate the two types of samples and maximize their classification intervals. In reality, most of the problems encountered are nonlinear, and the nonlinear problem must be converted into a linear problem in a high-dimensional space through a nonlinear transformation. Then, an optimal classification surface is obtained in the transformed space (Neda Mahvash Mohammadi and Ardeshir Hezarkhani, 2018; Jaime et al., 2018; Xiaoling Lu et al., 2018; Xiekai Zhang et al., 2017; Italo Zoppis et al., 2018). Suppose that in the nonlinear case, the sample points are (xi, yi) (i = 1, …, n). In the high-dimensional space, the classification surface equation is given in formula (1), where φ(x) is a mapping function from the input space to the
Fig. 2. Decision tree schematic diagram.
feature space, ω is a weight vector in the feature space, and b is a constant term:

\[ \omega^{T}\varphi(x) + b = 0 \tag{1} \]

The schematic diagram is shown in Fig. 1. Under the constraint given in formula (2), the minimum value of the function in formula (3) is determined, where ξi is the slack variable of sample i:

\[ y_i\left[\omega^{T}\varphi(x_i) + b\right] \geq 1 - \xi_i, \quad \xi_i \geq 0 \quad (i = 1, \ldots, n) \tag{2} \]

\[ \Phi(\omega, \xi) = \frac{1}{2}(\omega \cdot \omega) + c\sum_{i=1}^{n}\xi_i \tag{3} \]

This problem can be converted into a dual problem through the Lagrangian optimization method, which involves a quadratic function extremum problem. Thus, the maximum value of formula (4) is obtained under the constraint given in formula (5), where a is a Lagrangian multiplier and c is a constant that controls the degree of punishment for an incorrect sample:

\[ W(a) = \sum_{i=1}^{n} a_i - \frac{1}{2}\sum_{i,j=1}^{n} a_i a_j y_i y_j K(x_i, x_j) \tag{4} \]

\[ \sum_{i=1}^{n} y_i a_i = 0, \quad 0 \leq a_i \leq c \quad (i = 1, \ldots, n) \tag{5} \]

The optimal classification discriminant function obtained after solving the above problem is shown in formula (6), where N is the number of support vectors and K(xi, x) is the kernel function:

\[ f(x) = \operatorname{sgn}\left(\sum_{i=1}^{N} a_i y_i K(x_i, x) + b\right) \tag{6} \]

Table 1 Granularity classification table.
Granularity classification    D (mm)       φ
Conglomerate                  ≥2           ≤−1
Giant sandstone               1~2          −1~0
Coarse sandstone              0.5~1        0~1
Medium sandstone              0.25~0.5     1~2
Fine sandstone                0.1~0.25     2~3
Siltstone                     0.01~0.1     3~5
Mudstone                      <0.01        >5

2.2. The SVM multiclassification method
The SVM algorithm was originally designed for binary classification problems. When multicategory classification problems are encountered, corresponding multicategory classifiers must be constructed. Currently, there are two main methods for constructing SVM multiclass classifiers. One is the direct method: the objective function is modified so that the parameters of all classification planes are combined into one optimization problem, which is then solved in a single step. Although it seems simple, this method is computationally complex and difficult to implement; therefore, it is generally only applied to small problems. The other is the indirect method, which combines multiple binary classifiers into a multiclass classifier, usually as OVO SVMs or OVR SVMs. In training, the OVR SVMs classifier treats the samples of one category as one class and all remaining samples as the other class, so k categories require k two-class SVMs. The ith classifier takes the ith class of the training set as the positive class and the remaining samples as the negative class. In the discrimination process, the input sample is passed through the k classifiers to obtain k output values fi(x) = sgn(gi(x)). If exactly one output is +1, the input is assigned to the corresponding class. However, the decision function constructed under actual conditions always has errors. If more than one output is +1, or if none of the outputs is +1, the input is assigned to the class with the largest gi(x). This method has an obvious drawback: in each binary problem, the positive class accounts for only a small proportion of all samples. Thus, the results will be affected by the
Fig. 3. Part of the core sandstone classification.
Table 2 Lithologic characteristic table.
Lithology type      Lithologic characteristic
coarse sandstone    In mineral composition, quartz accounts for about 80%, feldspar about 15%, and other about 5%; sub-round or sub-angular grains, well sorted, argillaceous cementation, relatively loose.
medium sandstone    In mineral composition, quartz accounts for about 85%, feldspar about 10%, and other about 5%; sub-round grains, medium sorted, argillaceous cementation, relatively dense.
fine sandstone      In mineral composition, quartz accounts for about 80%, cuttings and other about 20%; sub-round or sub-angular grains, medium sorted, argillaceous cementation.
siltstone           The silty sand is unevenly distributed, partially enriched, partially strip-shaped, hard and brittle, non-absorbent, and poor in plasticity.
dolomite            In mineral composition, calcium carbonate accounts for about 10%, magnesium calcium carbonate about 80%, and other about 10%.
limestone           In mineral composition, calcium carbonate accounts for about 90%, magnesium calcium carbonate about 5%, mud and other about 5%.
mudstone            High mud content, low sand content.
remaining samples, and notable deviations can occur. The OVO SVMs classifier trains one SVM for every pair of classes, so k classes require k(k−1)/2 SVMs. When classifying an unknown sample, each classifier assigns it to one of its two classes and votes for that class. The class that gets the most votes is the class of the unknown sample. For four classes A, B, C and D, voting is completed as follows. Let A = B = C = D = 0. For the (A, B) classifier, if A wins, then A = A + 1; otherwise, B = B + 1. For the (A, C) classifier, if A wins, then A = A + 1; otherwise, C = C + 1. The remaining pairs are handled in the same way, ending with the (C, D) classifier: if C wins, then C = C + 1; otherwise, D = D + 1. The final decision is the class with the maximum of (A, B, C, D). Although this method is better than using OVR SVMs, when the number of categories is large, the number of models, k(k−1)/2, greatly increases the calculation time.
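The two indirect decision rules described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`ovo_predict`, `ovr_predict`, `toy_winner`) and the one-dimensional class centres are hypothetical stand-ins for trained binary SVMs and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(classes, pairwise_winner, x):
    """One-versus-one: every pair of classes votes, the most-voted class wins."""
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b, x)] += 1
    # Ties are broken by the order of `classes` (an arbitrary choice for this sketch).
    return max(classes, key=lambda c: votes[c]), dict(votes)


def ovr_predict(classes, scores):
    """One-versus-rest: choose the class whose classifier output g_i(x) is largest."""
    return max(classes, key=lambda c: scores[c])


# Hypothetical stand-in for a trained binary SVM: the nearer class centre wins.
centres = {"A": 0.1, "B": 0.4, "C": 0.6, "D": 0.9}


def toy_winner(a, b, x):
    return a if abs(x - centres[a]) <= abs(x - centres[b]) else b


classes = list(centres)
label, votes = ovo_predict(classes, toy_winner, 0.58)

# Two OVR outputs are positive here, so the largest g_i(x) decides.
ovr_label = ovr_predict(classes, {"A": -1.2, "B": 0.3, "C": 0.8, "D": -0.5})

# Number of binary models for the seven lithology classes: k versus k(k-1)/2.
n_ovr = 7
n_ovo = len(list(combinations(range(7), 2)))
```

For the seven lithology classes considered in this paper, the OVR scheme needs 7 binary models and the OVO scheme needs 21, which is the source of the extra training time noted above.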
2.3. The RF principle
The RF algorithm is one of the most popular machine learning models. In the 1980s, Breiman et al. first proposed the classification tree algorithm (Breiman, L., 1996). The RF algorithm has been extensively used in engineering research (Zhao, L. et al., 2014; Timm, B.C. and McGarigal, K., 2012; Grinand, C. et al., 2013; Attarchi, S. and Gloaguen, R., 2014; Li, C. et al., 2014), but has been rarely used for lithology classification (Cracknell, M.J. and Reading, A.M., 2014). This method repeatedly divides the data into two categories for classification or regression, which greatly reduces the number of computations. In 2001, Breiman combined the classification trees into an RF (Breiman, L., 2001), which randomizes the use of variables (columns) and the use of data (rows),
Fig. 4. a Coarse sandstone core. b Medium sandstone core. c Fine sandstone core. d Siltstone core. e Dolomite core. f Limestone core. g Mudstone core.
Table 3 Main characteristic value range for each lithology in LWD logging response.
Lithology           AC              CAL             GR              K               RD              SP
coarse sandstone    0.1668–0.2701   0.0428–0.1761   0.0302–0.2673   0.0902–0.2708   0.0155–0.0661   0.3399–0.5427
medium sandstone    0.1262–0.4817   0.0074–0.4979   0.0423–0.3522   0.0358–0.6285   0.0065–0.1172   0.1274–0.5230
fine sandstone      0.1076–0.3213   0.0172–0.1635   0.0186–0.2527   0.0566–0.5013   0.0039–0.0532   0.0638–0.4957
siltstone           0.1512–0.3072   0.0288–0.0837   0.0505–0.2844   0.2964–0.7318   0.0077–0.0622   0.3098–0.5311
dolomite            0.0036–0.1969   0.0071–0.1233   0.0050–0.1449   0.0260–0.2480   0.0014–0.1972   0.2819–0.7070
limestone           0.0133–0.0549   0.0725–0.1388   0.1184–0.2241   0.0836–0.6253   0.0371–0.2817   0.2764–0.3171
mudstone            0.0313–0.4530   0.0092–0.3608   0.1035–0.3344   0.1057–0.9944   0.0027–0.0714   0.2955–0.5536
generates many classification trees, and statistically summarizes the classification tree results. The results are robust to missing data and imbalanced data and can appropriately predict the effects of up to thousands of explanatory variables. Therefore, the RF algorithm is considered one of the best algorithms available today (E. Vigneau et al., 2018; Michele Fratello and Roberto Tagliaferri, 2018; Robin Genuer et al., 2017; Christoph Behrens et al., 2018; Behnam Partopour et al., 2018). As the name implies, the RF approach involves the creation of a forest in a random manner. Forests are composed of many decision trees, and all decision trees in an RF are unrelated. All decision trees make their judgments independently, and the results obtained by all decision trees are then put to a vote to classify and discriminate the targets (Yunxin Xie et al., 2018; Arsalan A. Othman and Richard Gloaguen, 2017). After the forest is built, when a new sample is input, each decision tree in the forest separately determines to which
category the sample belongs. Finally, the number of times each category is selected is counted. The category that is selected the most is the category predicted for the sample. A decision tree is a tree structure (it can be a binary tree or a non-binary tree). Each non-leaf node represents a test of a characteristic attribute, and each leaf node stores a category. The decision process of the decision tree is shown in Fig. 2. Starting from the root node, the corresponding feature attributes of the item to be classified are tested, and the output branches are selected according to the test results until a leaf node is reached. The decision result is the category stored in that leaf node. In establishing each decision tree, two things must be considered: sampling and complete division. The first step consists of two random samplings: the RF samples the input data by rows and by columns. Row sampling is done with replacement; that is, the sample set obtained by sampling may contain duplicate samples.
Fig. 5. a Correlation between KTH and GR. b Correlation between TH and GR. c Correlation between TH and KTH. d Correlation between RS and RD.
If the input contains N samples, N samples are also drawn; because the drawing is with replacement, each tree is trained on only part of the distinct samples rather than on the complete sample, which makes overfitting relatively easy to avoid. For column sampling, m features are selected from M features (m < M), where M is the total number of features. The second step is to build a decision tree on the sampled data by complete splitting, so that each leaf node either cannot be split further or contains only samples of the same class. In general, the decision tree algorithm has an important step: pruning. Because the two random sampling processes above guarantee randomness, overfitting is unlikely even without pruning.
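The two random samplings and the final vote described above can be sketched with the Python standard library alone. The helper names (`bootstrap_rows`, `sample_features`, `majority_vote`) and the choice m = 2 are illustrative assumptions, not the paper's implementation; tree growing itself is omitted.

```python
import random
from collections import Counter


def bootstrap_rows(n_samples, rng):
    """Row sampling: draw n_samples indices with replacement.

    Rows that are never drawn form the out-of-bag set for this tree.
    """
    drawn = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(drawn))
    return drawn, out_of_bag


def sample_features(feature_names, m, rng):
    """Column sampling: choose m of the M features without replacement (m < M)."""
    return rng.sample(feature_names, m)


def majority_vote(tree_predictions):
    """Forest decision: the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
features = ["AC", "CAL", "GR", "K", "RD", "SP"]

rows, oob = bootstrap_rows(3345, rng)    # 3345 training groups, as in this study
cols = sample_features(features, 2, rng)  # m = 2 is an assumed value
vote = majority_vote(
    ["siltstone", "mudstone", "siltstone", "fine sandstone", "siltstone"]
)
oob_fraction = len(oob) / 3345
```

Because each draw is with replacement, roughly a third of the rows are left out of any one tree's sample (the fraction approaches 1/e for large N), which is why every tree sees a different, incomplete view of the training data.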
2.4. The RF classification method
Each tree in the RF is a binary tree, and forest generation follows the top-down recursive splitting principle. That is, the training set is sequentially divided from the root node. In the binary tree, the root node contains all the training data. According to the principle of minimum node impurity, the root is split into the left node and the right node. Each node contains a subset of the training data. According to the same principle, each node continues to split until it satisfies the branch stop rule and stops growing. Each decision tree actually learns the classification of specific data, and random sampling ensures that the repeated samples are classified by different decision trees so that the classification ability of different decision trees can be evaluated. The specific steps in RF classification are as follows.
(1) From the original training data set, apply the bootstrap method to randomly select k new sample sets with replacement and construct k classification trees. The samples that are not drawn each time constitute k out-of-bag data sets.
(2) Assuming there are n features, randomly extract m features per node per tree. Determine the amount of information contained in each feature and select the feature with the highest classification ability for a node split.
(3) Do not prune the trees, and allow them to grow to the maximum size.
(4) Let the generated classification trees compose the RF, and use the RF classifier to assess and classify new data. The classification results are determined by the number of votes cast by the tree classifiers.
This method has many advantages over other machine learning classification methods: it is highly accurate; it can be applied to high-dimensional data; the introduction of randomness makes the method less susceptible to overfitting; the trained model has a small variance and strong generalization ability; and training can be highly parallelized, with high sample training speeds in the era of big data.
3. Data preparation and processing
3.1. Data preparation
The wells in the data set are located in wellblock Yan 969 of the Yan'an gas field, China, which is located in the Ordos Basin. The permeability of the source area is less than 10 × 10⁻³ μm², and the maximum value is 7 × 10⁻³ μm². Reservoir classification was performed according to the permeability criteria for ultralow permeability reservoirs. Two tectono-stratigraphic sequences are present in the Paleozoic strata of the Ordos Basin: an Upper Paleozoic sequence containing terrestrial clastics and coals and a Lower Paleozoic sequence of marine to non-marine sediments (Dai, J. et al., 2005). The Upper Paleozoic reservoirs, which have relatively low porosity and permeability, formed in a fluvial-deltaic depositional environment (Fu, J. et al., 2013). The target formations are the Shanxi formation, Majiagou formation and Shihezi formation. The complex lithology makes lithology identification difficult. The training data sets and core analysis reports used in this study were obtained from 3345 groups of well logging data from thirty-one production wells, and the test data were obtained from 366 groups of LWD data from three production wells. The sandstone reservoir of the Upper Paleozoic strata in the Ordos Basin is favourable for gas accumulation (Li, J. et al., 2005; Yang, H. et al., 2012). Therefore, it is important to study and further understand this reservoir. The core analyses indicate that the major lithology types include coarse sandstone, medium sandstone, fine sandstone, siltstone, dolomite, limestone and mudstone. Therefore, the above lithology types are the seven target lithology classes to be identified. The classification of sandstone categories is based on the granularity classification table (Table 1). Some of the core sandstone categories are shown in Fig. 3.

Table 4 Partial training data.
AC       CAL      GR       K        RD       SP       type
0.0291   0.0317   0.0267   0.2023   0.0789   0.3337   dolomite
0.0328   0.0334   0.0306   0.2031   0.0761   0.3336   dolomite
0.0343   0.0370   0.0323   0.2001   0.0745   0.3336   dolomite
0.2438   0.1433   0.0865   0.2901   0.0176   0.0940   fine sandstone
0.0911   0.0673   0.0026   0.0599   0.2091   0.2635   limestone
0.0237   0.0324   0.0246   0.1936   0.0910   0.3338   mudstone
0.2336   0.1396   0.0844   0.2885   0.0183   0.0970   fine sandstone
0.2130   0.1272   0.2483   0.5921   0.0225   0.1383   coarse sandstone
0.2278   0.1388   0.2704   0.6653   0.0202   0.1518   coarse sandstone
0.3995   0.2026   0.2400   0.6366   0.0140   0.2398   medium sandstone
0.2368   0.1426   0.2765   0.6653   0.0190   0.1666   coarse sandstone
0.2380   0.2146   0.2958   0.8490   0.0173   0.2105   medium sandstone
0.1414   0.0537   0.1308   0.3837   0.0464   0.1178   siltstone, dolomite
0.1855   0.1081   0.0703   0.3120   0.0280   0.0821   mudstone
0.0926   0.0660   0.0022   0.0561   0.2020   0.2450   limestone
0.2394   0.1202   0.0951   0.3301   0.0323   0.1009   siltstone
0.2360   0.1348   0.2692   0.6388   0.0185   0.1758   coarse sandstone
0.2416   0.2532   0.0523   0.1761   0.1264   0.3570   medium sandstone
0.2594   0.2163   0.2957   0.8727   0.0184   0.2002   medium sandstone
0.2544   0.3159   0.0660   0.2413   0.0218   0.0673   fine sandstone
Fig. 6. Workflow chart of lithology identification.
Fig. 7. a The 10 fold cross-validation of Parameter C in OVR SVMs, γ = 5. parameter in OVR SVMs, C = 2200, b—The 10 fold cross-validation of parameter C in OVR SVMs, γ = 5. parameter in OVR SVMs, C = 2200. c—The 10 fold cross-validation of parameter C in OVR SVMs, γ = 5. parameter in OVR SVMs, C = 2350. dThe 10 fold cross-validation of parameter C in OVR SVMs, γ = 5. parameter in OVR SVMs, C = 2350.
The lithological characteristics of the area are shown in Table 2. It is difficult to accurately identify the lithology of coarse sandstone, medium sandstone, fine sandstone and siltstone using traditional logging interpretation methods. Based on the current technical conditions, conventional well logging data can typically be obtained through LWD. LWD has the following characteristics.
(1) LWD provides real-time data; therefore, all types of physical rock information can be collected in real time while drilling the formation. When mud invasion is relatively shallow, the logging response is little affected and can still accurately reflect the characteristics of the original formation.
(2) For some borehole conditions that make CWL difficult, such as in highly deviated wells or horizontal wells, LWD has unique advantages.
(3) The application of geosteering technology improves the drilling efficiency, reduces drilling risks and improves the associated economic benefits.
(4) Compared with CWL, the intruded mud can be analysed to provide a basis for the identification of oil and gas layers.
There are many types of LWD information, which vary with the logging principle: electric logging data, sonic logging data, nuclear logging data, etc. To provide high flexibility, high calculation accuracy and comprehensive analysis possibilities, many types and large quantities of logging data should be obtained. A comprehensive set of logs is typically acquired for lithology interpretation. These logs include acoustic (AC), borehole diameter (CAL), gamma ray (GR), potassium (K), gamma ray without uranium (KTH), deep investigation double lateral resistivity (RD), shallow investigation double lateral resistivity (RS), spontaneous potential (SP), and thorium (TH) logs. The lithology labels are obtained from the corresponding cores and include coarse sandstone (Fig. 4a), medium sandstone (Fig. 4b), fine sandstone (Fig. 4c), siltstone (Fig. 4d), dolomite (Fig. 4e), limestone (Fig. 4f) and mudstone (Fig. 4g).
3.2. Logging data standardization
Data standardization (normalization) is a basic task in machine learning data classification. Different evaluation indicators often have different dimensions and units that can affect the results of the data analysis. Due to the different dimensions of each type of logging data and the large differences in numerical magnitudes, it is necessary to standardize the original logging data and eliminate the effect on the analysis results. After the original data have been standardized, each
0.8874
0.8865 0.8872
indicator is of the same order of magnitude, which is suitable for comprehensive and comparative evaluations. There are two commonly used data normalization methods: min-max normalization and Z-score normalization. In this paper, as shown in formula (7), min-max normalization is used to standardize the logging data.
f (x i ) =
3345
Using more data types and feature parameters does not guarantee a higher machine learning accuracy. The well logging data usually contain a considerable amount of noise, which will have an impact on the machine learning identification results. Because logging data obtained via the same logging principle are typically better correlated than data obtained via different logging principles, if the data volume is too large or the correlation among characteristic parameters is high, a parameter redundancy phenomenon can occur. This phenomenon increases the machine learning time and can even affect the accuracy of model learning. Therefore, correlation analysis must be performed for the collected logging data. The linear correlation analysis is the Goodness of Fit for the two types of data. Goodness of Fit refers to the degree to which the regression line fits the observations. The statistic for measuring the Goodness of Fit is the coefficient of determination R2.
4.6
2823 2006 749 1340 944 5
5
time consuming(s)
2350
3000 2200
ˆ
yi )
2
y¯ )2
ˆ
, where yi is the regression fitting value, y¯ is the
4. Lithology identification experiments The above section describes the data processing and classification identification methods. The workflow diagram for identifying the lithology while drilling used in this paper is shown in Fig. 6. 4.1. Model training The lithology classification model programs in this paper were all written in the Python language. The Python syntax is simple and clear, with rich and powerful libraries. This study mainly uses the scikit-learn Python framework. Scikit-learn is a Python module that integrates a
3
average value, and the maximum value of R² is 1:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \qquad (7)$$

The closer R² is to 1, the better the regression line fits the observations; conversely, the smaller R² is, the worse the fit. A correlation analysis of the 9 types of log data was performed (Fig. 5a–d). For KTH and GR, the linear correlation equation is y = 1.0036x + 0.0067: the slope is approximately 1, the intercept is close to 0, and R² = 0.9128 (Fig. 5a). Similarly, for TH and GR the equation is y = 0.9998x + 0.0009, with R² = 0.818 (Fig. 5b); for TH and KTH it is y = 0.984x − 0.0037, with R² = 0.8742 (Fig. 5c); and for RS and RD it is y = 1.0309x + 0.0006, with R² = 0.9581 (Fig. 5d). Because KTH and TH are strongly linearly correlated with GR, and RS with RD, these three curves carry redundant information. Therefore, the AC, CAL, GR, K, RD and SP logs are chosen as the characteristic parameters for model training. The lithologies are classified into seven categories: coarse sandstone, medium sandstone, fine sandstone, siltstone, dolomite, limestone and mudstone. Some of the training data are shown in Table 4.

Table 5 Results of the three optimization rounds by OVR SVMs.

| Sequence | C search range | gamma search range | combination number | optimal parameter C | optimal parameter gamma | training set sample number | training set accuracy |
|---|---|---|---|---|---|---|---|
| 1 | np.linspace(1000, 9000, 5) | [0.5, 1, 5, 10, 15] | 25 | 3000 | 5 | | |
| 2 | np.linspace(1000, 5000, 21) | 5 | 21 | 2200 | 5 | | |
| | 2200 | np.linspace(1, 10, 10) | 10 | | | | |
| 3 | np.linspace(2000, 2400, 17) | 5 | 17 | 2350 | 4.6 | | |
| | 2350 | np.linspace(4, 6, 11) | 11 | | | | |

3.3. Correlation analysis

$$\frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$

where x_max denotes the maximum value of the sample data and x_min denotes the minimum value of the sample data. After normalization, all logging data values fall within the interval [0, 1]. The main characteristic value range of each lithology in the LWD logging response is shown in Table 3 (normalized data).
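The min-max normalization and the linear correlation analysis described above can be illustrated with a short NumPy sketch. The two curves below are synthetic stand-ins generated for illustration only (they are not the Yan'an Gas Field data), and the helper names `min_max_normalize` and `linear_fit_r2` are hypothetical.

```python
import numpy as np


def min_max_normalize(x):
    """Scale a 1-D array to [0, 1] using (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)


def linear_fit_r2(x, y):
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    y_hat = slope * x + intercept
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic, strongly related curves (assumed values, for illustration only).
rng = np.random.default_rng(0)
gr_raw = rng.uniform(20.0, 150.0, size=200)
kth_raw = 0.9 * gr_raw + rng.normal(0.0, 5.0, size=200)

gr = min_max_normalize(gr_raw)
kth = min_max_normalize(kth_raw)
slope, intercept, r2 = linear_fit_r2(gr, kth)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, R2={r2:.4f}")
```

A pair of curves with a slope near 1, an intercept near 0 and a high R² after normalization would be treated as redundant in the sense used above, so only one of the two would be kept as an input feature.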
Fig. 8. (a) The 10-fold cross-validation of parameter C in OVO SVMs, γ = 5. (b) The 10-fold cross-validation of parameter γ in OVO SVMs, C = 3200. (c) The 10-fold cross-validation of parameter C in OVO SVMs, γ = 5. (d) The 10-fold cross-validation of parameter γ in OVO SVMs, C = 3175.
wide range of machine learning algorithms for both supervised and unsupervised problems (Pedregosa, F. et al., 2011).
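As a rough illustration of how scikit-learn exposes the three classifier types compared in this paper, the following sketch builds a one-versus-rest SVM, a one-versus-one SVM and a random forest on synthetic data with 6 features and 7 classes. The data, the split and the hyperparameter values are assumptions chosen only to make the example run; they are not the settings or data used in this study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for 6 log curves and 7 lithology classes (assumed data).
X, y = make_classification(
    n_samples=700, n_features=6, n_informative=6, n_redundant=0,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Placeholder hyperparameters (assumptions, not the tuned values of the paper).
models = {
    "OVR SVMs": OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale")),
    "OVO SVMs": OneVsOneClassifier(SVC(kernel="rbf", C=1.0, gamma="scale")),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the held-out split
    print(f"{name}: test accuracy = {scores[name]:.3f}")

# OVR trains one binary SVM per class; OVO trains one per pair of classes.
n_ovr = len(models["OVR SVMs"].estimators_)
n_ovo = len(models["OVO SVMs"].estimators_)
print(n_ovr, n_ovo)
```

With seven classes, the one-versus-rest wrapper fits 7 binary classifiers and the one-versus-one wrapper fits 7 × 6 / 2 = 21, which is one reason the two strategies can differ in training cost.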
4.1.1. The OVR SVMs classifier for lithology identification

Kernel functions typically include the polynomial kernel function, the Gaussian kernel function and the linear kernel function. After a comparative analysis, the Gaussian kernel function

$$K(x_1, x_2) = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{2\sigma^2}\right)$$

was selected (Chang, Y.W. et al., 2010); this function is highly flexible. A grid search method was used to select the approximate ranges of the optimal training parameters C and γ of the OVR SVMs, where C is the penalty coefficient and γ = 1/(2σ²). The grid search method is an exhaustive search over specified parameter values. The optimal learning algorithm is obtained by optimizing the parameters of the estimation function through cross-validation. Cross-validation is a statistical analysis method used to verify the performance of a classifier, and it helps to avoid overfitting. There are two main types of cross-validation: k-fold cross-validation and leave-one-out cross-validation.

In the k-fold cross-validation method, the data set is divided into k subsets. Each subset is used once as the test set, and the remaining data are used as the training set. The process is repeated k times, with a different subset selected as the test set each time. Then, the recognition rate averaged over the k folds is taken as the result. In this method, all samples are used in both training sets and test sets, and each sample is verified once.

Leave-one-out cross-validation (LOOCV) assumes that there are n samples in the data set and involves n rounds of cross-validation. Each sample is used once as the test set, and the remaining n − 1 samples are used as the training set. Almost all the samples are used to train the model in each round. Therefore, the results of the method closely approximate the distribution of the parent population, and the estimated generalization error is generally low. LOOCV can be considered when the experimental data sample is small. The calculation cost of LOOCV is high, because the number of models that must be built is equal to the total number of samples. When the total number of samples is sufficiently large, LOOCV may be difficult to implement unless each model can be trained very quickly. However, parallel calculations can be used to reduce the time required for the
calculations. 10-fold cross-validation has been experimentally shown to give the best estimate of the misclassification error rate and is therefore recommended as a standard cross-validation technique (Witten, I.H. et al., 2011; Rodriguez, J.D. et al., 2010; Ahmed Amara Konaté et al., 2015). This paper uses the 10-fold cross-validation method to optimize the objective function and find the parameter values that provide high accuracy and avoid overfitting.

First, C = [1000, 3000, 5000, 7000, 9000] and γ = [0.5, 1, 5, 10, 15] are taken, and the 25 combinations of (C, γ) are searched with the grid search method to locate the ranges of the optimal parameter values. The optimal values of C and γ are approximately 3000 and 5, respectively. Then, the grid search and 10-fold cross-validation methods are used to perform fine searches around C = 3000 and γ = 5. Fig. 7a shows the 10-fold cross-validation of parameter C at γ = 5, where C_range = np.linspace(1000, 5000, 21). In this paper, np.linspace(A, B, n) denotes the division of the interval from A to B into n − 1 equal segments, which gives n evenly spaced node values. Fig. 7b shows the grid search and 10-fold cross-validation of parameter γ at C = 2200, where GM_range = np.linspace(1, 10, 10). The optimal values were obtained as C = 2200 and γ = 5. To determine whether the training accuracy of the model can be significantly increased by a finer search, the grid search and 10-fold cross-validation methods are applied again. Fig. 7c shows the 10-fold cross-validation of parameter C at γ = 5, where C_range = np.linspace(2000, 2400, 17). Fig. 7d shows the grid search and 10-fold cross-validation of parameter γ at C = 2350, where GM_range = np.linspace(4, 6, 11). Finally, the optimal values were obtained as C = 2350 and γ = 4.6. The results of the three optimization rounds are shown in Table 5.

4.1.2. The OVO SVMs classifier for lithology identification

Additionally, the Gaussian kernel function $K(x_1, x_2) = \exp\left(-\lVert x_1 - x_2 \rVert^2 / (2\sigma^2)\right)$ and the grid search method are applied to select the approximate ranges of the optimal training parameters C and γ in the OVO SVMs, where γ = 1/(2σ²). Then, the 10-fold cross-validation method is applied to optimize the objective function and find the parameter values that provide high accuracy and avoid overfitting.

First, C = [1000, 3000, 5000, 7000, 9000] and γ = [0.5, 1, 5, 10, 15] are taken, and the 25 combinations of (C, γ) are searched with the grid search method to locate the ranges of the optimal parameter values. The optimal values of C and γ are approximately 3000 and 5, respectively. Then, the grid search and 10-fold cross-validation methods are used to perform fine searches around C = 3000 and γ = 5. Fig. 8a shows the 10-fold cross-validation of parameter C at γ = 5, where C_range = np.linspace(1000, 5000, 21). Fig. 8b shows the 10-fold cross-validation of parameter γ at C = 3200, where GM_range = np.linspace(1, 10, 10). The optimal values were obtained as C = 3200 and γ = 5. To determine whether the training accuracy of the model can be significantly increased by a finer search, the grid search and 10-fold cross-validation methods are applied again. Fig. 8c shows the 10-fold cross-validation of parameter C at γ = 5, where C_range = np.linspace(3000, 3400, 17). Fig. 8d shows the grid search and 10-fold cross-validation of parameter γ at C = 3175, where GM_range = np.linspace(4, 6, 11). Finally, the optimal values were obtained as C = 3175 and γ = 5. The results of the three optimization rounds are shown in Table 6.

Table 6 Results of the three optimization rounds by OVO SVMs.

| Sequence | C search range | gamma search range | combination number | optimal parameter C | optimal parameter gamma | time consumed (s) | training set sample number | training set accuracy |
|---|---|---|---|---|---|---|---|---|
| 1 | np.linspace(1000, 9000, 5) | [0.5, 1, 5, 10, 15] | 25 | 3000 | 5 | 586 | 3345 | 0.8925 |
| 2 | np.linspace(1000, 5000, 21) | 5 | 21 | 3200 | 5 | 565 | 3345 | 0.8985 |
| | 3200 | np.linspace(1, 10, 10) | 10 | | | 248 | | |
| 3 | np.linspace(3000, 3400, 17) | 5 | 17 | 3175 | 5 | 443 | 3345 | 0.8985 |
| | 3175 | np.linspace(4, 6, 11) | 11 | | | 293 | | |

Fig. 9. The 10-fold cross-validation of parameter n_estimators in RF, max_features = 2.

4.1.3. The RF classifier for lithology identification

The RF model has two main parameters. The parameter n_estimators, which comes from the scikit-learn Python framework, represents the number of trees in the forest. As the number of trees increases, the calculation time also increases, and the best predictive value appears at a reasonable number of trees. The other parameter, max_features, represents the maximum number of features that can be used by a single decision tree, that is, a subset of randomly selected feature sets. The
smaller the number of subsets is, the faster the variance will decrease, but at the same time the bias will increase faster. Typically, in classification problems, max_features = sqrt(n_features) (Behnam Partopour et al., 2018; E. Vigneau et al., 2018). The grid search method was used to select the approximate ranges of the optimal RF training parameters n_estimators and max_features, and 10-fold cross-validation was then used to optimize the objective function and find the parameter values that provide high accuracy and avoid overfitting. First, n_estimators = [10, 20, 30, 40, 50, 60, 70] and max_features = [1, 2, 3, 4] are taken, and the 28 combinations of (n_estimators, max_features) are searched with the grid search method to locate the ranges of the optimal parameter values. The optimal values of n_estimators and max_features are approximately 50 and 2, respectively. Since max_features = sqrt(n_features) is the usual choice (with six input logs, sqrt(6) ≈ 2.4), max_features is set to 2. Then, the grid search and 10-fold cross-validation methods are used to perform a fine search around n_estimators = 50. Fig. 9 shows the 10-fold cross-validation of parameter n_estimators at max_features = 2, where n_estimators = np.linspace(40, 60, 21). Finally, the optimal values are obtained as n_estimators = 56 and max_features = 2. The results of the two optimization rounds are shown in Table 7.

4.2. Lithology identification while drilling

A total of 366 sets of LWD data from three wells were obtained and combined to form the test set. Then, the three classification models discussed above were used to identify the lithology while drilling. The training set recognition results and test set recognition results of the three models were compared.

5. Results and discussion

After the data preparation and preprocessing steps, including (a) normalization of the numeric input data, (b) correlation analysis of the input data, and (c) selection of the optimal parameter values based on grid search and cross-validation for the three machine learning algorithms (OVR SVMs, OVO SVMs and RF), the different algorithms with optimized parameters were used to test the ability of the models to predict lithology. Table 8 shows the optimal parameters obtained after multiple optimizations of the three models. Fig. 10 shows the training set recognition accuracy, test set identification accuracy, and time consumption based on multiple optimization parameters in the three models.

As shown in Table 8, over the three rounds of parameter optimization for OVR SVMs and OVO SVMs, parameter C changes significantly, but parameter γ is essentially unchanged. For the RF, only n_estimators is optimized; max_features is fixed at sqrt(n_features) and is not optimized here. The histogram in Fig. 10 shows that the accuracies of the training set and test set for the three models are all over 85% and that the accuracy of the RF is greater than 90%. From the vertical perspective, for the multiple parameter optimizations of the three models, the training set accuracy is RF > OVO SVMs > OVR SVMs, and the test set accuracy is RF > OVR SVMs > OVO SVMs. From the horizontal perspective, over the successive parameter optimizations, the lithology identification accuracy of each of the three models changed by no more than 1%. Moreover, the histogram in Fig. 10 indicates that for each parameter optimization, the time required is OVR SVMs > OVO SVMs > RF, and OVR SVMs takes much longer than the other two models. Taken together, the model training accuracy, test accuracy and time consumption suggest that only one parameter optimization step is required and that the RF model should be preferred for lithology identification and prediction while drilling. Notably, the RF approach has higher accuracy and takes less time than the other methods. The lithology prediction results of the three models after one parameter optimization step are shown in Appendix A. The model parameters and calculation results are shown in Table 9.

The training set accuracies of OVR SVMs and OVO SVMs are very close at approximately 89%. The training set accuracy of RF is 92.54%. The test set accuracies of OVR SVMs and OVO SVMs are also very close at approximately 87%. Additionally, the test set accuracy of RF is 90.44%. Fig. 11a, b and c show which lithology classes are misclassified as other classes by each model in the test set. Fig. 11a illustrates that the prediction accuracy of limestone is 100%, the
prediction accuracy of siltstone is 80%, and the prediction accuracy of mudstone is only 76% for the OVR SVMs model. In addition, Fig. 11b shows that the prediction accuracy of dolomite reaches 96.23% and that the prediction accuracies of medium sandstone, siltstone, limestone and mudstone are less than 80% for the OVO SVMs model. Fig. 11c indicates that for the RF model, except for siltstone at 70%, the prediction accuracies of the other six lithologies are above 85%, and the stability is strong. The matrices in Fig. 11a, b and c show that the accurate identification ratios of the three algorithms for coarse sandstone, fine sandstone and dolomite are generally high and all greater than 80%. The identification accuracy of siltstone is the lowest, at only 80% for the OVR SVMs model and less than 80% for the other two algorithms. Siltstone is commonly misclassified as medium sandstone, fine sandstone and mudstone. This result may occur when the composition of the sandstone is difficult to determine or when there is an error in the lithological interpretation.

Fig. 12 shows the lithological profile identified by the three methods. The core section at 3273.200–3273.45 m is identified correctly by OVO SVMs and RF, but wrongly by OVR SVMs. The lithology of the core between 3274.200 m and 3274.45 m is siltstone; only RF identifies it as siltstone, whereas both OVR SVMs and OVO SVMs misjudge it as mudstone. The core section at 3274.575–3274.825 m is fine sandstone, but all three methods misjudge it as siltstone. The lithology of the core between 3279.825 m and 3280.325 m is medium sandstone; only RF identifies it as medium sandstone, whereas both OVR SVMs and OVO SVMs misjudge it as mudstone. The core section at 3285.800–3286.000 m is identified correctly by OVR SVMs and RF, but wrongly by OVO SVMs. The lithology of the core between 3291.825 m and 3292.075 m is medium sandstone; only RF identifies it as medium sandstone, whereas OVR SVMs misjudges it as coarse sandstone and OVO SVMs misjudges it as fine sandstone. For the other sections between 3272.325 m and 3297.575 m, the lithology labels predicted by the three methods are the same as those determined from the corresponding cores.

Therefore, in the case study investigated in this paper, the RF model is selected for lithology prediction while drilling. The training duration of the RF model is 182 s, the training set accuracy is 92.54%, and the test set accuracy is 90.44%. Compared with the results obtained by using machine learning to identify lithology in existing papers, this accuracy is quite high, indicating that this method has certain advantages in identification accuracy (Xiongyan Li and Li, 2013; Yunxin Xie et al., 2018; Arsalan A. Othman and Richard Gloaguen, 2017; Mohammad Ali Sebtosheikh and Ali Salehi, 2015). Of course, the accuracy also depends to some extent on the data samples.

In this study, we not only compared and discussed the computational accuracy of the three models but also determined the model training time. To make the model more stable and reliable, it is necessary to continuously update the training data used for the model. If the training time is very long, then it is not possible to reuse the LWD data to build a more reliable lithology classification model in real time. Therefore, the training time of the model must be considered in the optimization process. This paper does not consider the identification and classification of lithological characteristics and lithologic component distributions based on LWD curves and determined by machine learning; these topics will be investigated in future research.

Table 7 Results of the two optimization rounds by RF.

| Sequence | n search range | max_features search range | combination number | optimal parameter n | optimal parameter max_features | time consumed (s) | training set sample number | training set accuracy |
|---|---|---|---|---|---|---|---|---|
| 1 | np.linspace(10, 70, 7) | [1, 2, 3, 4] | 28 | 50 | 2 | 182 | 3345 | 0.9254 |
| 2 | np.linspace(40, 60, 21) | 2 | 21 | 56 | 2 | 109 | 3345 | 0.9262 |
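The two-round RF search summarized in Table 7 (a coarse grid followed by a finer grid around the coarse optimum, each scored by 10-fold cross-validation) could be sketched with scikit-learn roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the grids are smaller than those in Table 7 so that the example runs quickly, and the helper name `search` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for 6 normalized log curves and 7 lithology classes (assumed data).
X, y = make_classification(
    n_samples=280, n_features=6, n_informative=6, n_redundant=0,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)


def search(param_grid):
    """Exhaustive grid search scored by 10-fold cross-validated accuracy."""
    gs = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid, cv=10, scoring="accuracy",
    )
    gs.fit(X, y)
    return gs


# Round 1: coarse grid over both parameters.
coarse = search({"n_estimators": [10, 30, 50], "max_features": [1, 2]})
n0 = coarse.best_params_["n_estimators"]

# Round 2: finer grid around the coarse optimum, with max_features fixed at 2.
fine_grid = np.linspace(max(n0 - 10, 5), n0 + 10, 5).astype(int).tolist()
fine = search({"n_estimators": fine_grid, "max_features": [2]})

print("coarse:", coarse.best_params_, round(coarse.best_score_, 4))
print("fine:  ", fine.best_params_, round(fine.best_score_, 4))
```

Comparing `coarse.best_score_` with `fine.best_score_` is one way to check whether the second round is worth its extra training time.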
6. Conclusions

Logging data can be used to solve nonlinear function mapping problems and identify lithology during the drilling procedure. The relationship between the logging response and the actual reservoir characteristics is complex, so this mapping is typically highly nonlinear. There are many types of logging response characteristics, and generally more than two types of lithology exist. Therefore, using improved multiclass SVM classifiers or RF classifiers is an effective way to solve this complex problem.
Table 8 Results of the parameter optimization.

| Sequence | OVR SVMs: optimal parameter C | OVR SVMs: optimal parameter gamma | OVO SVMs: optimal parameter C | OVO SVMs: optimal parameter gamma | RF: optimal parameter n | RF: optimal parameter max_features |
|---|---|---|---|---|---|---|
| 1 | 3000 | 5 | 3000 | 5 | 50 | 2 |
| 2 | 2200 | 5 | 3200 | 5 | 56 | 2 |
| 3 | 2350 | 4.6 | 3175 | 5 | | |
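As a quick consistency check on Table 8, the fine-search grids quoted in Sections 4.1.1 to 4.1.3 can be regenerated with np.linspace to confirm that each reported optimum is a node of its grid. The dictionary labels below are hypothetical names introduced for this sketch; the ranges and optima are those stated in the text.

```python
import numpy as np

# (start, stop, number of nodes, reported optimum), as quoted in the text and Table 8.
searches = {
    "OVR C, round 2": (1000, 5000, 21, 2200),
    "OVR C, round 3": (2000, 2400, 17, 2350),
    "OVR gamma, round 3": (4, 6, 11, 4.6),
    "OVO C, round 2": (1000, 5000, 21, 3200),
    "OVO C, round 3": (3000, 3400, 17, 3175),
    "RF n_estimators, round 2": (40, 60, 21, 56),
}

results = {}
for name, (start, stop, n, best) in searches.items():
    grid = np.linspace(start, stop, n)             # n evenly spaced nodes, endpoints included
    step = (stop - start) / (n - 1)                # n - 1 equal segments
    on_grid = bool(np.isclose(grid, best).any())   # is the optimum one of the nodes?
    results[name] = (step, on_grid)
    print(f"{name}: step = {step:g}, optimum on grid = {on_grid}")
```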
Fig. 10. Training set identification accuracy, test set identification accuracy, and time consumption of the three models.

Table 9 Model parameters and calculation results after one parameter optimization step.

| Classification method | Parameters | Training set accuracy (%) | Test set accuracy (%) | Average running time (s) |
|---|---|---|---|---|
| OVR SVMs | kernel = RBF, C = 3000, gamma = 5 | 88.65 | 87.16 | 2823 |
| OVO SVMs | kernel = RBF, C = 3000, gamma = 5 | 89.25 | 86.61 | 586 |
| RF | n_estimators = 50, max_features = 2 | 92.54 | 90.44 | 182 |

This work presents an optimal model that can be used to quickly identify the lithology while drilling. The identification results, identification accuracy and calculation time of lithology classification are obtained based on three machine learning algorithms. Some meaningful conclusions are listed below.

(a) Before model training, a correlation analysis of the input data is very important for avoiding redundant data and reducing the dimension of the input data, thereby improving the calculation accuracy and reducing the model training time.
(b) The initial value range of the grid search is very important for the parameter optimization of the three models. This factor directly influences the accuracy and training time of the trained model.
(c) If the parameter search range is appropriate, usually only one parameter optimization step is required; additional parameter optimizations do not significantly improve the calculation accuracy but considerably increase the model training time.
(d) Machine learning algorithms can be used to identify reservoir lithology while drilling. After the parameters are optimized, the training set accuracy of RF is the highest, approximately 3% higher than those of the other two models. The test set accuracy of RF is also the highest, approximately 4% higher than those of the other two models. The model training time of RF is much shorter than those of the other two models. The use of the OVR SVMs model is not recommended because its training time is too long.
Fig. 11. (a) Confusion matrix plot on the test set of the OVR SVMs model. (b) Confusion matrix plot on the test set of the OVO SVMs model. (c) Confusion matrix plot on the test set of the RF model.
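Confusion matrices such as those in Fig. 11 are built by counting (true, predicted) label pairs. The following minimal sketch uses only the true types and the RF-identified types of samples 1 to 21 in Table A (Appendix A), not the full 366-sample test set, so its numbers are not those of Fig. 11c. The helper name `confusion_matrix` is defined here for illustration.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {lab: k for k, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm


# True type and RF-identified type for samples 1-21 of Table A
# (3 = fine sandstone, 5 = dolomite, 7 = mudstone).
y_true = [5, 7, 3, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 3, 7, 3, 3, 3, 7, 7, 3]
y_rf = [5, 7, 3, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 3, 7, 3, 3, 3, 7, 3, 3]

labels = sorted(set(y_true) | set(y_rf))         # classes present in this subset
cm = confusion_matrix(y_true, y_rf, labels)
per_class = cm.diagonal() / cm.sum(axis=1)       # fraction of each true class identified correctly
overall = cm.trace() / cm.sum()                  # overall accuracy on this subset

print(labels)
print(cm)
print(per_class, overall)
```

Off-diagonal entries show which true class is confused with which predicted class; in this subset the single error is a mudstone sample (No. 20) identified as fine sandstone.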
Fig. 12. Lithological profile identified by OVR SVMs, OVO SVMs and RF.
Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 51704235) and by the Young Talent Fund of University Association for Science and Technology in Shaanxi, China (No. 20180417).

Appendix A. Three training models' classification results of test set

1-coarse sandstone, 2-medium sandstone, 3-fine sandstone, 4-siltstone, 5-dolomite, 6-limestone and 7-mudstone.

Table A Three training models' classification results of test set

| No. | AC | CAL | GR | K | RD | SP | True type | Identified type by OVR SVMs | Identified type by OVO SVMs | Identified type by RF |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0388 | 0.0270 | 0.0948 | 0.1125 | 0.0014 | 0.6145 | 5 | 5 | 5 | 5 |
| 2 | 0.3474 | 0.0902 | 0.1452 | 0.2023 | 0.0027 | 0.3396 | 7 | 7 | 7 | 7 |
| 3 | 0.2184 | 0.0595 | 0.2522 | 0.7113 | 0.0039 | 0.9186 | 3 | 3 | 3 | 3 |
| 4 | 0.3661 | 0.0821 | 0.1493 | 0.2017 | 0.0033 | 0.3432 | 7 | 7 | 7 | 7 |
| 5 | 0.2919 | 0.0800 | 0.1418 | 0.2893 | 0.0035 | 0.3446 | 7 | 7 | 7 | 7 |
| 6 | 0.3047 | 0.0652 | 0.1573 | 0.3193 | 0.0036 | 0.3311 | 7 | 7 | 7 | 7 |
| 7 | 0.3972 | 0.4671 | 0.1437 | 0.1997 | 0.0037 | 0.4293 | 7 | 7 | 7 | 7 |
| 8 | 0.3317 | 0.0668 | 0.1637 | 0.3330 | 0.0037 | 0.3386 | 7 | 7 | 7 | 7 |
| 9 | 0.2939 | 0.0566 | 0.1569 | 0.3201 | 0.0037 | 0.3338 | 7 | 7 | 7 | 7 |
| 10 | 0.3411 | 0.1286 | 0.1621 | 0.4367 | 0.0036 | 0.3424 | 7 | 7 | 7 | 7 |
| 11 | 0.3147 | 0.1019 | 0.1604 | 0.4037 | 0.0037 | 0.3445 | 7 | 7 | 7 | 7 |
| 12 | 0.3403 | 0.1738 | 0.3123 | 0.6718 | 0.0044 | 0.3476 | 7 | 7 | 7 | 7 |
| 13 | 0.3694 | 0.1998 | 0.1903 | 0.4335 | 0.0046 | 0.4572 | 7 | 7 | 7 | 7 |
| 14 | 0.2701 | 0.6177 | 0.0948 | 0.2052 | 0.0049 | 0.3828 | 3 | 3 | 3 | 3 |
| 15 | 0.2841 | 0.0541 | 0.1148 | 0.1723 | 0.0046 | 0.3218 | 7 | 7 | 7 | 7 |
| 16 | 0.2388 | 0.4868 | 0.0773 | 0.2674 | 0.0053 | 0.3743 | 3 | 3 | 3 | 3 |
| 17 | 0.2485 | 0.5874 | 0.0752 | 0.2303 | 0.0054 | 0.3728 | 3 | 3 | 3 | 3 |
| 18 | 0.2347 | 0.4053 | 0.0805 | 0.3024 | 0.0055 | 0.3748 | 3 | 3 | 3 | 3 |
| 19 | 0.2420 | 0.0515 | 0.1087 | 0.1911 | 0.0043 | 0.3310 | 7 | 7 | 7 | 7 |
| 20 | 0.2624 | 0.6347 | 0.1035 | 0.2428 | 0.0052 | 0.3926 | 7 | 7 | 7 | 3 |
| 21 | 0.2311 | 0.4310 | 0.0762 | 0.1999 | 0.0060 | 0.3725 | 3 | 3 | 3 | 3 |
Table A (continued) No.
AC
CAL
GR
K
RD
SP
True type
Identified type by OVR SVMs
Identified type by OVO SVMs
Identified type by RF
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
0.4671 0.4295 0.3496 0.4585 0.2290 0.2365 0.3870 0.1986 0.3176 0.3754 0.3691 0.3990 0.3455 0.2692 0.3721 0.2374 0.2206 0.3072 0.2423 0.2577 0.2516 0.2085 0.2781 0.2724 0.2705 0.2774 0.2715 0.2683 0.3263 0.2921 0.2377 0.2501 0.3213 0.2359 0.2150 0.2929 0.2432 0.2366 0.2959 0.3250 0.2337 0.1802 0.2713 0.2289 0.2402 0.2136 0.2375 0.2755 0.2698 0.2297 0.2024 0.4318 0.2163 0.3575 0.2214 0.2553 0.2048 0.2010 0.3027 0.2115 0.3455 0.2431 0.2071 0.1923 0.2355 0.1983 0.2102 0.2844 0.3364 0.2452 0.2443 0.1984 0.2253 0.3213
0.4390 0.4221 0.2817 0.4668 0.2777 0.0829 0.2219 0.3138 0.1541 0.3253 0.2755 0.4625 0.3290 0.1526 0.4027 0.1545 0.2966 0.2290 0.1551 0.3934 0.1513 0.0931 0.0718 0.3429 0.0722 0.0810 0.0759 0.0734 0.2231 0.2739 0.4632 0.0955 0.0674 0.1365 0.2413 0.3734 0.2493 0.1604 0.3779 0.2781 0.0837 0.2014 0.1242 0.1434 0.0941 0.1967 0.0963 0.0620 0.4620 0.0791 0.1408 0.2697 0.5741 0.4864 0.0749 0.0627 0.1364 0.1278 0.0215 0.0813 0.4979 0.4369 0.1994 0.1286 0.3619 0.2872 0.2382 0.0867 0.4976 0.0588 0.0583 0.2139 0.4046 0.2381
0.1526 0.1635 0.2365 0.1421 0.0888 0.1509 0.2460 0.0742 0.2285 0.2198 0.2512 0.1283 0.2171 0.0794 0.1743 0.1029 0.0901 0.2102 0.1031 0.2844 0.0874 0.1199 0.1736 0.2431 0.1658 0.1535 0.1553 0.1595 0.2230 0.2197 0.0838 0.1695 0.1643 0.1189 0.2366 0.1118 0.2153 0.0680 0.1770 0.2165 0.2383 0.1252 0.2920 0.0981 0.1999 0.1101 0.1834 0.1515 0.2214 0.1656 0.1055 0.2728 0.3140 0.3081 0.1616 0.1997 0.1154 0.1847 0.2424 0.1604 0.1102 0.2336 0.1202 0.1807 0.1655 0.0982 0.0963 0.2430 0.1187 0.2310 0.2286 0.0914 0.2381 0.2859
0.2903 0.3024 0.5210 0.2792 0.2960 0.4731 0.5357 0.2325 0.5585 0.4059 0.5629 0.2738 0.4278 0.2138 0.3113 0.2368 0.3072 0.4643 0.2543 0.7318 0.2690 0.4220 0.3264 0.4850 0.2929 0.2658 0.2639 0.2720 0.4733 0.4800 0.2472 0.2927 0.3260 0.2203 0.5281 0.2744 0.5148 0.1975 0.3145 0.5049 0.4848 0.3457 0.7322 0.3400 0.4796 0.2003 0.4256 0.3336 0.2986 0.4572 0.2827 0.7725 0.4087 0.5192 0.4504 0.4961 0.2086 0.3876 0.4846 0.5325 0.2662 0.3221 0.3769 0.3874 0.3127 0.2774 0.2299 0.5740 0.2784 0.5114 0.5299 0.3048 0.3443 0.8085
0.0065 0.0067 0.0065 0.0068 0.0063 0.0062 0.0056 0.0066 0.0070 0.0071 0.0072 0.0077 0.0074 0.0070 0.0077 0.0071 0.0084 0.0077 0.0074 0.0089 0.0074 0.0075 0.0075 0.0095 0.0078 0.0078 0.0079 0.0079 0.0069 0.0087 0.0082 0.0081 0.0089 0.0085 0.0091 0.0096 0.0093 0.0093 0.0099 0.0098 0.0094 0.0114 0.0092 0.0099 0.0100 0.0109 0.0104 0.0110 0.0113 0.0102 0.0116 0.0123 0.0142 0.0117 0.0105 0.0110 0.0122 0.0110 0.0133 0.0111 0.0109 0.0123 0.0117 0.0115 0.0124 0.0128 0.0117 0.0114 0.0114 0.0120 0.0122 0.0156 0.0131 0.0140
0.3911 0.3909 0.4138 0.3913 0.3815 0.4181 0.3353 0.3760 0.9796 0.4048 0.4142 0.3916 0.4053 0.4105 0.3908 0.3890 0.3589 0.3939 0.3899 0.4284 0.3901 0.4238 0.4565 0.4011 0.4558 0.4534 0.4542 0.4550 0.2662 0.4082 0.3765 0.4453 0.3848 0.3931 0.9766 0.3918 0.4084 0.4027 0.3909 0.4071 0.5305 0.3639 0.5379 0.3925 0.4527 0.3657 0.4478 0.3834 0.3098 0.4744 0.3702 0.2270 0.4329 0.3188 0.4788 0.4909 0.3664 0.4280 0.9417 0.4526 0.3907 0.3125 0.3893 0.4260 0.3911 0.3919 0.3835 0.4957 0.3917 0.4974 0.4982 0.3574 0.3155 0.2227
2 2 7 2 3 3 7 3 7 7 7 2 7 3 2 3 2 4 3 4 3 3 7 7 7 7 7 7 2 7 2 3 3 3 7 2 7 3 2 7 4 2 7 3 3 3 3 7 4 4 3 2 7 2 4 4 3 2 7 2 2 4 3 2 2 2 2 3 2 4 4 2 4 2
1 1 7 1 3 3 7 3 7 7 7 2 7 3 7 3 2 7 3 4 3 3 7 7 3 7 7 7 2 7 2 3 7 3 7 2 7 3 7 7 4 2 3 3 7 3 3 3 2 4 3 2 7 2 4 4 3 2 7 2 2 2 2 2 2 2 3 7 2 4 4 2 2 2
3 3 7 3 3 3 7 3 7 7 7 2 7 3 3 3 2 7 3 4 3 3 7 7 7 7 7 7 2 7 2 3 7 3 7 2 7 3 7 7 4 2 3 3 3 3 3 7 2 3 3 2 7 2 4 4 3 3 7 3 2 2 2 2 3 2 3 3 2 4 4 2 2 2
2 3 7 2 3 3 7 3 7 7 7 2 7 3 3 3 2 4 3 4 3 3 7 7 7 7 7 7 2 7 2 3 7 3 7 2 7 3 3 7 4 2 7 3 3 3 3 7 2 4 3 2 7 2 4 4 3 2 7 2 2 2 2 2 2 2 2 3 2 4 4 2 2 2
Table A (continued) No.
AC
CAL
GR
K
RD
SP
True type
Identified type by OVR SVMs
Identified type by OVO SVMs
Identified type by RF
96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169
0.1895 0.3196 0.1986 0.1937 0.1861 0.3434 0.2101 0.2077 0.2913 0.1869 0.1906 0.3295 0.2155 0.1695 0.1941 0.1675 0.1966 0.2409 0.2171 0.2176 0.2137 0.2159 0.1990 0.2352 0.2297 0.1956 0.2115 0.2315 0.2455 0.1959 0.2859 0.1815 0.3343 0.5772 0.1720 0.2098 0.2625 0.2505 0.2159 0.2429 0.1748 0.2094 0.1758 0.2365 0.2012 0.2288 0.2238 0.1956 0.2099 0.2449 0.2464 0.2220 0.3072 0.2793 0.2924 0.2760 0.2232 0.2835 0.2038 0.2632 0.2318 0.2064 0.1867 0.1893 0.2494 0.7132 0.2055 0.2518 0.2222 0.2919 0.1862 0.1424 0.2171 0.2810
0.0744 0.0188 0.0827 0.1159 0.0706 0.0189 0.2668 0.0976 0.1687 0.0695 0.0691 0.0250 0.1138 0.0528 0.0679 0.0479 0.0669 0.2181 0.0269 0.0942 0.3460 0.2729 0.0959 0.0550 0.0344 0.0680 0.0318 0.1534 0.2213 0.0670 0.2140 0.0822 0.3350 0.4369 0.4213 0.0461 0.0172 0.2135 0.0288 0.1475 0.0913 0.1111 0.4441 0.1470 0.1144 0.0153 0.1392 0.1172 0.0558 0.0314 0.0542 0.1620 0.3847 0.0336 0.4938 0.3280 0.1128 0.3419 0.0536 0.3220 0.0756 0.0985 0.3159 0.3286 0.3608 0.4892 0.1276 0.3209 0.1104 0.3602 0.0647 0.0306 0.1114 0.0480
0.1311 0.2126 0.1189 0.1327 0.1436 0.2037 0.0755 0.1077 0.2454 0.1536 0.1604 0.2079 0.0987 0.0670 0.1653 0.0710 0.1691 0.1006 0.1595 0.1958 0.1436 0.0660 0.1456 0.2120 0.3457 0.1676 0.3522 0.2497 0.2929 0.1694 0.2549 0.1098 0.2791 0.1843 0.1042 0.1457 0.2104 0.2970 0.1352 0.0809 0.2296 0.0821 0.2496 0.0768 0.0831 0.1943 0.0795 0.0875 0.1934 0.0309 0.0186 0.2561 0.2539 0.2369 0.2317 0.0690 0.2441 0.0647 0.1814 0.0700 0.0252 0.2261 0.3028 0.1101 0.3183 0.1554 0.1952 0.0682 0.2534 0.0589 0.1163 0.3168 0.2504 0.2572
0.4089 0.4888 0.3819 0.4151 0.4298 0.5565 0.2871 0.3557 0.6285 0.4457 0.4600 0.6139 0.3400 0.2362 0.4735 0.2464 0.4856 0.2295 0.2591 0.3861 0.3088 0.2768 0.4534 0.4711 0.5796 0.4314 0.5804 0.6233 0.8291 0.4943 0.4270 0.3666 0.5090 0.2205 0.1766 0.3588 0.3896 0.8637 0.2927 0.2768 0.3912 0.2239 0.2567 0.2682 0.2223 0.4000 0.2809 0.2201 0.4757 0.0924 0.0654 0.2551 0.4951 0.4343 0.5539 0.2515 0.3980 0.2484 0.3145 0.2501 0.0566 0.4824 0.4274 0.2887 0.4373 0.1584 0.4743 0.2454 0.6076 0.2424 0.2923 0.6157 0.5967 0.5164
0.0129 0.0129 0.0129 0.0124 0.0130 0.0131 0.0160 0.0130 0.0156 0.0132 0.0134 0.0135 0.0132 0.0150 0.0135 0.0151 0.0138 0.0124 0.0152 0.0128 0.0142 0.0171 0.0132 0.0136 0.0133 0.0133 0.0133 0.0155 0.0160 0.0145 0.0139 0.0141 0.0141 0.0155 0.0140 0.0163 0.0144 0.0182 0.0169 0.0174 0.0175 0.0150 0.0161 0.0180 0.0157 0.0153 0.0190 0.0161 0.0165 0.0336 0.0182 0.0157 0.0168 0.0206 0.0186 0.0211 0.0170 0.0210 0.0184 0.0215 0.0187 0.0179 0.0240 0.0187 0.0169 0.0189 0.0192 0.0218 0.0188 0.0219 0.0208 0.0227 0.0188 0.0218
0.3648 0.9513 0.3639 0.4353 0.3658 0.9568 0.3536 0.3632 0.2437 0.3668 0.3679 0.9627 0.3625 0.7942 0.3689 0.7921 0.3700 0.2320 0.3746 0.9624 0.3914 0.3521 0.4389 0.4929 0.5229 0.4822 0.5230 0.2443 0.2170 0.3710 0.4170 0.3934 0.3205 0.4541 0.3924 0.3779 0.9473 0.2043 0.3750 0.0885 0.1512 0.4190 0.4139 0.0871 0.4149 0.9473 0.0984 0.4110 0.3801 0.3316 0.1012 0.1938 0.3190 0.9788 0.3205 0.0638 0.2118 0.0643 0.3958 0.0642 0.0786 0.4116 0.4348 0.3178 0.5536 0.4539 0.1307 0.0654 0.1637 0.0657 0.3739 0.4359 0.1506 0.9798
3 7 3 2 3 7 2 3 2 3 3 7 3 3 3 3 3 3 3 7 2 2 2 4 2 2 2 2 2 3 7 3 2 1 2 3 3 2 3 3 5 3 7 3 3 7 3 3 4 5 3 7 2 7 7 3 2 3 3 3 3 7 7 2 7 1 1 3 1 3 3 7 1 7
3 7 3 2 3 7 2 3 2 3 3 7 3 3 3 3 3 3 3 7 2 2 2 4 2 2 2 2 2 3 7 3 7 1 2 3 3 2 3 3 3 3 7 3 3 3 3 3 3 5 7 7 7 7 7 3 3 3 3 3 3 7 7 2 7 1 1 3 1 3 3 7 1 7
3 7 3 2 3 7 2 3 2 3 3 7 3 3 3 3 3 3 3 7 2 2 2 4 2 2 2 2 2 3 7 3 2 4 2 3 3 2 3 3 5 3 7 3 3 7 3 3 3 5 3 7 7 7 7 3 3 3 3 3 3 7 7 2 7 1 1 3 1 3 3 7 1 7
3 7 3 2 3 7 2 3 2 3 3 7 3 3 3 3 3 3 3 7 2 2 2 4 2 2 2 2 2 3 7 3 7 1 2 3 3 2 3 3 5 3 7 3 3 7 3 3 3 5 3 7 2 7 7 3 7 3 3 3 3 7 7 2 7 1 1 3 1 3 3 7 1 7
Table A (continued) No.
AC
CAL
GR
K
RD
SP
True type
Identified type by OVR SVMs
Identified type by OVO SVMs
Identified type by RF
170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243
0.1780 0.1897 0.1746 0.2753 0.2085 0.2996 0.2761 0.2153 0.1955 0.2337 0.1791 0.2453 0.1969 0.1830 0.2646 0.1837 0.2303 0.4817 0.1703 0.2478 0.1833 0.2169 0.1767 0.1770 0.0113 0.2693 0.2450 0.3104 0.1896 0.2880 0.1582 0.1983 0.2098 0.1502 0.2302 0.1553 0.2076 0.1872 0.2483 0.1495 0.1771 0.1916 0.2446 0.2431 0.2621 0.2259 0.2018 0.1785 0.2043 0.2371 0.1631 0.3522 0.2300 0.1526 0.2006 0.2292 0.3377 0.1566 0.1480 0.4089 0.1819 0.0726 0.2459 0.0767 0.1262 0.1873 0.3573 0.1812 0.2547 0.1841 0.2290 0.1917 0.2257 0.4354
0.0548 0.1030 0.0294 0.0309 0.2645 0.0200 0.0290 0.2896 0.1397 0.0878 0.1900 0.1600 0.0508 0.1761 0.1806 0.0444 0.1032 0.9947 0.3060 0.0363 0.1151 0.0667 0.1351 0.1200 0.0413 0.0292 0.4210 0.0174 0.1485 0.0409 0.0709 0.1119 0.0691 0.0331 0.0556 0.1423 0.6584 0.1178 0.1169 0.1889 0.4148 0.7355 0.0521 0.1169 0.0663 0.0074 0.1409 0.1399 0.0400 0.1169 0.0304 0.2055 0.1171 0.0247 0.0552 0.1172 0.2596 0.0274 0.0348 0.1917 0.1096 0.0447 0.0288 0.0428 0.0924 0.0309 0.1204 0.0306 0.1023 0.0386 0.1253 0.0276 0.1375 0.1181
0.1144 0.0538 0.1293 0.2327 0.1014 0.3318 0.2379 0.0802 0.1907 0.0232 0.2966 0.1843 0.2357 0.1900 0.2252 0.1582 0.2254 0.1989 0.1139 0.2527 0.0852 0.2521 0.0948 0.0926 0.0125 0.2257 0.1061 0.3325 0.0659 0.1394 0.1633 0.1639 0.0330 0.1519 0.2359 0.0975 0.2480 0.2262 0.0310 0.4833 0.0985 0.2489 0.2742 0.0302 0.2879 0.1332 0.1164 0.1063 0.2271 0.0303 0.3004 0.2588 0.0344 0.1333 0.2372 0.0334 0.2598 0.1321 0.1174 0.2658 0.2168 0.1138 0.2625 0.1389 0.3031 0.1296 0.2964 0.1329 0.0482 0.1353 0.0847 0.2493 0.0500 0.2825
0.2841 0.1874 0.2378 0.4967 0.2350 0.7874 0.4200 0.1905 0.4627 0.0697 0.4467 0.5575 0.4147 0.4314 0.3678 0.3525 0.3839 0.5329 0.3759 0.5013 0.3094 0.5273 0.3616 0.3632 0.0670 0.5500 0.4099 0.7395 0.1520 0.3012 0.3137 0.3781 0.0721 0.4123 0.4745 0.1707 0.5029 0.3644 0.1351 0.5808 0.3441 0.4490 0.6030 0.1327 0.6614 0.2569 0.2400 0.3477 0.6539 0.1295 0.6451 0.5214 0.1144 0.3111 0.4546 0.1172 0.4884 0.3129 0.3427 0.5490 0.2402 0.1800 0.5148 0.2267 0.2541 0.3358 0.6070 0.3336 0.1558 0.1886 0.2708 0.6515 0.2170 0.5720
0.0212 0.0172 0.0217 0.0167 0.0218 0.0182 0.0286 0.0224 0.0208 0.0209 0.0265 0.0199 0.0253 0.0202 0.0192 0.0225 0.0209 0.0210 0.0225 0.0221 0.0279 0.0229 0.0256 0.0253 0.0221 0.0228 0.0248 0.0224 0.0217 0.0184 0.0267 0.0251 0.0264 0.0284 0.0273 0.0258 0.0276 0.0306 0.0280 0.0262 0.0309 0.0287 0.0280 0.0291 0.0287 0.0311 0.0316 0.0315 0.0298 0.0301 0.0288 0.0297 0.0305 0.0325 0.0304 0.0310 0.0307 0.0338 0.0296 0.0313 0.0299 0.0375 0.0344 0.0368 0.0346 0.0383 0.0348 0.0385 0.0471 0.0320 0.0363 0.0375 0.0428 0.0370
0.3736 0.1344 0.3732 0.9876 0.1077 0.9289 0.9753 0.1052 0.1311 0.0709 0.4354 0.3873 0.1541 0.1289 0.3252 0.3902 0.4125 0.3102 0.3173 0.4086 0.3692 0.5156 0.0897 0.0868 0.5466 0.9896 0.3115 0.9289 0.1288 0.9971 0.3862 0.9617 0.0981 0.4421 0.4135 0.2708 0.3535 0.4121 0.3399 0.3213 0.3097 0.3562 0.3528 0.3410 0.3239 0.4207 0.1177 0.4292 0.3765 0.3430 0.3307 0.4357 0.3655 0.4338 0.4155 0.3722 0.4343 0.4311 0.9545 0.4372 0.5427 0.2824 0.5311 0.2918 0.3232 0.4148 0.4448 0.4183 0.1274 0.4390 0.4205 0.3210 0.0809 0.4386
3 3 3 7 3 7 7 3 1 3 7 7 5 1 2 3 7 2 2 3 7 4 7 7 5 7 2 7 3 7 3 7 3 2 7 2 7 7 1 2 2 7 7 1 7 2 3 2 3 1 7 7 1 2 7 1 7 2 2 7 1 5 4 5 2 2 7 2 2 5 1 3 7 7
3 3 3 7 3 7 7 3 1 3 7 7 3 1 7 3 7 2 2 7 2 4 7 7 5 7 2 7 3 7 3 2 3 2 7 2 7 7 1 2 2 7 7 1 7 2 3 2 3 1 7 7 1 2 7 1 7 2 2 7 1 5 4 5 2 2 7 2 2 5 2 3 3 3
3 3 3 7 3 7 7 3 1 3 7 7 5 1 7 3 7 2 2 7 2 7 4 4 5 7 2 7 3 7 3 2 3 2 7 2 7 7 1 2 2 7 7 1 7 6 3 2 3 1 7 7 1 2 7 1 7 2 2 7 1 5 4 5 2 2 7 2 2 5 2 3 3 3
3 3 3 7 3 7 7 3 1 3 7 7 3 1 7 3 7 2 2 3 7 4 4 4 5 7 2 7 3 7 3 3 3 2 7 2 7 7 1 2 2 7 7 1 7 2 3 2 3 1 7 7 1 2 7 1 7 2 2 7 1 5 4 5 2 2 7 2 2 5 2 3 3 7
Table A (continued)
No.: 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317
AC: 0.0279 0.3095 0.1602 0.1309 0.1625 0.3014 0.2701 0.4233 0.1914 0.1574 0.2362 0.0224 0.1631 0.2883 0.2279 0.1328 0.2659 0.4488 0.1580 0.2229 0.1935 0.2197 0.4452 0.2317 0.1623 0.2074 0.1292 0.2712 0.4524 0.0949 0.5382 0.4530 0.1941 0.1577 0.2152 0.1688 0.3257 0.1512 0.1688 0.1841 0.0299 0.1564 0.1262 0.1499 0.1739 0.0281 0.1154 0.1574 0.0770 0.0267 0.0549 0.1548 0.2808 0.2875 0.1812 0.2684 0.0687 0.0825 0.0886 0.2905 0.0609 0.2854 0.0489 0.0689 0.0915 0.0591 0.0394 0.0607 0.6187 0.0507 0.6789 0.2654 0.1807 0.0800
CAL: 0.0371 0.0934 0.0708 0.0725 0.0777 0.1057 0.0428 0.1812 0.1137 0.0243 0.0328 0.0400 0.0726 0.1212 0.1395 0.0281 0.1325 0.1202 0.0645 0.0519 0.1385 0.1382 0.1673 0.1367 0.0566 0.1575 0.0939 0.2518 0.1379 0.0992 0.1596 0.0907 0.1259 0.0744 0.0337 0.0821 0.0547 0.0486 0.1425 0.0467 0.0310 0.0542 0.1083 0.0319 0.0575 0.0292 0.1233 0.1506 0.0748 0.0341 0.0773 0.0489 0.0550 0.0455 0.1426 0.0635 0.0622 0.1001 0.0476 0.0336 0.0543 0.0761 0.0177 0.0413 0.0263 0.0571 0.0317 0.0635 0.2833 0.0316 0.5824 0.0885 0.1515 0.0425
GR: 0.0565 0.0525 0.0967 0.2241 0.0941 0.0569 0.2673 0.3007 0.0966 0.1105 0.3229 0.0502 0.0900 0.0593 0.0486 0.1516 0.0583 0.3075 0.3088 0.0895 0.0681 0.0504 0.3107 0.0543 0.0837 0.0505 0.3352 0.3311 0.3210 0.2124 0.1376 0.3228 0.0514 0.0929 0.2088 0.0789 0.3344 0.1001 0.1405 0.0866 0.0055 0.1068 0.2085 0.1590 0.0847 0.0079 0.1895 0.0818 0.1370 0.0418 0.0098 0.1311 0.0562 0.0648 0.0701 0.0423 0.0986 0.1893 0.1314 0.0612 0.0924 0.2954 0.0723 0.1187 0.1291 0.0687 0.0494 0.0621 0.0933 0.0322 0.2035 0.2864 0.0887 0.1210
K: 0.1252 0.1852 0.3334 0.6253 0.3217 0.1955 0.4135 0.6040 0.3183 0.1574 0.6318 0.1838 0.3080 0.1981 0.2011 0.3064 0.1947 0.5879 0.7135 0.3018 0.1957 0.1933 0.6020 0.1919 0.2941 0.0646 0.2728 0.7260 0.5986 0.5041 0.0902 0.5939 0.1480 0.2331 0.5824 0.1558 0.7638 0.3187 0.2301 0.2964 0.0548 0.2354 0.4461 0.2138 0.1905 0.0459 0.4365 0.1858 0.1657 0.1071 0.0836 0.2307 0.0534 0.0596 0.1911 0.0358 0.1673 0.4747 0.3278 0.0435 0.1826 0.7824 0.0735 0.2142 0.1951 0.1651 0.1750 0.1234 0.0922 0.1031 0.0999 0.7797 0.1524 0.3032
RD: 0.0373 0.0492 0.0412 0.0371 0.0418 0.0508 0.0364 0.0390 0.0424 0.0353 0.0383 0.0396 0.0435 0.0526 0.0485 0.0382 0.0541 0.0415 0.0407 0.0526 0.0447 0.0525 0.0423 0.0543 0.0470 0.0622 0.0435 0.0423 0.0438 0.0440 0.0409 0.0444 0.0446 0.0457 0.0483 0.0484 0.0442 0.0534 0.0446 0.0561 0.0470 0.0475 0.0459 0.0480 0.0500 0.0485 0.0467 0.0483 0.0461 0.0490 0.1510 0.0496 0.0761 0.0762 0.0532 0.0790 0.0501 0.0541 0.0577 0.0847 0.0779 0.0566 0.0509 0.0575 0.0573 0.0559 0.0594 0.0828 0.0603 0.0680 0.0661 0.0704 0.0744 0.0670
SP: 0.3412 0.1100 0.3940 0.4259 0.3937 0.1037 0.5272 0.4441 0.4024 0.4394 0.3519 0.3445 0.3934 0.0990 0.0827 0.9471 0.0955 0.4399 0.3289 0.1082 0.8904 0.0871 0.4433 0.0918 0.3930 0.0465 0.3263 0.3511 0.4423 0.4336 0.4495 0.4412 0.4020 0.2848 0.3232 0.1125 0.3575 0.1141 0.4230 0.1106 0.5241 0.2848 0.4274 0.2841 0.8863 0.5228 0.4285 0.2689 0.3709 0.3401 0.3171 0.2845 0.0558 0.0519 0.1282 0.0614 0.3703 0.4308 0.3003 0.0588 0.1494 0.3466 0.4394 0.2819 0.6120 0.3694 0.3387 0.1485 0.4482 0.3354 0.4890 0.3474 0.8789 0.2970
True type: 5 7 2 6 2 7 1 7 2 5 7 5 2 7 7 2 7 7 7 4 2 7 7 7 2 4 2 7 7 6 1 7 1 2 3 3 7 4 1 4 5 2 5 2 2 5 5 2 5 5 6 2 2 2 3 2 5 6 5 2 5 7 5 5 5 5 5 5 1 5 1 7 2 5
Identified type by OVR SVMs: 5 2 2 6 2 2 1 7 2 5 7 5 2 2 3 2 4 3 7 4 2 3 4 3 2 4 2 7 3 6 1 3 1 2 3 3 7 4 1 4 5 2 5 2 2 5 5 2 5 5 6 2 2 2 3 2 5 6 5 2 5 7 5 5 5 5 5 5 1 5 1 7 2 5
Identified type by OVO SVMs: 5 2 2 6 2 2 1 4 2 5 7 5 2 2 3 2 2 3 7 4 2 3 4 3 2 4 2 7 4 6 5 3 1 2 3 3 7 4 1 4 5 2 5 2 2 5 5 2 5 5 6 2 2 2 3 2 5 6 5 2 5 7 5 5 5 5 5 5 2 5 1 7 2 5
Identified type by RF: 5 2 2 6 2 2 1 7 2 5 7 5 2 2 2 2 2 7 7 4 2 2 4 2 2 4 2 7 7 6 1 3 2 2 3 3 7 3 2 3 5 2 5 2 2 5 5 2 5 5 6 2 2 2 3 2 5 6 5 2 5 7 5 5 5 5 5 5 1 5 1 7 2 5
Table A (continued)
No.: 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366
AC: 0.0728 0.0700 0.0337 0.0316 0.3046 0.0290 0.0700 0.0301 0.0286 0.0492 0.0713 0.0187 0.0740 0.2318 0.0697 0.2372 0.0674 0.1101 0.0702 0.0325 0.1005 0.0858 0.0634 0.0313 0.0399 0.0489 0.0469 0.2081 0.0445 0.0492 0.1919 0.1219 0.0315 0.1076 0.0121 0.0431 0.2308 0.0745 0.0389 0.0544 0.0133 0.1233 0.0036 0.0077 0.0169 0.1310 0.0362 0.0146 0.0144
CAL: 0.1097 0.0392 0.0415 0.0454 0.0863 0.0477 0.0428 0.0378 0.0388 0.0922 0.0267 0.0145 0.0935 0.2529 0.0241 0.2555 0.1011 0.0610 0.0177 0.1126 0.0252 0.0071 0.1073 0.0092 0.1029 0.0289 0.0212 0.4417 0.1069 0.0370 0.0388 0.1635 0.0919 0.0907 0.0984 0.0988 0.4416 0.0480 0.0167 0.1721 0.1365 0.0677 0.0272 0.0254 0.1388 0.0690 0.0320 0.1273 0.1342
GR: 0.1449 0.1214 0.0308 0.0302 0.3173 0.0287 0.0947 0.0325 0.0470 0.1719 0.0996 0.0050 0.0356 0.0494 0.1280 0.0505 0.1275 0.1289 0.1195 0.1626 0.0591 0.0211 0.0093 0.0706 0.0336 0.0652 0.0548 0.0704 0.0306 0.1020 0.1826 0.0743 0.0504 0.0382 0.0243 0.0265 0.0778 0.0248 0.0352 0.0235 0.1367 0.0307 0.0185 0.0299 0.1184 0.0294 0.0301 0.0350 0.0335
K: 0.4671 0.1373 0.1937 0.1840 0.9944 0.1701 0.2480 0.0912 0.1365 0.1512 0.1530 0.0288 0.0443 0.1721 0.1719 0.1731 0.4236 0.0930 0.2126 0.2354 0.0886 0.0606 0.0441 0.1057 0.1637 0.2184 0.1299 0.2726 0.1607 0.1460 0.3270 0.1200 0.3229 0.0791 0.0260 0.1488 0.2929 0.0393 0.0558 0.0534 0.3521 0.0646 0.0815 0.0803 0.3586 0.0602 0.0711 0.1786 0.1864
RD: 0.0711 0.0662 0.0731 0.0741 0.0714 0.0782 0.0789 0.0810 0.0811 0.0803 0.0846 0.0847 0.0870 0.1065 0.0997 0.1172 0.0973 0.1006 0.1045 0.1155 0.1262 0.1384 0.1414 0.1462 0.1638 0.1581 0.1627 0.1735 0.1774 0.1859 0.1669 0.1755 0.1776 0.1838 0.1972 0.1974 0.2124 0.2211 0.2212 0.2730 0.2456 0.2555 0.2853 0.2859 0.2817 0.2821 0.3597 0.8036 0.8911
SP: 0.4258 0.4394 0.3336 0.3336 0.3567 0.3337 0.2916 0.3382 0.3389 0.2764 0.6041 0.5249 0.4585 0.3567 0.3840 0.3568 0.4260 0.2955 0.4385 0.2890 0.4389 0.4314 0.4345 0.4382 0.3012 0.7070 0.6147 0.3029 0.3016 0.3648 0.3302 0.2892 0.4104 0.2658 0.4045 0.3019 0.3049 0.4325 0.4337 0.6809 0.3017 0.2558 0.5825 0.5946 0.3029 0.2538 0.4355 0.3116 0.3117
True type: 5 7 5 5 7 5 5 5 5 6 5 5 5 2 5 2 5 7 5 6 7 5 5 7 6 5 5 2 6 5 4 3 5 3 5 6 2 5 7 5 6 3 5 5 6 3 5 6 6
Identified type by OVR SVMs: 5 5 5 5 7 5 5 5 5 6 5 5 5 2 5 2 5 7 5 6 7 5 5 7 6 5 5 2 6 5 4 3 5 3 5 6 2 7 7 5 6 4 5 5 6 4 5 6 6
Identified type by OVO SVMs: 5 7 5 5 7 5 5 5 5 6 5 5 5 2 5 2 5 7 5 6 7 5 5 7 6 5 5 2 6 5 4 3 5 3 5 6 2 7 7 5 6 4 5 5 6 4 5 6 6
Identified type by RF: 5 5 5 5 7 5 5 5 5 6 5 5 5 2 5 2 5 7 5 6 7 5 5 7 6 5 5 2 6 5 4 3 5 3 5 6 2 5 7 5 6 4 5 5 6 4 5 6 6
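The true and identified lithology types listed in Table A can be compared sample by sample. The following is a minimal sketch, not the authors' evaluation code: it takes the type codes of the last twelve samples (Nos. 355 to 366) as listed above and computes, for each classifier, the fraction of matching codes and the sample numbers that differ. The names `sample_numbers`, `true_type`, `identified`, `accuracy` and `misclassified` are illustrative assumptions.

```python
# Illustrative sketch: type codes for samples 355-366, copied from Table A.
sample_numbers = list(range(355, 367))
true_type = [5, 7, 5, 6, 3, 5, 5, 6, 3, 5, 6, 6]
identified = {
    "OVR SVMs": [7, 7, 5, 6, 4, 5, 5, 6, 4, 5, 6, 6],
    "OVO SVMs": [7, 7, 5, 6, 4, 5, 5, 6, 4, 5, 6, 6],
    "RF":       [5, 7, 5, 6, 4, 5, 5, 6, 4, 5, 6, 6],
}


def accuracy(truth, predicted):
    """Fraction of samples whose identified type equals the true type."""
    matches = sum(1 for t, p in zip(truth, predicted) if t == p)
    return matches / len(truth)


def misclassified(numbers, truth, predicted):
    """Sample numbers whose identified type differs from the true type."""
    return [n for n, t, p in zip(numbers, truth, predicted) if t != p]


accuracies = {name: accuracy(true_type, pred) for name, pred in identified.items()}
errors = {
    name: misclassified(sample_numbers, true_type, pred)
    for name, pred in identified.items()
}

for name in identified:
    print(f"{name}: accuracy={accuracies[name]:.3f}, misclassified={errors[name]}")
```

Twelve samples are far too few to rank the classifiers; the same two functions can be applied to the full columns of Table A to obtain per-classifier figures over all listed samples.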