Journal Pre-proof Research and application of logging lithology identification for igneous reservoirs based on deep learning
Xiang Min, Qin Pengbo, Zhang Fengwei

PII: S0926-9851(18)30939-X
DOI: https://doi.org/10.1016/j.jappgeo.2019.103929
Reference: APPGEO 103929
To appear in: Journal of Applied Geophysics
Received date: 1 November 2018
Revised date: 10 October 2019
Accepted date: 25 December 2019
Please cite this article as: X. Min, Q. Pengbo and Z. Fengwei, Research and application of logging lithology identification for igneous reservoirs based on deep learning, Journal of Applied Geophysics(2019), https://doi.org/10.1016/j.jappgeo.2019.103929
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Published by Elsevier.
Research and application of logging lithology identification for igneous reservoirs based on deep learning

Xiang Min1*, Qin Pengbo2, Zhang Fengwei1
1 College of Mining Engineering and Geology, Xinjiang Institute of Engineering, Urumqi, 830001, China
2 Guangzhou Marine Geological Survey, Guangzhou, 510000, China
* Corresponding author: Xiang Min, Xinjiang Institute of Engineering, Room 301, Building 6, No. 236, Nanchang Road, Saybagh District, Urumqi, Xinjiang, China (830001). Tel.: +86 15160909865; +86 09917977188.
[email protected]
Abstract

Igneous reservoirs are characterized by heterogeneity and anisotropy, which makes logging interpretation difficult. To identify their lithology, this study proposes a deep belief network (DBN), a deep learning model, trained on logging data. Based on the least squares principle, the mean square error function is used to measure network performance and to determine network parameters such as the number of RBMs, the number of neurons in the hidden layer of each RBM, and the classification boundary. The logging data requiring interpretation are then processed by the trained DBN. The results fall into four cases, which are analyzed and discussed further. First, the lithology classification results are continuous and stable and the formations are thick; in this case no correction is needed. Second, there are several lithological discontinuities within a thick layer. Here, if the thickness of a discontinuous formation is greater than 0.5 m, the corresponding formation is divided according to the identification results; if it is less than or equal to 0.5 m, the discontinuous formation is merged into the adjacent thick formation. Third, the lithology of a formation cannot be determined from the identification result; this generally means the lithology does not appear in the training samples. Fourth, one formation yields a few conflicting identification results; a cross plot is adopted to correct them. An accuracy of 94.8% is achieved for lithology identification by the deep belief network with lithology correction.

Keywords: Deep belief network; Igneous rocks; Logging response; Lithology identification

1 Introduction

As the consumption of fossil fuels has increased, research on petroleum exploration and development has focused on igneous reservoirs, which have become extremely popular in oil and gas exploration.
Igneous rocks vary greatly because of their complex diagenesis. The minerals in these rocks differ widely, and their textures and structures are complex. Moreover, igneous reservoirs exhibit unevenly distributed and widely varying porosities, which are typically characterized by heterogeneity and anisotropy. Accurate identification of lithology is the primary objective of interpretation of igneous reservoir logging. Sanyal et al. (1979) analyzed the logging response characteristics of igneous rocks, and constructed a lithology identification histogram and cross plot. Khatchikian (1982) identified igneous lithology using the M-N cross plot. Zhou et al. (1998) applied a back propagation neural network (BPNN) combined with a genetic algorithm in logging interpretation for lithology
identification. Zhang et al. (2012) established a logging response equation based on acoustic, neutron, density, and natural gamma logs and the contents of aluminum, silicon, calcium, and iron by combining Elemental Capture Spectroscopy (ECS) logging with conventional logging, and then identified the lithology of volcanic rocks. Wang et al. (2015) proposed a lithology identification pattern for igneous rocks combining conventional logging, imaging logging, ECS logging, and core calibration. Zhang et al. (2015) proposed the use of a GR-Rt/AC cross plot to distinguish between volcanic and sedimentary rocks, followed by GR-DEN and GR-AC cross plots to determine the types of volcanic rocks, and finally imaging logging (FMI) to analyze the textures and structures. Yu (2017) selected five logging curves sensitive to lithology (acoustic, density, neutron, natural gamma, and resistivity) to build a database, and then distinguished between sedimentary and igneous rocks using the Fisher discriminant method. Zhang (2011), Xu (2011), Mou (2015), Li (2016), Ye (2017), Zhang (2017), and Han (2018) also conducted corresponding research and discussion on lithology identification of igneous reservoirs. In general, the most effective methods for lithology identification of igneous reservoirs are ECS logging and imaging logging. However, both methods are expensive and cannot be carried out on a large scale. When only conventional logging curves are available, the common interpretation methods are the cross plot, multivariate statistical analysis, and BPNNs. The cross plot is a simple and easy interpretation method, but because of the complexity of igneous reservoirs, the cross plot alone cannot achieve ideal results.
Multivariate statistical analysis is an efficient method, but many parameters must be adjusted to solve the complex nonlinear problem of igneous rock identification, which can easily lead to significant errors. The BPNN is a typical "shallow" neural network. It suffers from local minimization and often fails to find the global optimal solution; it also converges slowly. Since the beginning of the 21st century, artificial intelligence has become a research hotspot in many fields and has also been applied in logging interpretation (Abdulaziz et al., 2018; Camila et al., 2018; Guoyin et al., 2018). In recent years, the continuous development of artificial intelligence beyond the traditional shallow neural network has given rise to deep learning (Hinton, 2012). Deep learning is capable of discovering and characterizing the complex structural characteristics of a problem (Krizhevsky et al., 2012; Lecun et al., 2015; Silver et al., 2016; Guo et al., 2016). Although deep learning has been used successfully in many disciplines to date (Deng et al., 2014; Alipanahi et al., 2015; Zhou et al., 2015; Noda et al., 2015; Liu et al., 2015; Tomczak et al., 2017), it is rarely used in the field of logging. In this study, the deep belief network (DBN), a common deep learning method, is introduced into igneous reservoir logging data processing, and a method is proposed to determine the number of restricted Boltzmann machines (RBMs), the number of hidden-layer neurons in each RBM, and the classification boundary. Finally, according to the results of the calculations, a corresponding lithological correction scheme is applied, and accurate lithology identification is achieved.

2 Back-propagation neural network
Fig. 1. Structure of a BPNN

Proposed by Rumelhart and McClelland in 1986, the BPNN is a multilayer feedforward neural network (MLFNN) based on the error back propagation algorithm. It is widely used in function approximation, pattern recognition, classification, data compression, data mining, and other disciplines. The BPNN can be divided into three layers: the input layer, hidden layer, and output layer. There is only one input layer and one output layer, whereas there may be one or several hidden layers. Fig. 1 depicts the structure of a BPNN. Each layer contains several neurons. The neurons in the same layer are not connected to each other, whereas the neurons in different layers are connected by weights wi. The input layer receives the data, and the hidden and output layers process the data. The data processing model is usually a sigmoid function:

y_{in} = \sum_{i=1}^{n} x_i w_i + b \qquad (1)

y_{out} = f(y_{in}) = \frac{1}{1 + e^{-y_{in}}} \qquad (2)
where y_in is the input of a certain layer, x_i denotes the output of the previous layer, and y_out is the output of the layer. In addition, b is the threshold: if the input is greater than b, it changes the output; otherwise the output is not affected. Through the input and output operations of the network, the final output vector is obtained and compared with the desired result; if the error satisfies the requirement, the calculation stops. If it does not, the error is propagated back from the output layer and distributed over the other layers. The weights wi are then corrected, and the above input and output operations are repeated until the error is satisfactory. Therefore, the basic principle of the BPNN is that the network constantly adjusts its weights under the stimulation of external input samples, causing the outputs to gradually approach the desired result. The BPNN is a typical "shallow" neural network. When more than two hidden layers are present, the
calculation results become very unsatisfactory.

3 Deep belief network

The deep belief network, a type of deep learning model, was proposed by Geoffrey Hinton, the 'father of the neural network', in 2006. The DBN is a "deep" neural network developed on the basis of the BPNN. It consists of stacked restricted Boltzmann machines (RBMs). The RBM, an unsupervised learning method, is the key to operating a DBN. As shown in Fig. 2, an RBM consists of two layers: a visible layer, which is mainly used to input training data, and a hidden layer, which is a feature detector used to process the data.
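A single RBM update, computing hidden-layer activations, sigmoid turn-on probabilities, and stochastic on/off states as described below, can be sketched in Python. The weight matrix and layer sizes here are arbitrary illustrations, not values from this study:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbm_hidden_update(v, W, rng):
    """One visible-to-hidden RBM pass: activations, sigmoid turn-on
    probabilities, and stochastic on/off sampling of each hidden unit."""
    h = W @ v                                   # incentive (activation) values
    p_on = 1.0 / (1.0 + np.exp(-h))             # probability each unit turns on
    u = rng.uniform(0.0, 1.0, size=p_on.shape)  # random threshold u in [0, 1]
    return (p_on >= u).astype(int)              # 1 if P(on) >= u, else 0

v = np.array([0.2, 0.7, 0.5])            # normalized visible-layer input (illustrative)
W = rng.normal(0.0, 0.1, size=(4, 3))    # 4 hidden units, 3 visible units (illustrative)
print(rbm_hidden_update(v, W, rng))      # array of four 0/1 hidden states
```

Stacking such layers, with each RBM's sampled hidden states feeding the next RBM's visible layer, gives the deep network described in this section.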
Fig. 2. Structure of an RBM

For an RBM with input vector P, we first calculate the incentive values of the neurons in the hidden layer:

h = WP \qquad (3)

where W is a matrix composed of the weights wi. We then calculate the probability that a hidden-layer neuron is turned on (represented by P1) by the sigmoid function:

P_1 = \frac{1}{1 + e^{-h_m}} \qquad (4)

where h_m \in h. Thus, the probability that a hidden-layer neuron is turned off (represented by P0) is:

P_0 = 1 - P_1 \qquad (5)

Finally, we compare the probability that a hidden-layer neuron is turned on with a random value u (u \in [0,1]) drawn from the uniform distribution:

h_m = \begin{cases} 1, & P(h_m = 1) \ge u \\ 0, & P(h_m = 1) < u \end{cases} \qquad (6)
The above procedure determines whether each hidden-layer neuron is turned on or off. Thus, the first RBM is trained completely. The states of the hidden-layer neurons (on or off) can be expressed as a matrix G1, which is used as the input of the second RBM. By repeating the above training process, the second RBM is trained in the same way as the first. The final output after training p RBMs is Gp. Following the BPNN training mode (the BPNN section of this paper), Gp is compared
with the desired result, and the error is distributed over each RBM. The weight matrices W are then corrected until the error meets the requirement. Thus, a DBN is established. Therefore, the DBN is essentially a "deep" neural network built from a series of RBMs and fine-tuned with the error back propagation algorithm.

4 Examples of lithology identification based on a deep belief network

The logging data in this paper come from a block in the east of the Junggar Basin. Drilling data show that the formations in this block mainly comprise igneous and sedimentary rocks. The sedimentary rocks are mainly sandstone and mudstone, and the igneous rocks include basalt, trachyte, andesite, diabase, and gabbro. Considering the actual situation, the feasibility of logging identification, and the needs of reservoir research, the basalt is further divided into dense and non-dense basalt, and the trachyte into dense and non-dense trachyte. The logging response characteristics of igneous reservoirs are more complex than those of sedimentary reservoirs, so traditional methods cannot be used for lithology identification. To address this, this paper introduces the deep belief network, a deep learning method, into igneous reservoir logging interpretation. After the network structure is determined by the optimization procedure, the lithology can be identified more accurately.

4.1 Normalization of logging data

To train the DBN, logging data whose corresponding formation lithologies are already known are required. The data are divided into a sample set and a test set. The sample set, used to train the DBN, includes 1000 sets of logging data for each of the typical sedimentary rocks, dense basalt, non-dense basalt, dense trachyte, non-dense trachyte, andesite, diabase, and gabbro in the area. In addition, 500 sets of logging data for each of the same lithologies are used to test network performance.
The physical principles of the logging curves vary, and the dimensions and orders of magnitude of the measured physical parameters also differ greatly. The computer cannot account for this, so all logging data must be preprocessed to make their dimensions and numerical distribution ranges uniform. Normalization is generally adopted:

x_{nor} = \frac{x - x_{min}}{x_{max} - x_{min}} \qquad (7)

where x_nor is the normalized data, x is the original logging data, and x_max and x_min are the maximum and minimum of the logging data, respectively. After normalization, the logging data are dimensionless and distributed in the interval [0,1].

4.2 Digitalization of lithology

The computer cannot recognize the eight lithologies in this paper directly. Therefore, the lithology must be digitalized for the computer to read. In this paper, lithology is recorded as a matrix Y composed of eight column vectors. The rows of Y represent different depths, and the columns represent the eight kinds of lithology. The elements of Y are 0 or 1: a 1 indicates the presence of a certain lithology, and a 0 indicates its absence.

Table 1. Digital method of lithology classification

Depth     | Sedimentary rock | Dense basalt | Non-dense basalt | Dense trachyte | Non-dense trachyte | Andesite | Diabase | Gabbro | Identification
⋮         | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
2208.50 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock
2208.75 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock
2209.00 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock
2209.25 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock
2209.50 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt
2209.75 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt
2210.00 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt
⋮         | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮

The formation lithologies at different depths in a well can be represented as in Table 1. From 2208.50 m to 2209.25 m, the formations are composed of sedimentary rocks, and from 2209.50 m to 2210.00 m, the formations comprise dense basalts. The matrix Y can be denoted as:

Y = [1 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0
     0 1 0 0 0 0 0 0
     0 1 0 0 0 0 0 0
     0 1 0 0 0 0 0 0]
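The preprocessing of Sections 4.1 and 4.2, min-max normalization of the logging curves (Eq. 7) and one-hot digitalization of the lithology log, can be sketched as follows; the sample curve values are invented for illustration:

```python
import numpy as np

# Lithology order follows the columns of Table 1.
LITHOLOGIES = ["sedimentary rock", "dense basalt", "non-dense basalt",
               "dense trachyte", "non-dense trachyte", "andesite",
               "diabase", "gabbro"]

def normalize(x):
    """Min-max normalization of a logging curve to [0, 1], as in Eq. (7)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def digitalize(lithology_log):
    """Build the one-hot matrix Y: one row per depth, one column per lithology."""
    Y = np.zeros((len(lithology_log), len(LITHOLOGIES)), dtype=int)
    for row, name in enumerate(lithology_log):
        Y[row, LITHOLOGIES.index(name)] = 1
    return Y

gr = [45.0, 120.0, 80.0]         # invented raw GR readings
print(normalize(gr))             # [0.  1.  0.4667] (approx.)
Y = digitalize(["sedimentary rock"] * 4 + ["dense basalt"] * 3)
print(Y[:, 0])                   # first column marks sedimentary rock: [1 1 1 1 0 0 0]
```

This reproduces the structure of the matrix Y shown above: four sedimentary-rock rows followed by three dense-basalt rows.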
4.3 Grey correlation analysis

To optimize DBN performance, suitable logging data must be selected to train the network. In this paper, the relationship between logging data and lithology is quantified by grey correlation analysis. Ten logging curves (deep lateral resistivity, shallow lateral resistivity, microsphere focusing resistivity, spontaneous potential, natural gamma, caliper, acoustic, density, compensated neutron, and photoelectric cross-section) are written as Xi(h) = [x1(h), x2(h), ..., x10(h)], where h represents depth and i = 1, 2, ..., 10. The lithology corresponding to the logging data is denoted as the matrix Y = [y1(h), y2(h), ..., y8(h)]. The grey relational grade between each logging curve and each kind of lithology is then expressed as:

\gamma(Y, X_i) = \frac{1}{n} \sum_{k=1}^{n} \frac{\min_i \min_k |Y(k) - x_i(k)| + \xi \max_i \max_k |Y(k) - x_i(k)|}{|Y(k) - x_i(k)| + \xi \max_i \max_k |Y(k) - x_i(k)|} \qquad (8)
where γ(Y, Xi) is the grey relational grade and ξ is a resolution coefficient, a constant usually set to ξ = 0.5.

Table 2. The grey relational grades

Lithology            | RLLD | RLLS | RXO  | SP   | GR   | CAL  | DEN  | AC   | CNL  | PEFL
Sedimentary rock     | 0.82 | 0.81 | 0.78 | 0.74 | 0.80 | 0.76 | 0.81 | 0.81 | 0.80 | 0.82
Dense basalt         | 0.83 | 0.80 | 0.82 | 0.75 | 0.82 | 0.72 | 0.78 | 0.83 | 0.79 | 0.80
Non-dense basalt     | 0.82 | 0.81 | 0.80 | 0.76 | 0.81 | 0.84 | 0.84 | 0.86 | 0.85 | 0.81
Dense trachyte       | 0.80 | 0.78 | 0.80 | 0.74 | 0.82 | 0.72 | 0.80 | 0.79 | 0.79 | 0.80
Non-dense trachyte   | 0.82 | 0.78 | 0.79 | 0.71 | 0.81 | 0.82 | 0.84 | 0.82 | 0.82 | 0.83
Andesite             | 0.81 | 0.77 | 0.79 | 0.70 | 0.82 | 0.76 | 0.80 | 0.82 | 0.82 | 0.80
Diabase              | 0.78 | 0.77 | 0.79 | 0.65 | 0.82 | 0.73 | 0.76 | 0.79 | 0.80 | 0.80
Gabbro               | 0.79 | 0.78 | 0.78 | 0.66 | 0.80 | 0.70 | 0.79 | 0.79 | 0.81 | 0.81
Geometric mean value | 0.80 | 0.78 | 0.79 | 0.71 | 0.81 | 0.75 | 0.80 | 0.81 | 0.81 | 0.81
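The grey relational grade of Eq. (8) can be sketched as a minimal implementation; the toy reference series and curves below are invented for illustration, not the study's data:

```python
import numpy as np

def grey_relational_grades(Y, X, xi=0.5):
    """Grey relational grade of each curve X[i] against the reference series Y.
    Y: shape (n,) reference; X: shape (m, n) normalized curves; xi: resolution
    coefficient (usually 0.5)."""
    diff = np.abs(Y[None, :] - X)            # |Y(k) - x_i(k)| for every curve and depth
    d_min, d_max = diff.min(), diff.max()    # two-level min and max over i and k
    coeff = (d_min + xi * d_max) / (diff + xi * d_max)
    return coeff.mean(axis=1)                # average over depth: one grade per curve

Y = np.array([0.1, 0.5, 0.9])
X = np.array([[0.1, 0.5, 0.9],               # identical to the reference
              [0.9, 0.1, 0.2]])              # dissimilar curve
g = grey_relational_grades(Y, X)
print(g)  # the identical curve scores 1.0; the dissimilar one scores lower
```

A higher grade indicates a stronger correlation, which is how the curves in Table 2 are ranked.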
The grey relational grades between the 10 logging curves and the 8 kinds of lithology are given in Table 2, which quantifies the relationship between logging curves and lithology: the greater the grade, the stronger the correlation. For logging interpretation, a grey relational grade greater than 0.8 between a logging curve and a lithology is generally considered adequate. In practice, however, each logging curve responds differently to different lithologies, making it difficult to obtain enough logging curves whose grey relational grades exceed 0.8 for all lithologies. Therefore, logging curves satisfying the following two conditions were selected to train the DBN: (1) for each kind of lithology, the grey relational grade is greater than 0.7; (2) the geometric mean of the grey relational grades over all lithologies is greater than 0.8. Accordingly, RLLD, GR, DEN, AC, CNL, and PEFL are selected as the input parameters of the DBN.

4.4 DBN training

Using the sample set to construct the DBN, lithology identification can be transformed into a mathematical calculation. The inputs of the DBN are six vectors, namely the normalized RLLD, GR, DEN, AC, CNL, and PEFL logging curves. The output is a matrix of eight vectors whose elements are 0 or 1, representing sedimentary rocks, dense basalt, non-dense basalt, dense trachyte, non-dense trachyte, andesite, diabase, and gabbro, following the digitalization of lithology given in Section 4.2. For example, if the formation at a certain depth is made up of sedimentary rocks, the element of the vector corresponding to sedimentary rocks is 1, and the elements corresponding to the other rocks are 0. After training is completed, the network performance can be tested with the test set to adjust the network parameters.

4.5 The number of RBMs
Increasing the number of RBMs improves DBN performance nonlinearly, rapidly at first and then slowly, while also increasing the computation time. Therefore, the number of RBMs should be chosen with both the training effect and computation speed in mind. From experience, the number of RBMs is usually limited to between 5 and 30. The number of RBMs is used as a variable to train the DBN, and the normalized logging data of the test set are used as input to obtain the matrix representing the eight kinds of lithology. The output is not ideal; that is, the matrix does not consist of 0s and 1s, but of numbers in the interval [0,1]. Following the basic idea of least squares, the output matrix is recorded as Y1. The theoretical lithology classification result is
digitized as described in Section 4.2 and written as the matrix Y0. The network performance is measured by the mean square error function, i.e., the sum of squares of all the elements of ΔY1 (ΔY1 = Y1 - Y0), written as MSEΔY1. Finally, the relationship between MSEΔY1 and the number of RBMs is obtained. As can be seen in Fig. 3, when there are more than 10 RBMs, MSEΔY1 no longer decreases significantly. Balancing training effect against calculation speed, 10 RBMs are therefore used in this paper.
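The selection of the number of RBMs by the MSE criterion can be sketched as a search loop. The `evaluate` callback standing in for DBN training, the tolerance value, and the mock error curve are all hypothetical illustrations:

```python
import numpy as np

def mse(Y1, Y0):
    """Performance measure used in this section: sum of squared elements of ΔY1 = Y1 - Y0."""
    return float(np.sum((np.asarray(Y1) - np.asarray(Y0)) ** 2))

def pick_num_rbms(candidates, evaluate, tolerance=0.05):
    """Choose the smallest RBM count after which the error stops improving by
    more than `tolerance`: the 'rapidly at first, then slowly' trade-off.
    `evaluate` is a hypothetical callback that trains a DBN with the given
    number of RBMs and returns its test-set MSE."""
    errors = [evaluate(p) for p in candidates]
    for i in range(1, len(errors)):
        if errors[i - 1] - errors[i] < tolerance:
            return candidates[i - 1]
    return candidates[-1]

# Mock evaluation standing in for DBN training: error falls quickly, then flattens.
mock = lambda p: 1.0 / p
print(pick_num_rbms([5, 10, 15, 20, 25, 30], mock))  # 10
```

In the study itself, `evaluate` corresponds to retraining the DBN and computing MSEΔY1 on the test set for each candidate count.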
Fig. 3. The relationship between the number of RBMs and MSEΔY1

4.6 The number of neurons in the hidden layer

Each RBM consists of a visible layer and a hidden layer. The number of neurons in the visible layer is relatively easy to determine: for the first RBM it equals the number of input logging curves (6), and for each subsequent RBM it equals the number of output vectors from the previous RBM. The number of neurons in the hidden layer, however, is closely related to network performance and is not easy to determine. If there are too few hidden-layer neurons, the relationship between logging curves and lithology cannot be well extracted, and "under-fitting" appears. If there are too many, the relationship in the sample set is extracted excessively, so that the trained DBN is applicable only to the sample set and performs poorly on other logging data, i.e., "over-fitting" occurs. At present, few good methods exist for determining the number of hidden-layer neurons; this paper proposes the following method. The number of hidden-layer neurons is usually one order of magnitude less than the number of samples. There are 8000 samples in this paper, so the number of hidden-layer neurons should be between 100 and 999. Suppose the numbers of hidden-layer neurons in the 10 RBMs are n1, n2, n3, ..., n10, written as nm, with 100 ≤ nm ≤ 999. The output matrix is denoted Y2, with ΔY2 = Y2 - Y0, and the sum of squares of all elements of ΔY2 is MSEΔY2. The possible values of nm are first limited to 100, 200, 300, 400, 500, 600, 700, 800, 900, and
999. The values of nm that minimize MSEΔY2 are written as n1a, n2a, n3a, ..., n10a, or nma. The possible values of nm are then reset to nma-50, nma-40, nma-30, nma-20, nma-10, nma, nma+10, nma+20, nma+30, nma+40, and nma+50, and the minimizing values are written as n1b, n2b, n3b, ..., n10b, or nmb. Finally, the possible values of nm are reset to nmb-5, nmb-4, nmb-3, nmb-2, nmb-1, nmb, nmb+1, nmb+2, nmb+3, nmb+4, and nmb+5, and the values of nm that minimize MSEΔY2 are taken as the ideal numbers of hidden-layer neurons. Following this coarse-to-fine process, the appropriate numbers of neurons in the hidden layers of the 10 RBMs in this study are 708, 633, 629, 654, 512, 551, 542, 432, 378, and 210.

4.7 The classification boundary

The output of logging data processed by the trained DBN usually consists of numbers in the interval [0,1] rather than the integers 0 and 1. Ideally these numbers would simply be rounded off, making 0.5 the lithology classification boundary: numbers greater than or equal to 0.5 are rounded to 1, and numbers less than 0.5 to 0. However, because the relationship between logging curves and lithology is very complex, errors such as misidentification, classification overlap (a sample assigned to two lithologies at once), and no classification (a sample assigned to no lithology) are bound to arise. In this paper, the classification boundary w is searched in [0.3, 0.7]: the output matrix is denoted Y3, the theoretical lithology classification result is Y0, and in Y3 numbers greater than or equal to w are set to 1 and numbers less than w to 0. With ΔY3 = Y3 - Y0, the boundary that minimizes the sum of squares of all elements of ΔY3 is the most appropriate; in this paper, w = 0.558.

4.8 The calibration of identification results

After the parameters are determined by the above process, the DBN is applied to the lithology identification of well MXX in this block.
The results are analyzed and discussed below.

(1) The identification results are continuous and stable, and the formation is thick.

Table 3. Lithology identification from 2068.50 m to 2080.00 m

Depth     | Sedimentary rock y1(h) | Dense basalt y2(h) | Non-dense basalt y3(h) | Dense trachyte y4(h) | Non-dense trachyte y5(h) | Andesite y6(h) | Diabase y7(h) | Gabbro y8(h) | Computation | Correction
⋮
2068.50 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock | /
2068.75 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock | /
2069.00 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock | /
⋮
2079.50 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock | /
2079.75 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock | /
2080.00 m | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Sedimentary rock | /
⋮
y1(h), y2(h), ..., y8(h) represent the computation results for the different lithologies, and h represents depth. The lithology identification between 2068.50 m and 2080.00 m in well MXX is shown in Table 3. y1(2068.50) to y1(2080.00) are all 1, while y2 to y8 are all 0 over the same interval. The calculation is continuous and stable, indicating that the formation is mostly made up of sedimentary rock, which coincides with the mudlog data.

(2) At a certain depth, the thick formation is mixed with discontinuous formations.

Table 4. Lithology identification from 3246.00 m to 3261.50 m

Depth     | Sedimentary rock y1(h) | Dense basalt y2(h) | Non-dense basalt y3(h) | Dense trachyte y4(h) | Non-dense trachyte y5(h) | Andesite y6(h) | Diabase y7(h) | Gabbro y8(h) | Computation | Correction
⋮
3246.00 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt | Dense basalt
3246.25 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt | Dense basalt
⋮
3255.50 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt | Dense basalt
3255.75 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt | Dense basalt
3256.00 m | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Non-dense basalt | Dense basalt
3256.25 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt | Dense basalt
3256.50 m | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | Non-dense basalt | Dense basalt
3256.75 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt | Dense basalt
3257.00 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt | Dense basalt
⋮
3260.25 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt | Dense basalt
3261.50 m | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | Dense basalt | Dense basalt
⋮
The lithology identification from 3246.00 m to 3261.50 m in well MXX is shown in Table 4. The formations are made up of basalt. Between 3255.75 m and 3256.75 m, the dense basalts are interbedded with thin non-dense basalts. Considering the resolution of conventional logging, in this paper a formation 0.5 m thick or less is merged into the adjacent thick formation. Therefore, in well MXX, the formation from 3246.00 m to 3261.50 m is classified as dense basalt.

(3) Lithology cannot be identified.
Table 5. Lithology identification from 3012.00 m to 3019.50 m

Depth     | Sedimentary rock y1(h) | Dense basalt y2(h) | Non-dense basalt y3(h) | Dense trachyte y4(h) | Non-dense trachyte y5(h) | Andesite y6(h) | Diabase y7(h) | Gabbro y8(h) | Computation | Correction
⋮
3012.00 m | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Null | Other
3012.25 m | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Null | Other
⋮
3016.75 m | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Null | Other
3017.00 m | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Null | Other
3017.25 m | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Diabase | Other
3017.50 m | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Null | Other
3017.75 m | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Null | Other
⋮
3019.25 m | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Null | Other
3019.50 m | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Null | Other
⋮
The lithology identification from 3012.00 m to 3019.50 m is shown in Table 5. Except for y7(3017.25), the elements of y1, y2, ..., y8 are all 0, which means that the lithology cannot be identified. The formation at 3017.25 m is identified as diabase, but it is too thin to be classified as a separate diabase formation and is merged into the unclassifiable formation. According to mudlog data, the formation is composed of diorite. Because there are few diorite formations, the sample set does not contain diorite (in order to ensure the performance of the DBN), making it impossible to identify diorite formations. Besides diorite, a small amount of coal in well MXX also cannot be identified.

(4) Lithology classification overlaps.

Table 6. Lithology identification from 4111.00 m to 4117.25 m

Depth     | Sedimentary rock y1(h) | Dense basalt y2(h) | Non-dense basalt y3(h) | Dense trachyte y4(h) | Non-dense trachyte y5(h) | Andesite y6(h) | Diabase y7(h) | Gabbro y8(h) | Computation | Correction
⋮
4111.00 m | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Non-dense trachyte | Non-dense trachyte
4111.25 m | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Non-dense trachyte | Non-dense trachyte
⋮
4113.75 m | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | Overlap | Non-dense trachyte
4114.00 m | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | Overlap | Non-dense trachyte
⋮
4116.50 m | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | Overlap | Non-dense trachyte
4116.75 m | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | Overlap | Non-dense trachyte
4117.00 m | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Non-dense trachyte | Non-dense trachyte
4117.25 m | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Non-dense trachyte | Non-dense trachyte
⋮
The lithology identification from 4111.00 m to 4117.25 m is shown in Table 6. Between 4113.75 m and 4116.75 m, the elements of y4, representing dense trachyte, and y5, representing non-dense trachyte, are both 1, which leads to classification overlap: the formation from 4113.75 m to 4116.75 m is identified as both dense trachyte and non-dense trachyte. Accordingly, a cross plot is adopted to correct the results.
Fig. 4. AC-GR and GR-RLLD cross plots

By cross-plotting the logging curves, dense trachyte and non-dense trachyte can be well distinguished using the AC-GR and GR-RLLD cross plots. The formation from 4113.75 m to 4116.75 m in well MXX can thus be re-determined as non-dense trachyte. In addition, a small amount of gabbro and diabase also needs to be corrected by the cross plot.

4.9 Classification accuracy analysis

Table 7. Comparison of DBN and BPNN

                  |     DBN      |     BPNN
Number of samples | 100  | 1000  | 100   | 1000
Accuracy          | 78.5% | 86.1% | 78.1% | 80.1%
The lithology identification results of the DBN and BPNN are compared in Table 7. The performance of the DBN is related to the number of samples: when the sample set is small (100), the identification accuracies of the DBN and BPNN are almost equal, whereas when the sample set is large (1000), the DBN performs considerably better than the BPNN.

Table 8. Comparison of lithology identification before and after correction

         | No correction | With correction
Accuracy | 86.1%         | 94.8%
Table 8 compares the lithology identification before and after correction. The corrections proposed in Section 4.8 further improve the identification accuracy. However, some errors remain, mainly concentrated in distinguishing dense trachyte from non-dense trachyte and dense basalt from non-dense basalt. The possible reasons are that the sample set alone cannot fully reflect the characteristics of all formations; that the network parameters, especially the number of neurons in the hidden layers of the RBMs, could be further optimized; and that the difference between dense and non-dense rocks is not strictly defined.
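The correction rules of Section 4.8, merging formations 0.5 m thick or less into their thicker neighbours and marking unclassifiable samples as "Other", can be sketched as a post-processing pass. The 0.25 m sampling interval is taken from the tables above; the sample labels are illustrative:

```python
def correct_runs(labels, step=0.25, min_thickness=0.5):
    """Group consecutive identical identifications into formations, then merge
    any formation of thickness <= min_thickness into its thicker neighbour.
    `labels` holds one identification per depth sample; None means unclassifiable."""
    # Group samples into runs: [label, sample count]
    runs = []
    for lab in labels:
        lab = lab if lab is not None else "Other"   # case (3): outside the sample set
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    # Case (2): merge thin runs into the previous formation.
    merged = []
    for lab, n in runs:
        if n * step <= min_thickness and merged:
            merged[-1][1] += n                      # absorb into the previous formation
        elif merged and merged[-1][0] == lab:
            merged[-1][1] += n                      # rejoin a same-lithology neighbour
        else:
            merged.append([lab, n])
    return [(lab, n * step) for lab, n in merged]

# Thin non-dense interbeds inside a thick dense-basalt interval, as in Table 4.
samples = (["Dense basalt"] * 8 + ["Non-dense basalt"] * 1 +
           ["Dense basalt"] * 1 + ["Non-dense basalt"] * 1 +
           ["Dense basalt"] * 8)
print(correct_runs(samples))  # [('Dense basalt', 4.75)]
```

The overlap case (4) is not handled here, since it requires cross-plot information rather than the identification sequence alone.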
5 Conclusions

In this study, a DBN is applied to identify lithology from logging data. Our findings are as follows. First, when ECS logging and imaging logging data are unavailable, it is feasible to train a DBN to identify lithology from conventional logging curves. Second, based on the AC, DEN, and CNL logging curves, the DBN can identify lithology and also judge the compactness of key formations. Third, by using the MSE function to measure the performance of the DBN, the number of RBMs, the number of neurons in the hidden layer of each RBM, and the classification boundary are determined. Fourth, to improve the accuracy of the lithology identification results, they are divided into four categories: if the identification result is continuous and stable, it does not need to be corrected; if a discontinuous formation within a thick layer is thicker than 0.5 m, it can be divided as a separate formation, whereas if it does not exceed 0.5 m, it should be merged into the adjacent thick formation; if the lithology cannot be classified, a lithology outside the sample set has appeared; and if classification overlap occurs, it should be corrected by the cross plot method. Finally, apart from the lithological correction methods, increasing the number of samples also improves network performance.

6 Acknowledgements
This research was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2017D01B08), the Scientific Research Planning Project of Xinjiang Uygur Autonomous Region (XJEDU2017S056 and XJEDU2017S057), the Ph.D. Research Startup Foundation of the Xinjiang Institute of Engineering (2016xgy341812), the Tianchi Doctor Research Project of Xinjiang Uygur Autonomous Region (BS2017001), and the Xinjiang Uygur Autonomous Region key specialty of Geological Engineering.
Conflict of interest statement

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
HIGHLIGHTS

Training the deep belief network (DBN) to identify lithology from logging data is proposed.
The number of RBMs, the number of neurons in the hidden layer of each RBM, and the classification boundary are determined.
The lithology identification results of the DBN are further analyzed and corrected.
An accuracy of 94.8% is achieved for lithology identification.