Marine and Petroleum Geology xxx (xxxx) xxx–xxx
Contents lists available at ScienceDirect
Marine and Petroleum Geology journal homepage: www.elsevier.com/locate/marpetgeo
Discussion
Comment on “Correlation analysis of element contents and mechanical characteristics of shale reservoirs” by Liu et al. (2018)
Ye Zhang a, Y.Z. Ma b,∗, Ernest Gomez b
a Department of Geology and Geophysics, University of Wyoming, Laramie, WY 82071, USA
b Schlumberger, Denver, CO 80202, USA
ABSTRACT
We found the article and conclusions by Liu et al. (2018) interesting, as the authors conducted a correlation analysis between element contents and rock mechanical characteristics of an organic-rich shale in southern China. In analyzing data obtained from logging shale gas wells, the authors employed non-parametric methods for calculating correlation coefficients, along with statistical significance tests based on confidence levels. They concluded that mechanical properties cannot be determined from the elemental composition of shale alone, while mineralogy (quartz vs. clay fractions) can explain some of the effects observed in their correlation analysis. In our opinion, a couple of points regarding the correlation analysis and the brittleness index introduced by Liu et al. (2018) were not clearly presented and need to be clarified for the readers of this journal. Herein, we discuss two issues, of which the first (correlation analysis) affects the second (constructing a mathematical model of the brittleness index) and is therefore discussed at somewhat greater length:
• Liu et al. (2018)'s analysis of the strength and significance of the correlation coefficients;
• Liu et al. (2018)'s mathematical model of the brittleness index.

1. Correlation analysis

Liu et al. (2018, pages 23–26) used the Spearman correlation coefficient to analyze the relationships between element contents and rock mechanical properties because of the non-normality of their data. For two selected confidence levels, 95% and 99%, the authors assessed the significance of the Spearman correlation coefficients by hypothesis testing. We believe that Liu et al. (2018) used the right approaches in their correlation analysis. However, we have found that a few key results of their correlation analysis were not clearly presented, which may confuse readers.

First, it appears that the article mislabeled the significance of correlation using confidence levels. On page 24, Liu et al. (2018) stated: ‘In this paper, the correlation is highly significant when the confidence level is 0.95, and the correlation has low significance when the confidence level is 0.99.’ This statement appears to be inconsistent with a later statement in the footnote of their Table 5 (page 26), which reads ‘“*” indicates that the correlation is of low significance when the confidence interval is 0.95; “**” indicates that the correlation has high significance when the confidence level is 0.99’. The “correct” statement, we believe, is the latter one. The authors should clarify the significance of the correlation in their study. Researchers generally state a null hypothesis and test whether the observed correlation differs significantly from it (see e.g., Zimmerman et al., 2003), but no null hypothesis was mentioned in Liu et al.'s paper. This may have led to the inconsistent distinction between strength and significance of correlation in Liu et al.'s analysis, which is our second point of discussion.

Secondly, Liu et al. (2018, pages 24–26) appeared to mix up the strength and the significance of the Spearman correlation coefficient in their interpretation. In Table 5, which summarizes their key results, the magnitudes of the correlation coefficients were tabulated along with the significance indicators based on the 95% and 99% confidence levels. For 15 element contents and mechanical properties, 225 correlation coefficient values exhibiting a wide range of variation were reported. We believe that these results are credible because it is common to observe a wide range of correlation when the relationships of compositional elements are examined (Chayes, 1960; Aitchison, 1986). However, although Table 5 separates these two measures of correlation, Liu et al. (2018) did not interpret them accurately. For

∗ Corresponding author. E-mail address: [email protected] (Y.Z. Ma).
https://doi.org/10.1016/j.marpetgeo.2019.06.004 Received 25 May 2019; Received in revised form 3 June 2019; Accepted 4 June 2019 0264-8172/ © 2019 Elsevier Ltd. All rights reserved.
Please cite this article as: Ye Zhang, Y.Z. Ma and Ernest Gomez, Marine and Petroleum Geology, https://doi.org/10.1016/j.marpetgeo.2019.06.004
Table 5 Magnitude and significance of the correlation coefficient between element contents and mechanical parameters (simplified from Table 5 of Liu et al. (2018)).

        Ti          S           ……      BI
Ti      1           −0.385∗∗    ……      −0.732∗∗
……      ……          ……          ……      ……
μ       0.569∗∗     −0.216∗∗    ……      −0.801∗∗
……      ……          ……          ……      ……
BI      −0.732∗∗    0.301∗∗     ……      1
Table 6 Magnitude and significance of the correlation coefficient between rock mechanical parameters and Si content (from Table 6 of Liu et al. (2018)).

Si content (%)    N      Rs(Si, μ)    Rs(Si, E)    Rs(Si, SS)    Rs(Si, BI)
1.84–28.08        110    0.227        −0.482       −0.538        −0.470
28.10–31.32       110    −0.370       0.231        0.074         0.301
31.37–44.24       270    −0.524       0.728        0.642         0.703
In Table 5, “*” indicates that the correlation is of low significance when the confidence interval is 0.95; “**” indicates that the correlation has high significance when the confidence level is 0.99; the others have no significant correlation. Note: This table is simplified from Liu et al. (2018) for discussion purposes. Please refer to Liu et al. (2018) for the complete table.

example, the article states ‘as shown in Table 5, Poisson ratio has no significant correlation with the Ca content and high correlation (99% confidence interval) with other eight element. The correlations are in the following order: Si > Ti > Fe > K > Al > Mg > S > Mn.’ This sentence implies that the authors interpreted any correlation coefficient whose magnitude is equal to or greater than 0.154 (the coefficient for Mn with the Poisson ratio is −0.154**) as high, and a value of 0.032 (the coefficient for Ca with the Poisson ratio) as low. On page 24, the authors made similar statements when interpreting the correlation coefficients between the element contents and the Young's modulus, the bulk modulus, and the other mechanical parameters.

Moreover, Liu et al. (2018) did not present any details of their hypothesis tests. Given that the most commonly used null hypothesis for a correlation coefficient is that it equals zero (see e.g., Loftus and Loftus, 1988), a correlation coefficient of −0.154** at a confidence level of 99% merely implies that this value is significantly different from 0 (the null hypothesis); it does not imply a high strength of correlation. In other words, the significance of a correlation coefficient is relative to the type of hypothesis test performed. Herein, we assume that the authors followed the conventional null hypothesis for correlation analysis. If so, the authors have likely misinterpreted the significance of correlation as the strength of correlation. If the authors did not use the most common null hypothesis, they should present their null hypothesis.

To clarify why the strength and the significance of a correlation coefficient must be distinguished, it is important to point out that the purpose of a hypothesis test is to assess whether a set of sample statistics is similar to their population counterparts. One can imagine that if the population were fully known, the significance of the (population) correlation coefficient would no longer be meaningful. On the other hand, in a statistical analysis conducted on real measurements, a confidence interval can be constructed, based on the number of samples used and other factors, to indicate whether a particular correlation coefficient is significant or not (Fieller et al., 1957; Bonett and Wright, 2000). While such a confidence interval does not indicate whether the strength of correlation is low or high, the strength of correlation is still a valid sample statistic. Therefore, a correlation coefficient can be high while the hypothesis test indicates a low significance (e.g., if the criterion used is strict, such as a large number of sample data being required), or a correlation coefficient can be low while having a meaningful significance as derived from a less strict criterion in the hypothesis test along with a certain confidence interval (see e.g., Loftus and Loftus, 1988; Bonett and Wright, 2000). For example, a large number of samples can enhance the confidence level, but it generally does not increase the magnitude/strength of the correlation coefficient.

Thirdly, throughout the article (Liu et al., 2018), the Spearman correlation analysis appeared to be invoked in an inconsistent manner. For example, in Fig. 6b (Liu et al., 2018, page 27), a fitted linear regression on a cross-plot of Clay versus Quartz contents yields an R-squared value of 0.4714. Is this R-squared value obtained using the Pearson or the Spearman correlation coefficient? If it is the Spearman correlation, the authors should explain it, because it is uncommon to use this correlation for calculating the R-squared value. If it is the Pearson correlation, would it then imply a limitation of the Spearman correlation for extended use?
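The difference between the strength and the significance of a correlation coefficient can be illustrated numerically. The Python sketch below tests the null hypothesis of zero correlation with a Fisher z-transform, using the approximate variance 1.06/(n − 3) commonly attributed to Fieller et al. (1957) for the Spearman coefficient. The sample size n = 490 is an assumption (the sum of the class sizes N in Table 6), since the sample size behind Table 5 is not restated in this discussion. The function name `spearman_z_test` and the third case (0.60 with n = 10) are purely illustrative.

```python
import math
from statistics import NormalDist


def spearman_z_test(rs, n):
    """Two-sided test of H0: rho = 0 via the Fisher z-transform.

    Assumes Var(atanh(rs)) is approximately 1.06 / (n - 3).
    Returns (z statistic, two-sided p-value).
    """
    z = math.atanh(rs) / math.sqrt(1.06 / (n - 3))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# (label, coefficient, sample size); n = 490 is an assumed sample size.
cases = [
    ("Mn vs Poisson ratio (value quoted in the text)", -0.154, 490),
    ("Ca vs Poisson ratio (value quoted in the text)", 0.032, 490),
    ("hypothetical strong correlation, small sample", 0.60, 10),
]

for label, rs, n in cases:
    z, p = spearman_z_test(rs, n)
    # rs**2 is shown as a rough indicator of strength, independent of n.
    print(f"{label}: rs={rs:+.3f}, rs^2={rs * rs:.3f}, n={n}, z={z:+.2f}, p={p:.4f}")
```

Under these assumptions, the coefficient of −0.154 is significant at the 99% level even though its square is only about 0.024, whereas a coefficient of 0.60 from ten samples is not significant even at the 95% level.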
2. Mathematical model of the brittleness index

After performing the correlation analysis of element contents and rock mechanical properties, Liu et al. (2018) presented their subsection 5.2 (page 24) under the heading ‘Unreliable determination of mechanical parameters using only the element contents’. This appears to be a reasonable conclusion based on their earlier analysis. In the same subsection, Table 6 was presented, with which the authors attempted to analyze the conditional relationships among the data using the Si content of the shale as a criterion. In support of one of their key conclusions [Conclusion (1) in Liu et al., 2018, page 26], the authors stated, on page 24, ‘Through the change in the magnitude and significance of the correlation, the range of “Si” for different classes is determined’. In our opinion, this statement should be supported by a detailed analysis, because the Si cutoff values (28.08 and 31.32% in Table 6) used to separate their three classes can strongly affect the magnitude of the computed within-class (or conditional) correlation coefficients. This is because of the well-documented statistical phenomenon known as the Yule-Simpson aggregation (or disaggregation) effect, or Simpson's paradox (Paris, 2012; Li et al., 2013; Ma and Gomez, 2015; Ma, 2019). The effect of aggregation or disaggregation on the conditional statistics can be demonstrated by examining the marginal correlations among the data in a cross-plot. To determine whether the Yule-Simpson effect is at play, the authors should provide a set of cross-plots between Si and μ, Si and E, Si and SS, and Si and BI. More importantly, such cross-plots should provide a better understanding of the potential uses of element contents for deriving mathematical models of rock mechanical properties based on the correlation analysis, which should be the primary goal of the study.
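The sensitivity of within-class correlations to the choice of cutoffs can be illustrated with a small synthetic example. The Python sketch below uses made-up data (not the measurements of Liu et al., 2018) in which y increases with x inside each of three clusters while the cluster means decrease. The helper names (`ranks`, `spearman`, `within_class`) and the cutoff values are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman coefficient computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Synthetic data: rising trend inside each cluster, falling trend between them.
x, y = [], []
for x0, y0 in [(0, 100), (20, 50), (40, 0)]:
    for i in range(10):
        x.append(x0 + i)
        y.append(y0 + i)


def within_class(cutoffs):
    """Spearman coefficient inside each class defined by cutoffs on x."""
    edges = [-math.inf] + list(cutoffs) + [math.inf]
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, v in enumerate(x) if lo <= v < hi]
        out.append(spearman([x[i] for i in idx], [y[i] for i in idx]))
    return out


pooled = spearman(x, y)
three_classes = within_class([15, 35])  # cutoffs fall between the clusters
two_classes = within_class([15])        # second class merges two clusters
print(f"pooled: {pooled:+.2f}")
print("cutoffs 15 and 35:", [f"{r:+.2f}" for r in three_classes])
print("cutoff 15 only:", [f"{r:+.2f}" for r in two_classes])
```

With these synthetic data the pooled coefficient is about −0.78, each of the three classes gives +1.00, and merging the two upper clusters changes the second within-class coefficient to about −0.50, so both the sign and the magnitude depend on where the cutoffs are placed.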
Finally, despite concluding that mechanical parameters cannot be reliably determined using only the shale's element contents, Liu et al. (2018) presented two brittleness index equations that depend solely on the element contents (see Equations (12) and (13) on pages 25–26). The authors did not provide a rationale for their approach, nor did they explain how the results of their correlation analysis were used to derive these equations. Some key parameters that appear in those equations, such as ‘ω’, ‘min’ and ‘max’, are not defined. Are they scalar coefficients or mathematical functions? It is, in our opinion, rather uncommon in the scientific literature to see ‘min’ and ‘max’ used as shown below in their Equation (12):
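\[
\mathrm{BRIT}_e = \frac{(\mathrm{Si} + \mathrm{Ca} + \mathrm{Mg}) - (\mathrm{Si} + \mathrm{Ca} + \mathrm{Mg})_{\min}}{(\mathrm{Si} + \mathrm{Ca} + \mathrm{Mg})_{\max} - (\mathrm{Si} + \mathrm{Ca} + \mathrm{Mg})_{\min}} \times 100 \qquad (12)
\]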
To convey their results and insights effectively to readers, the authors should clarify their key notations. Similar issues occur elsewhere; for example, Equations (9) and (10) for the calculation of the Spearman correlation coefficient appear to be mis-referenced as ‘Equations (6) and (7)’, which are the equations for the shear modulus and the bulk modulus, respectively.
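One possible reading of Equation (12), offered here only as an assumption because Liu et al. (2018) do not define the notation, is that ‘min’ and ‘max’ denote the smallest and largest values of the sum Si + Ca + Mg over the analyzed samples, which would make the index a min-max normalization onto a 0–100 scale. The Python sketch below implements that reading. The function name `brittleness_index` and the element contents are hypothetical and are not taken from Liu et al. (2018).

```python
def brittleness_index(si, ca, mg):
    """Min-max normalise Si + Ca + Mg onto a 0-100 scale.

    Assumption: 'min' and 'max' are the sample minimum and maximum of the
    sum; the source equation does not define them.
    """
    totals = [s + c + m for s, c, m in zip(si, ca, mg)]
    lo, hi = min(totals), max(totals)
    if hi == lo:
        raise ValueError("all samples have the same Si + Ca + Mg")
    return [(t - lo) / (hi - lo) * 100.0 for t in totals]


# Hypothetical element contents (%), for illustration only.
si = [20.0, 30.0, 40.0]
ca = [5.0, 3.0, 2.0]
mg = [1.0, 1.0, 2.0]
bi = brittleness_index(si, ca, mg)
print([round(v, 2) for v in bi])
```

Under this reading, the index of every sample depends on the extremes of the sample set, which is one reason the notation needs to be defined explicitly.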
3. Conclusion

The study by Liu et al. (2018) provides new insights into correlations among element contents and rock mechanical properties of a shale gas reservoir in southern China, and, as such, the results of their study are of interest to a general geoscience audience. While we agree with the authors' choice of non-parametric tests and their use of the Spearman correlation, Liu et al. (2018) did not present a detailed analysis of their hypothesis tests and did not consistently distinguish the strength of a correlation coefficient from its statistical significance. These two notions (strength vs. significance) represent very different concepts in analyzing multivariate relationships using a limited set of measurements. The unclear distinction between these two concepts may have contributed to the inability to use their correlation analysis in formulating mathematical models of geomechanical properties in the article. Indeed, correlation analysis generally should not be the end goal of a study; rather, it should be part of data analytics. In this regard, Liu et al. (2018) did not discuss how the results of their correlation analysis should be used for generating mathematical models of geomechanical properties of the shale reservoir concerned. Although some conditional relationships based on cutoff values of Si content were presented, the authors did not use their conditional correlation analysis in formulating their mathematical models.

Acknowledgement

The authors wish to thank Dr. Tuanfeng Zhang for his helpful comments on this discussion paper.

References

Aitchison, J., 1986. The Statistical Analysis of Compositional Data. Chapman & Hall, London.
Bonett, D.G., Wright, T.A., 2000. Sample size requirements for estimating Pearson, Kendall and Spearman correlations. Psychometrika 65, 23–28.
Chayes, F., 1960. On correlation between variables of constant sum. J. Geophys. Res. 65 (12), 4185–4193.
Fieller, E.C., Hartley, H.O., Pearson, E.S., 1957. Tests for rank correlation coefficients. Biometrika 44, 470–481.
Li, Y., et al., 2013. Experimental investigation of quantum Simpson's paradox. Phys. Rev. A 88 (1), 015804.
Liu, J., et al., 2018. Correlation analysis of element contents and mechanical characteristics of shale reservoirs: a case study in the Cen’gong block, South China. Mar. Pet. Geol. 91, 19–28.
Loftus, G.R., Loftus, E.F., 1988. Essence of Statistics, second ed. Alfred A. Knopf Inc., 639 pp.
Ma, Y.Z., 2019. Quantitative Geosciences. Springer, Gewerbestrasse, Switzerland, 640 pp.
Ma, Y.Z., Gomez, E., 2015. Uses and abuses in applying neural networks for predicting reservoir properties. J. Pet. Sci. Eng. 133, 66–75. https://doi.org/10.1016/j.petrol.2015.05.006.
Paris, M.G., 2012. Two quantum Simpson's paradoxes. J. Phys. A 45, 132001.
Zimmerman, D.W., Zumbo, B.D., Williams, R.H., 2003. Bias in estimation and hypothesis testing of correlation. Psicológica 24, 133–158.