International Journal of Coal Geology 83 (2010) 491–493
Some considerations concerning the use of correlation coefficients and cluster analysis in interpreting coal geochemistry data
Greta Eskanazy a, Robert B. Finkelman b,⁎, Suman Chattarjee b
a Sofia University “St. Kliment Ohridski” 1000 Sofia, Bulgaria
b University of Texas at Dallas, Richardson, TX 75093, USA
Article info
Article history: Received 11 February 2010; Received in revised form 15 May 2010; Accepted 16 May 2010; Available online 24 May 2010
Keywords: Correlation coefficients; Modes of occurrence; Coal geochemistry
Abstract
The mode of occurrence of trace elements in coal is important from both scientific and environmental viewpoints because the behavior of trace elements in coal utilization depends not only on their concentration but also on their chemical form or mode of occurrence. Statistical methods are among the more commonly used indirect approaches for interpreting element modes of occurrence. These methods include correlation coefficients between elements and ash, as well as among the elements themselves, and cluster and factor analysis. Using data sets derived from a suite of 75 samples from a Bulgarian lignite deposit, we demonstrate that statistical analysis of subsets with different ash ranges and with slightly different suites of elements can lead to substantially different conclusions concerning the element modes of occurrence. In this example we assume that direct information on the element modes of occurrence, such as mineralogy or selective leaching behavior, is absent. Statistical methods, however, can provide useful insights into trace element modes of occurrence for properly constrained sample suites.
© 2010 Elsevier B.V. All rights reserved.
1. The use of statistical methods for deducing modes of occurrence
The mode of occurrence of trace elements in coal is important from both scientific and environmental viewpoints because the behavior of trace elements in coal utilization depends not only on their concentration but also on their chemical form or mode of occurrence. The mode of occurrence may be deduced by both direct methods (microbeam techniques, X-ray diffraction, etc.) and indirect methods (Finkelman, 1995; Huggins, 2002). Statistical methods are among the more commonly used indirect approaches for interpreting element modes of occurrence. These methods include correlation coefficients between elements and ash, as well as among the elements themselves, cluster and factor analysis, and other methods (e.g. normative analysis) used to decipher the links among trace elements, their modes of occurrence, and their genesis (see for example: Harris et al., 1981; Tian et al., 1987; Van der Flier-Keller and Fyfe, 1987; Spears and Zheng, 1999; Alastuey et al., 2001; Kortenski and Sotirov, 2002; Shao et al., 2003; Ewa, 2004; Dai et al., 2005; Kalkreuth et al., 2006; Shaver et al., 2006; Chatziapostolou et al., 2006; Hu et al., 2006; Dai et al., 2008; Spears and Tewalt, 2009). Finkelman (1980) stated: “There appears to be a tendency to attribute similar modes of occurrence to elements that exhibit strong positive statistical correlations…. This
⁎ Corresponding author. Tel.: +1 972 473 7414. E-mail address:
[email protected] (R.B. Finkelman). 0166-5162/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.coal.2010.05.006
procedure is dangerous, particularly when bulk samples or multiple samples are used.” When applying statistical methods to coal, one critical condition is that the population (data set) studied should be homogeneous (Davis, 1973; Vistelius, 1980). For coals, this ideally means that the samples should come from the same stratigraphic horizon and that the range of ash yield should be constrained to avoid mixing organic-rich samples with detrital-rich samples. In practice it is not possible to maintain this condition strictly for coal geochemical data, because the proportion of coal substance to mineral components changes with increasing ash yield. A population from a single deposit comprising coal samples with ash yields from 5% to 30% or more may not be homogeneous (sensu stricto). As a result, spurious correlations between the elements may be observed, and in many cases correlations among some elements are difficult to explain.

2. Examples of potential problems using sample suites with a broad range of ash yields
Rather than selecting examples from the literature that have resulted in awkward geochemical interpretations, we illustrate the potential problems using one of our own (GE) data sets, a set of 75 samples from the Elhovo lignite deposit in Bulgaria. In these examples we assume that direct information on the element modes of occurrence, such as mineralogy or selective leaching behavior, is absent. Such information would be useful in constraining interpretations from the various statistical methods.
G. Eskanazy et al. / International Journal of Coal Geology 83 (2010) 491–493
The set comprises coals, coaly shales and partings with a range in ash yield of 4.1 to 78.9 wt.%. Nineteen trace elements were determined on each sample. R-mode cluster diagrams for the full sample suite are presented in Fig. 1a,b,c. Panel a was constructed from four samples with ash yields less than 10 wt.% (range 4.1 to 8.4%, mean ash yield 7.1%); panel b from 46 samples with ash yields up to 40 wt.% (range 4.1 to 38.1%, mean ash yield 23.3%); and panel c from 29 coaly shale samples with a mean ash yield of 52.1% (range 40.2 to 78.9%). Despite the small number of samples with ash yield below 10% (all of them xylain), there are strong correlations between many elements (Fig. 1a). From these data conclusions can be drawn about the elements' associations and modes of occurrence. For instance, the very strong Ge–V correlation reflects the bonding of both elements to organic matter, which was proven experimentally (Eskenazy and Mincheva, 1994). There is no correlation between K and Na because in this population Na is mainly organically associated. Other strong correlations include Fe and Zr, Ni and Sr, Ga and Co, and Zn and Cu. Ash (Ad) correlates strongly with Mn, K and Ti.

However, when we expand the sample suite to include all samples with ash yields up to 40% (Fig. 1b), an entirely different set of element relationships is apparent. Ash does not correlate strongly with Mn or Ti but rather with Ga. Pb and Zr correlate strongly, as do Zn and Cu, and Ni and Co, but V and Cr do not. Adding samples from the same location but with a different ash range strongly alters the element relationships, yielding different conclusions concerning modes of occurrence.
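The sensitivity of correlation coefficients to the ash range of the sample suite can be illustrated with a small numerical sketch. The Python example below uses synthetic values only, not the Elhovo data: two hypothetical elements, A and B, vary independently of each other within a "low-ash" subset and within a "high-ash" subset, but both have higher mean concentrations in the high-ash subset. The function names, subset sizes and concentration levels are assumptions chosen for illustration.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def make_subset(rng, n, mean_a, mean_b, noise=1.0):
    """Hypothetical subset: A and B scatter independently around their means."""
    a = [rng.gauss(mean_a, noise) for _ in range(n)]
    b = [rng.gauss(mean_b, noise) for _ in range(n)]
    return a, b


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic, illustrative concentrations (arbitrary units).
low_a, low_b = make_subset(rng, 40, mean_a=5.0, mean_b=5.0)      # "low-ash" subset
high_a, high_b = make_subset(rng, 40, mean_a=20.0, mean_b=20.0)  # "high-ash" subset

r_low = pearson_r(low_a, low_b)
r_high = pearson_r(high_a, high_b)
r_pooled = pearson_r(low_a + high_a, low_b + high_b)

print(f"low-ash subset  r = {r_low:+.2f}")
print(f"high-ash subset r = {r_high:+.2f}")
print(f"pooled samples  r = {r_pooled:+.2f}")
```

By construction, the coefficient within each subset is expected to be weak, whereas the pooled coefficient is expected to be strongly positive, because pooling mainly records the difference in concentration level between the two subsets rather than any relationship between A and B within either of them.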
We believe that the altered element relationships reflect the mixing of sample subsets in which the elements have different modes of occurrence: in the low-ash suite many elements probably have organic associations, whereas in the higher ash suite many of the same elements are likely to have inorganic associations.

The data for the coaly shale samples (Fig. 1c) show yet another set of element relationships. Contrary to the low-ash samples illustrated in Fig. 1b, K and Na correlate strongly, as both elements are likely present in aluminosilicate minerals. Ni and Co correlate strongly, but ash correlates with Ti and not Ga, and Pb and Zr differ considerably. V and Cr correlate strongly, although they did not correlate in the previous data set. Many element relationships are quite different from those in the previous two data sets, and their interpretation would yield vastly different conclusions about element associations and modes of occurrence. Combining all the samples would lead to yet another set of element relationships and conclusions.

Fig. 2 presents an R-mode cluster analysis for the same dataset (19 samples), but in which 36 elements were determined. The mean ash yield is 21.6% (range 4.1% to 38.1%). Adding 17 elements alters the element relationships once more. Ash correlates with Ga and Cs, not Ti; K does not correlate with Na; V correlates with Ti, not with Ge as in Fig. 1a. The V–Ti association was explained by the abundant titanomagnetite mineralization in the source rocks (Eskenazy and Mincheva, 1994). The REE form a compact grouping on the left of the diagram, as do many chalcophile elements on the right. Nevertheless, once again a different set of element relationships emerges, which would result in a different set of interpretations with regard to the element associations and modes of occurrence.

3. Concluding thoughts and recommendations
From these examples it can be seen that element correlations depend on the homogeneity of the population, the number of samples, and the elements determined. Clearly, a sound knowledge of coal chemistry and inorganic geochemistry would help identify
Fig. 1. R-mode cluster for the Elhovo lignite basin, Bulgaria, in which 19 elements were determined (95% confidence level): (a) 4 samples with ash below 10% (mean 7.1%); (b) 46 samples with up to 40% ash (mean ash 23.3%); and (c) coaly shales (mean ash 52.1%). Note: The dendrograms were generated in MATLAB. The dataset is first standardized using Z-score standardization, by subtracting the mean of each element from each value and then dividing the result by the standard deviation of that element. This generates values that have a mean of zero and a standard deviation of 1. The Euclidean distances between pairs of objects are then determined using y = pdist(x), where x represents the data matrix obtained after standardization. Rows correspond to the elements and columns correspond to the samples. The output y is the dissimilarity matrix. The hierarchical cluster tree is then obtained from the distance matrix by z = linkage(y) using the centroid method. Then H = dendrogram(z) generates a dendrogram plot of the hierarchical, binary cluster tree. In a dendrogram the objects are connected by U-shaped lines, with the height of each U representing the distance between the pair of objects being connected.
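The steps in the note to Fig. 1 (standardize each element, compute Euclidean distances between elements, merge by the centroid method) can be sketched in plain Python. This is an illustrative stand-in written with the standard library only; it is not the authors' MATLAB script and not a reimplementation of MATLAB's pdist or linkage. The helper names (zscore_rows, centroid_linkage), the generic element labels and the small data matrix are all hypothetical, and the sample standard deviation (n − 1 denominator) is assumed for the standardization.

```python
import math
import statistics


def zscore_rows(matrix):
    """Standardize each row (one element across all samples) to mean 0, std 1."""
    out = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.stdev(row)  # sample standard deviation (assumption)
        out.append([(v - mu) / sd for v in row])
    return out


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid_linkage(points):
    """Agglomerative clustering that repeatedly merges the two clusters whose
    centroids are closest. Returns a list of (members_a, members_b, distance)."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        # Centroid of a cluster = mean standardized profile of its members.
        cent = {
            k: [statistics.mean(col) for col in zip(*(points[i] for i in members))]
            for k, members in clusters.items()
        }
        keys = sorted(clusters)
        d, a, b = min(
            (euclidean(cent[a], cent[b]), a, b)
            for i, a in enumerate(keys)
            for b in keys[i + 1:]
        )
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Hypothetical matrix: rows = elements, columns = samples (not the Elhovo data).
names = ["A", "B", "C", "D"]
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # A increases across the samples
    [2.1, 3.9, 6.2, 8.0, 9.9],  # B follows A
    [9.0, 7.0, 4.0, 3.0, 1.0],  # C decreases across the samples
    [8.8, 7.1, 4.2, 2.9, 1.1],  # D follows C
]

merges = centroid_linkage(zscore_rows(data))
for left, right, dist in merges:
    print([names[i] for i in left], "+", [names[i] for i in right], f"at {dist:.3f}")
```

Each reported merge corresponds to one U-shaped link of a dendrogram, with the printed distance playing the role of the link height. In this toy matrix the elements with parallel profiles join first at small distances, and the two resulting groups join last at a much larger distance.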
element relationships that are unlikely to occur in nature. However, a well-constrained sample suite would minimize the probability that statistical manipulation of the data implies improbable element relationships. Correlation coefficients and R-mode cluster analysis can be helpful in deducing the mode of occurrence of trace elements in coals. Geochemically reasonable conclusions may be obtained when, as far as possible, homogeneous populations are investigated. We recommend that the following conditions be met:
• The population should include coals of a narrow ash yield interval.
• Coal populations from different host provinces should not be mixed, as geological factors are the most important influence on trace element geochemistry. Coals from different deposits may be combined only in order to distinguish them, not to determine the modes of occurrence of the trace elements.
• The number of samples should be large enough that the results obtained are statistically significant.
• The correlation coefficients should be interpreted along with other methods of investigating the mode of occurrence of the elements.
More reliable results may be obtained if a single mode of occurrence dominates. Such is the case with the coals from the Parana basin (Kalkreuth et al., 2006), for which inorganic association of the trace elements dominates.

Fig. 2. R-mode cluster for Elhovo lignites in which 36 elements were determined.

References
Alastuey, A., Jimenez, A., Plana, F., Querol, X., Suarez-Ruiz, I., 2001. Geochemistry, mineralogy, and technological properties of the main Stephanian (Carboniferous) coal seams from the Puertolano Basin, Spain. Int. J. Coal Geol. 45, 247–265.
Chatziapostolou, A., Kalaitzidis, S., Papazisimou, S., Christanis, K., Vagias, D., 2006. Mode of occurrence of trace elements in Pellana lignite (SE Peloponnese, Greece). Int. J. Coal Geol. 65, 3–16.
Dai, S., Ren, D., Tang, Y., Yue, M., Hao, L., 2005. Concentration and distribution of elements in Late Permian coals from western Guizhou Province, China. Int. J. Coal Geol. 61, 119–137.
Dai, S., Li, D., Chou, C.-L., Zhao, L., Zhang, Y., Ren, D., Ma, Y., Sun, Y., 2008. Mineralogy and geochemistry of boehmite-rich coals: new insights from the Haerwusu Surface Mine, Jungar Coalfield, Inner Mongolia, China. Int. J. Coal Geol. 74, 185–202.
Davis, J.C., 1973. Statistics and Data Analysis in Geology. John Wiley & Sons, Inc., New York.
Eskenazy, G., Mincheva, E., 1994. Geochemical characterization of the Elhovo coal basin. Ann. De Univ. de Sofia “St. Kl. Ohridski”, Livre 1 – Geol. 1 84, 65–84.
Ewa, J.O.B., 2004. Data evaluation of trace elements determined in Nigerian coal using cluster procedures. Appl. Radiat. Isot. 60, 751–758.
Finkelman, R.B., 1980. Modes of occurrence of trace elements in coal. Dissertation. Univ. of Maryland, College Park, Md. 301 pp.
Finkelman, R.B., 1995. Modes of occurrence of environmentally-sensitive trace elements in coal. In: Swaine, D.J., Goodarzi, F. (Eds.), Environmental Aspects of Trace Elements in Coal. Kluwer Pubs, Dordrecht, pp. 24–50. Chapter 3.
Harris, L.A., Barrett, H.E., Kopp, O.C., 1981. Elemental concentration and their distribution in two bituminous coals of different paleoenvironments. Int. J. Coal Geol. 1, 175–193.
Hu, J., Zheng, B., Finkelman, R.B., Wang, B., Wang, M., Li, S., Wu, D., 2006. Concentration and distribution of sixty-one elements in coals from DPR Korea. Fuel 85, 679–688.
Huggins, F.E., 2002. Overview of analytical methods for inorganic constituents in coal. Int. J. Coal Geol. 50, 169–214.
Kalkreuth, W., Holz, M., Kern, M., Machado, G., Mexias, A., Silva, M.B., Willet, J., Finkelman, R., Burger, H., 2006. Petrology and chemistry of Permian coals from the Parana Basin: 1. Santa Terezinha, Leao-Butia and Candiota Coalfields, Rio Grande do Sul, Brazil. Int. J. Coal Geol. 68, 79–116.
Kortenski, J., Sotirov, A., 2002. Trace and major element content and distribution in Neogene lignite from the Sofia basin, Bulgaria. Int. J. Coal Geol. 52, 63–82.
Shao, L., Jones, T., Gayer, R., Dai, S., Li, S., Jiang, Y., Zhang, P., 2003. Petrology and geochemistry of the high-sulphur coals from the Upper Permian carbonate coal measures in the Heshan Coalfield, southern China. Int. J. Coal Geol. 55, 1–26.
Shaver, S.A., Hower, J.C., Eble, C.F., McLamb, E.D., Kuers, K., 2006. Trace element geochemistry and surface water chemistry of the Bon Air coal, Franklin County, Cumberland Plateau, Southeast Tennessee. Int. J. Coal Geol. 67, 47–78.
Spears, D.A., Zheng, Y., 1999. Geochemistry and origin of elements in some UK coals. Int. J. Coal Geol. 38, 161–179.
Spears, D.A., Tewalt, S.J., 2009. The geochemistry of environmentally important trace elements in UK coals with special reference to the Parkgate coal in Yorkshire–Nottinghamshire Coalfield, UK. Int. J. Coal Geol. 80, 157–166.
Tian, J., Chou, C.-L., Ehmann, W.D., 1987. INAA determination of major and trace elements in loess, paleosol and precipitation layers in a Pleistocene loess section, China. J. of Radioanalytical and Nuclear Chemistry Articles 110 (1), 261–274.
Van der Flier-Keller, E., Fyfe, W.S., 1987. Geochemistry of two Cretaceous coal-bearing sequences: James Bay lowlands, northern Ontario, and Peace River basin, northeast British Columbia. Can. J. Earth Sci. 24, 1038–1052.
Vistelius, A.B., 1980. Principles of Mathematical Geology. Nauka, Leningrad. 384 pp.