International Journal of Coal Geology 48 (2001) 1 – 22 www.elsevier.com/locate/ijcoalgeo
A geostatistical approach to predicting sulfur content in the Pittsburgh coal bed
William D. Watson a,*, Leslie F. Ruppert b, Linda J. Bragg b, Susan J. Tewalt b
a US Department of Energy, Energy Information Administration, 950 L’Enfant Plaza South Building EI-52, 1000 Independence Avenue SW, Washington, DC 20585, USA
b US Geological Survey, National Center, MS 956, Reston, VA 20192, USA
Received 16 October 2000; accepted 25 June 2001
Abstract The US Geological Survey (USGS) is completing a national assessment of coal resources in the five top coal-producing regions in the US. Point-located data provide measurements on coal thickness and sulfur content. The sample data and their geologic interpretation represent the most regionally complete and up-to-date assessment of what is known about top-producing US coal beds. The sample data are analyzed using a combination of geologic and Geographic Information System (GIS) models to estimate tonnages and qualities of the coal beds. Traditionally, GIS practitioners use contouring to represent geographical patterns of ‘‘similar’’ data values. The tonnage and grade of coal resources are then assessed by using the contour lines as references for interpolation. An assessment taken to this point is only indicative of resource quantity and quality. Data users may benefit from a statistical approach that would allow them to better understand the uncertainty and limitations of the sample data. To develop a quantitative approach, geostatistics were applied to the data on coal sulfur content from samples taken in the Pittsburgh coal bed (located in the eastern US, in the southwestern part of the state of Pennsylvania, and in adjoining areas in the states of Ohio and West Virginia). Geostatistical methods that account for regional and local trends were applied to blocks 2.7 mi (4.3 km) on a side. The data and geostatistics support conclusions concerning the average sulfur content and its degree of reliability at regional- and economic-block scale over the large, contiguous part of the Pittsburgh outcrop, but not to a mine scale. To validate the method, a comparison was made with the sulfur contents in sample data taken from 53 coal mines located in the study area. The comparison showed a high degree of similarity between the sulfur content in the mine samples and the sulfur content represented by the geostatistically derived contours. 
Published by Elsevier Science B.V. Keywords: Geostatistical analysis; Pittsburgh coal bed; Comparison with mine data; Kriging; Statistical precision
1. Introduction
* Corresponding author. Tel.: +1-202-287-1971; fax: +1-202-287-1934. E-mail addresses: [email protected] (W.D. Watson), [email protected] (L.F. Ruppert), [email protected] (L.J. Bragg), [email protected] (S.J. Tewalt).
0166-5162/01/$ - see front matter. Published by Elsevier Science B.V. PII: S0166-5162(01)00035-0
The US Geological Survey (USGS) recently completed a digital coal resource assessment model of the Upper Pennsylvanian Pittsburgh coal bed, which indicates that out of the original 34 Gt (short tons) [31 Gt] in the bed, 16 Gt (short tons) [14 Gt] remain after subtracting mined-out coal and coal lost in
W.D. Watson et al. / International Journal of Coal Geology 48 (2001) 1–22
mining (Ruppert et al., 1999). After technical, environmental, and safety restrictions are applied to the remaining Pittsburgh coal, only 12 Gt (short tons) [11 Gt] are available for mining (Watson et al., 2000). This analysis is an example of regional coal availability studies currently underway to estimate mineable resources for major coal beds in the US. This study introduces methodology and preliminary results of USGS’s follow-on analyses of coal quality using the Pittsburgh coal bed as an example. Using Geographic Information System (GIS) methods and geostatistical modeling, the sulfur content of the coal resources remaining in the Pittsburgh bed was estimated. The analysis was conducted by (1) constructing a database of sample point data on total sulfur content (wt.%) on an as-received basis (Tewalt et al., in press), (2) removing a small number (11 out of 738) of samples found to be extreme outliers from an examination of spatial correlation, (3) applying exploratory data analysis to identify a regional trend in sulfur with predominantly lower sulfur values occurring at eastern locations, (4) constructing an empirical variogram that demonstrates coal sulfur content as spatially correlated for sampling locations up to 24 mi (38.6 km) apart, (5) applying universal kriging to the sample data to account simultaneously for the regional east – west trend and local spatial correlation, and (6) applying block kriging to estimate expected coal sulfur content and its standard error. This last step, applied to areas of about 7 mi2 (18.1 km2) with coal adequate to support several large prospective coal mines, adjusts the analysis to a scale appropriate for regional assessment. Therefore, these estimates are not good indicators of coal sulfur content at a mine scale. Our sample data set is too small and of insufficient density to evaluate sulfur distributions and trends at a mine scale. 
Only later, as mining companies undertake closely spaced, gridded, in-fill sampling, will it be possible to determine, more precisely, the location of significant blocks of coal within a target sulfur range at mine scale. However, the large block estimates do provide information about the likely availability of coal within sulfur ranges, and these estimates are generalized indicators at a regional level of where additional sampling is needed to delineate coal within a target sulfur range. Finally, we have simulated random fields for coal sulfur content based upon the covariance structure in
our sample data. Each set of ‘‘new draws’’ is used to repeat the estimation of sulfur content by block. This procedure allows us to report prediction error bars for the cumulative distribution of tons of remaining coal in the Pittsburgh bed by sulfur content. While these results provide an informative summary of what we can currently infer about sulfur in the remaining Pittsburgh coal, it must be emphasized that additional data could change the modeled structure and alter the estimates of the cumulative distribution significantly. Previous applications of geostatistical methods to the estimation of the sulfur and ash contents in coal include studies by Gomez and Hazen (1970), Hohn et al. (1988), Murphy and Brown (1993) and Cressie (1993). Interestingly, all but Murphy and Brown (1993) applied geostatistics to sections of the Pittsburgh coal bed. Gomez and Hazen (1970) evaluated coal ash and sulfur for the Robena mine on the Pittsburgh coal bed in the southwestern corner of the state of Pennsylvania, US. Gomez and Hazen (1970) applied a multivariate statistical model that related coal sulfur to ash, sample location, roof type, and bed thickness. Their objective was to predict organic and pyritic sulfur content to guide mining decisions, because mainly only pyritic sulfur can be removed in coal washing plants. In their analysis, they constructed contour maps showing the percentage of pyritic sulfur that is expected to be removed by coal washing. Their study broke new ground in terms of data exploration techniques and the creative application of geostatistics to a challenging problem. Later, Cressie (1993) used the same data from the Robena mine to illustrate the power of kriging (a parametric statistical method described below) as a technique to predict coal ash content. 
Cressie identified an east-to-west regional trend in the coal ash data (lower in the east, higher in the west), removed this trend from the data, and applied kriging to the de-trended data to predict coal ash content at non-sampled points. Murphy and Brown (1993) applied kriging to cores taken in a targeted mining area, in order to determine whether the coal would meet contract sulfur specifications. Hohn et al. (1988) applied kriging to widely spaced observations on coal total sulfur content and computed cumulative frequency sulfur distributions for coal in blocks with areas of 1 mi² (2.6 km²). The objective of the current study is similar to that of Hohn et al. (1988), namely, to gain a better understanding of the effects of statistical variability and of the role of sample spacing in choosing a scale (for example, mine scale or regional scale) so that the analysis has support from the data. In addition, this study extends the scope of the earlier studies by applying kriging methods to estimate the statistical precision of the cumulative frequency distribution for coal sulfur content. As an aid to the discussion, Table 1 contains definitions of geostatistical terms used in the study to describe the methods and results.
2. Data
The Pittsburgh coal bed database on sulfur content (wt.%) consists of data from core, mine, and outcrop samples on an as-received basis and was derived from a variety of sources including the US Geological Survey, US Bureau of Mines, the Pennsylvania State University, as well as other federal and state sources. From a total of 3377 analyses, only 1017 samples had locations specified by latitude and longitude coordinates. These geo-referenced samples were used in this study. Out of the 1017 samples, 919 samples were found to be located within the large resource area (labeled A on Fig. 2) of the Pittsburgh coal bed, which is the target study area for this analysis. These 919 sample data points have a high degree of clustering and many points are located in areas where the coal has already been mined. A number of the samples have nearly the same location, and elimination of samples within 50 ft of other samples further reduced the sample set to 738 records. In these cases, a random selection was made from the near co-located samples and a single record at each location was kept in the data set. In most cases, the samples at closely spaced points had similar values. Multiple values at near-duplicate locations can make it impossible to solve for the geostatistical kriging weights (discussed below). Furthermore, in this analysis, the emphasis is on block or regional-scale variation in coal sulfur content, not micro-scale variation in densely sampled areas.
The 738 sample data points were grouped or binned according to a specific span of distance separating every unique pair of sample values. The difference in each pair’s values (difference in sulfur
content) was squared and used to calculate the average of the squared differences in coal sulfur content for all the binned pairs at the specified span in distance. The equation for this calculation, with a division by 2, is

\[ \gamma(h) = \frac{1}{2\,|N(h)|} \sum_{N(h)} (z_i - z_j)^2 \qquad (1) \]

where N(h) is the set of all pairwise Euclidean distances i − j = h, |N(h)| is the number of distinct pairs in N(h), and z_i and z_j are data values at spatial locations i and j, respectively.
Fig. 1 is a boxplot of the square-root-differences for the point sulfur data, obtained by taking the square root of the difference (in absolute value) of the coal sulfur content between paired sample points, rather than squaring the difference, as in Eq. (1). The boxplot of the square root differences has been shown to be a particularly good tool for identifying atypical observations as opposed to observations that occur because distributions are skewed (Kaluzny et al., 1997). The boxplot shows the spread of the square root of the difference in sulfur content values (vertical axis) between pairs of sulfur samples for different separation distances (horizontal axis). The open circles at low separation distances (Fig. 1, upper left side) represent sulfur content differences that lie well outside the spread of the rest of the data. Because our objective was to examine the data for spatial correlation at a regional scale, the sulfur samples responsible for these extreme differences were removed from the sample set. In this case, 11 sulfur samples, which are scattered throughout the sampled area, account for the extreme outlying differences. Six of these samples range in value from 5.0% to 6.75% sulfur. The remaining five samples range in value from 0.7% to 1.5% sulfur. In the areas where they are located, these samples have values that fall far outside the range of the sulfur values for surrounding samples. After their removal, 727 point samples on sulfur remain (Fig. 2). While many of the samples occur at locations where coal has been mined out, they are valid observations for estimating the sulfur content in the large contiguous area with remaining coal.
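The binned calculation in Eq. (1) can be illustrated with a minimal Python sketch. This is not the authors' code (the study cites S-PLUS software); the function name `empirical_variogram`, the argument layout, and the choice of bin edges are assumptions made here for illustration only.

```python
import numpy as np

def empirical_variogram(coords, values, bin_edges):
    """Eq. (1) per distance bin: gamma = sum((z_i - z_j)^2) / (2 * n_pairs).

    coords    : (n, 2) array of sample locations (hypothetical layout)
    values    : (n,) array of sulfur contents
    bin_edges : increasing distances that delimit the separation bins
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)        # every unique pair once
    dist = np.hypot(*(coords[i] - coords[j]).T)     # Euclidean separation
    sq_diff = (values[i] - values[j]) ** 2          # squared difference per pair
    which = np.digitize(dist, bin_edges) - 1        # bin index of each pair
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)                 # NaN marks an empty bin
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = which == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = sq_diff[in_bin].sum() / (2.0 * counts[b])
    return gamma, counts
```

Each bin yields one point of the empirical variogram; pairs whose separation falls outside the supplied bin edges are simply ignored in this sketch.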
The geostatistical procedure used in the analysis assumes that the processes that formed sulfur content in the coal were operating over a continuous spatial extent and, therefore, the values may be spatially correlated relative to the distance separating the sample pairs. Thus, samples from areas where mining has occurred are valid for estimating coal sulfur content in remaining adjacent areas. However, when samples in mined areas are a long way away from areas of remaining coal (where we want to make predictions), the geostatistical procedure will tend to calculate a large standard error for the predicted sulfur contents at such distant locations, signaling low precision for the sulfur prediction. Another factor that guided the delineation of the area for analysis was consistency in geologic conditions. The southernmost extent of the coal in West Virginia may have formed under geologic conditions that were different than the conditions in the other parts of the coal bed. Therefore, because spatial continuity and geologic conditions are similar only in the large resource area A (Fig. 2), the estimation is not extended to the separate coal areas in Fig. 2.

Table 1
Definitions of geostatistical terms
Term: Definition
Kriging: A statistical prediction method that analyzes correlation of sample values in terms of their spatial proximity to each other and uses a mathematical representation of their spatial correlation to make predictions at locations where the values are not known. In theory, the predictions are unbiased and have the smallest prediction error among predictors calculated as the weighted sum of known values.
Block kriging: The use of kriged point estimates of a characteristic to predict the average of a characteristic within a specified area or block. In theory, the predicted block averages are unbiased and have the smallest prediction error compared to other predictors that use weighted sums of known values.
Universal kriging: Approximately, known values are regressed on location to form a fitted surface, known values are subtracted from their computed value on the fitted surface, and kriging is applied to the resulting difference. Predictions are the composite of the predicted differences from the fitted surface (calculated using the estimated kriging weights) plus the trend prediction from the fitted surface.
Cumulative frequency distribution: The graph of a characteristic for a variable from lowest to highest plotted on the horizontal axis versus the cumulative associated sum of the variable plotted on the vertical axis. See Figs. 12 and 13.
Euclidean distance: The distance between two points as measured by the length of the hypotenuse of the right triangle when each of the points is located at opposite ends of the hypotenuse. Equal to the square root of (x2 − x1)² + (y2 − y1)², where (x2, y2) and (x1, y1) are the locations of points 2 and 1, respectively.
Boxplot: A graph of the distribution of values for a variable. The “box” part of the boxplot covers the range of values that make up the middle 50% of the values. Starting from the box, “whiskers” go out to the extremes of the data. Whiskers are placed at the most extreme values or at the box end, plus or minus (usually) 1.5 times the 50% middle range. Very extreme points outside the whiskers are shown by themselves. See Fig. 1.
Kriging weights: The mathematical solution to an optimization problem that solves for coefficients to be used as the weights pre-multiplying known values to predict values at locations with unknown values. The optimization problem is set up to mathematically guarantee that the weighted sum will have zero expected prediction error and the most precision (or lowest variance) for the prediction.
Variogram: A plot of the average value (vertical axis) calculated using Eq. (1) for unique pairs of points versus the average distance (horizontal axis) separating the pairs. The sample data are grouped within distance ranges (that is, “binned”) for the purpose of estimating the variogram. Each bin of sample data yields one point for the variogram. See Fig. 8.
Directional variogram: A variogram estimated using sample points that fall along a specified direction. A tolerance on the direction, such as 11.25°, is set. The variogram is based on all pairs of points that fall within the direction gradient plus or minus the tolerance. See Fig. 7.
Nugget effect: Represents micro-scale variation or measurement error. It is estimated from the empirical variogram as the value from Eq. (1) when the separation distance between data points is zero. See Fig. 8.
Sill: The upper limit of the empirical variogram representing the variance of the characteristic being analyzed. See Fig. 8.
Range: The distance at which the characteristic being analyzed is no longer spatially correlated with values of the characteristic at other points. See Fig. 8.
Realization: Values observed from a single outcome of a process that is stochastic. Because the process is stochastic or probabilistic, any additional outcomes of observed values (i.e., realizations) will differ from other observed outcomes and will fall within a range and at a frequency that are determined by the probability distributions that govern the process.
Sources (in table order): Myers (1997), p. 344; Olea (1999), p. 188; Deutsch and Journel (1997), p. 64; Venables and Ripley (1997), p. 172; Myers (1997), pp. 344–349; Kaluzny et al. (1997), p. 68; Kaluzny et al. (1997), p. 74; Kaluzny et al. (1997), p. 69; Kaluzny et al. (1997), p. 69; Kaluzny et al. (1997), p. 69.

Fig. 1. Boxplots of the square-root-difference variogram for the sulfur point-sample data (modified from Eq. (1)). The open circles in the boxes are the mean values for the square root of the sulfur content difference; the horizontal lines are the median values for the square root of the sulfur content difference (1 mi = 1.609 km).
Fig. 2. Location of sulfur samples used to analyze the sulfur content of the Pittsburgh coal bed.
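The thinning of near co-located samples described in this section (one randomly chosen record kept wherever samples lie within 50 ft of each other) might be sketched as follows. The text does not spell out the exact selection procedure, so the greedy random-order rule, the function name `thin_colocated`, and the `seed` parameter are all assumptions of this illustration.

```python
import numpy as np

def thin_colocated(coords, min_sep, seed=0):
    """Keep a random subset of points so that no two kept points are
    closer than min_sep (e.g. 50 ft when coords are in feet).

    Assumed rule: visit points in random order and keep a point only if
    it is at least min_sep away from every point already kept.
    """
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    kept = []
    for idx in rng.permutation(len(coords)):
        far_enough = all(
            np.hypot(*(coords[idx] - coords[k])) >= min_sep for k in kept
        )
        if far_enough:
            kept.append(int(idx))
    return sorted(kept)
```

With a different `seed`, a different member of each near-duplicate cluster may be retained, which mirrors the random selection mentioned in the text.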
3. Analysis
The geostatistical methods of Cressie (1984, 1993) and Kaluzny et al. (1997) were used to predict average sulfur content and its standard error for the remaining coal in the Pittsburgh coal bed. Cressie analyzed coal ash data for a section of the Pittsburgh bed (Cressie, 1984) and later (Cressie, 1993) used the analysis of coal ash data to illustrate spatial data analytic techniques. Many of the same techniques are employed in this study. A regional-scale interpolator, block kriging (MathSoft, 2000), is the method used in this analysis to characterize the sulfur content of the remaining contiguous Pittsburgh coal bed (resource area A, Fig. 2). Block kriging uses any regional trends and local spatial correlation to develop estimates of coal sulfur content at a block level, and in the case of the current analysis, for cells 2.7 mi (4.3 km) on a side superimposed on resource area A. The estimates from block kriging are estimates of the average sulfur content (wt.%) and the standard error for the average, for each separate block as a whole. Block analysis was applied for several reasons. First, the sulfur-content sample data available to the analysis exhibited a high degree of clustering and did not have sufficient density to support meaningful analysis at mine scale. Blocks with areas of about 7 mi² (19 km²) on the Pittsburgh coal bed usually contain enough coal to support several large coal mines operating for 15–20 years. The estimates of average sulfur content and its standard error, at block scale, most likely would provide useful information for regional assessment. Indeed, the estimates of the standard errors were found to cover relatively narrow ranges compared to the ranges for point estimates and, thus, the block-scale estimators proved to be useful indicators of generalized regional trends in sulfur content for the remaining Pittsburgh coal bed.
Finer scale analysis ordinarily would be undertaken by coal-mining companies and government agencies at the time specific mining operations are delineated.
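As a quick check on the block dimensions quoted above, the area of a square cell 2.7 mi on a side can be worked out directly. The short Python sketch below uses the conversion factor given in the figure captions (1 mi = 1.609 km); the variable names are illustrative only.

```python
MI_TO_KM = 1.609                         # conversion quoted in the figure captions

side_mi = 2.7                            # block side length from the text
area_mi2 = side_mi ** 2                  # area in square miles
area_km2 = (side_mi * MI_TO_KM) ** 2     # area in square kilometers

print(f"{area_mi2:.2f} mi^2, {area_km2:.1f} km^2")
```

The result, roughly 7.3 mi² or just under 19 km², is consistent with the "about 7 mi² (19 km²)" block area stated in this section.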
3.1. Why it is important to account for systematic regional and local trends
The spatial variation in the sulfur content of coal most likely reflects systematic variation at both regional and local scales. The regional variation can have its source in large-scale geologic processes that could have concentrated sulfur in the tops and bottoms of coal beds (Williams and Keith, 1979; Yancy and Faser, 1921), suggesting that thicker coal may have a lower sulfur content than thinner coal. For the Pittsburgh coal bed, this appears to be true, as there is a general trend of thicker, lower sulfur coal in the east, and thinner, higher sulfur coal in the west. This regional trend in sulfur values, which has been known for many years, requires the application of universal block kriging so that predictions of sulfur content are unbiased and have the smallest variance or most precision.
Geostatistical analysis seeks to estimate weights which, when applied to sample values to predict sulfur content at unsampled locations, minimize mean square prediction error and produce unbiased predictions. For the prediction error to be minimum and predictions unbiased, two conditions must be satisfied. First, if a regional trend is present, the method of determining kriging weights has to account for the regional spatial trend, in which case, the procedure generally known as universal kriging, which incorporates the regional trend into the kriging model, should be applied (Kaluzny et al., 1997). The second condition is that the variances of the random variables (for example, sulfur content) across space should depend only upon the distance separating the realizations. The estimation of a variogram (essentially a plot of gamma (γ), obtained by plugging the sample data into Eq. (1), against separation distance) is the method used to examine whether the sample data exhibit such a relationship. If (as in Fig. 7, as explained below) the directional variograms of coal sulfur content values (after their east–west trend has been removed) are similar in shape, start with a low value for γ, and increase up to the variance of the sample data, this is a demonstration that the second condition for kriging estimates to be unbiased and minimum variance is met.
The variogram (or plot of γ from Eq. (1) versus separation distance) is the map of systematic spatial variation at a local scale. The east–west trend in sulfur values for the Pittsburgh coal bed is our model of systematic variation at a regional scale. Both of these variation models are used to make controlled predictions of sulfur at unsampled locations, but random, unexplained variation remains. Thus, the predictions still have error or imprecision, and the quantification of the amount of imprecision in the predictions is of high interest and utility. Fortunately, because it is based upon mathematical statistics, universal block kriging has the capability to estimate the precision of the sulfur predictions and such estimates of precision are presented below in Section 4.
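The separation of regional trend from local variation discussed in this section can be illustrated, approximately, by regressing sulfur content on east–west position and keeping the residuals for variogram analysis. The sketch below assumes a straight-line trend fitted by ordinary least squares; the text does not state the functional form used inside the universal kriging program, and the function name `detrend_east_west` is hypothetical.

```python
import numpy as np

def detrend_east_west(x_east_west, sulfur):
    """Fit sulfur = intercept + slope * x by least squares (assumed linear
    trend) and return the residuals plus the fitted coefficients."""
    x = np.asarray(x_east_west, dtype=float)
    z = np.asarray(sulfur, dtype=float)
    slope, intercept = np.polyfit(x, z, deg=1)   # highest power first
    trend = intercept + slope * x                # regional component
    residuals = z - trend                        # local component
    return residuals, (intercept, slope)
```

Kriging would then be applied to the residuals, and a prediction at a new location would add the kriged residual back to the trend value there. Universal kriging proper estimates the trend and the spatial correlation together, so this two-step version is only an approximation, in the same spirit as the Table 1 definition.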
3.2. Accounting for a regional trend
Exploratory data analysis is a recommended method to examine spatially located data for regional spatial trends. Using exploratory data analysis, Cressie (1993, p. 32) found an east–west trend in coal ash data for the Pittsburgh coal bed. Similar to Cressie’s findings, we have found an east–west trend in coal sulfur data for the Pittsburgh coal bed. In Cressie’s analysis, the coal ash data were spaced on a uniform grid. Consequently, Cressie was able to use row-by-row and column-by-column plots to uncover the east–west trend. A similar method is followed for our non-uniformly spaced data: examination of paneled sample data for east–west and north–south orientations. The east–west trends in coal sulfur content and ash for the Pittsburgh coal bed have been known for a long time (Cross, 1954). For this analysis, as well as in Cressie’s analysis, the examination of the data for a regional trend is undertaken so that the correct procedure can be identified for the geostatistical analysis. As discussed previously, if a regional trend exists, then the sample data need to be de-trended before kriging is applied. For implementation, it is necessary to have a statistical model of the regional trend, which is estimated within the universal kriging program by regressing coal sulfur content values against their east-to-west locations. The importance of the regional trend to this analysis is not the existence of the trend, which has been known for many years. Rather, it is the systematic statistical analysis of the trend, separately from local spatial variation, in order to satisfy statistical conditions needed to have unbiased and minimum-variance predictions of coal sulfur content.
In Fig. 2, the large dot in the northeast corner is the origin point for determining the location of the sample points. From the origin point, the sample points span a distance of 100 mi (161 km) to the west and 120 mi (193 km) to the south. The top part of Fig. 3 shows that the sample data were split into six north-to-south (slightly overlapping) panels, each containing equal numbers of sulfur content sample points (Fig. 4). The bottom part of Fig. 3 contains plots for each panel, of sulfur content versus east-to-west distance, for all the respective points located in each panel. The plots for panels 1, 2, 3 and 4 indicate lower sulfur values in the east, and higher sulfur values in the west. Panels 5 and 6 are for the “peninsula”-like area that occurs in the southern part of the resource block A (Fig. 4), where the narrow span of the coal deposit limits the distance between sample values and precludes identification of possible regional trends.
It should be noted that even though there is an indication of a regional trend, a lot of scatter still remains in the observations of coal sulfur content. A part of that variation will be accounted for by the local kriging model as applied (below) to the variation remaining after the regional trend is removed from the data, but even after that step, considerable uncertainty remains in predictions of point-located values. However, because the focus is on regional assessment, the point predictions of coal sulfur content are not of interest to this analysis. Rather, we wish to be able to predict the coal sulfur content (and its statistical precision) for blocks of coal over an areal extent capable of supporting several large coal mines for a period of 20 years. Thus, we apply block kriging to estimate the average sulfur content and its standard error over a support area that is 2.7 mi (4.3 km) on a side. It turns out that the standard errors at block level are narrow enough to provide predictions that are informative at the block or regional level.

Fig. 3. Sulfur content values in an east–west direction, conditioned on their north–south location. The lines drawn through the points are nonparametric local regression lines that indicate a generalized trend (1 mi = 1.609 km).
Fig. 4. Panels used (Fig. 3) to examine sulfur values for regional trends in an east-to-west direction. Each panel contains the same number of points. Panels overlap in order to see if a generalized regional trend is exhibited by the data (1 mi = 1.609 km).
Fig. 5. Panels used (Fig. 6) to examine sulfur values for regional trends in a north-to-south direction. Each panel contains the same number of points. Panels overlap in order to see if a generalized regional trend is exhibited by the data (1 mi = 1.609 km).
Fig. 6. Sulfur content values in a north–south direction, conditioned on their east–west location. The lines drawn through the points are nonparametric local regression lines that indicate a generalized trend (1 mi = 1.609 km).
Resource area A was split into six panels with a north-to-south orientation to examine the coal sulfur content sample data for the presence of a regional north-to-south trend (Fig. 5). There is no apparent north-to-south directional trend in sulfur values except for panel 6 (Fig. 6), which indicates a trend from lower sulfur values in the north to higher sulfur values in the south. Panel 6 includes the western-most part of block A (Fig. 5) of the Pittsburgh coal bed. Most of the sample data in panel 6 are located in areas where the coal has been mined. The area with remaining coal, which has few sample data points, occurs mainly to the south of most of the sample data points in panel 6. Consequently, estimates of sulfur for the remaining coal in the west will tend to have a high amount of uncertainty. The finding of an east-to-west trend in coal sulfur values (Fig. 3) supported the choice of universal
kriging as the appropriate geostatistical method (Cressie, 1993). Universal kriging accounted for the east-to-west trend by fitting a surface for coal sulfur content as a function of east-to-west location. These regionally de-trended data were then analyzed for local spatial correlation using kriging methods.

3.3. Examining local spatial correlation in different directions

Directional empirical variograms, developed after the influence of the east–west trend is removed, are similar to each other, indicating that spatial correlation is isotropic, that is, the same irrespective of direction. For example, Fig. 7 shows variograms (using Eq. (1)) plotted for sulfur content (after the east–west trend was removed) in four directions: north to south (0°), northeast to southwest (45°), east to west (90°), and southeast to northwest (135°). The variograms all have the same appearance within a range of about 25 mi (40.2 km). The variogram along the 45° direction pairs low sulfur values in the northeast section of the Pittsburgh coal bed with low sulfur values in the Pittsburgh “peninsula”, which lies a long distance away. Because the Pittsburgh “peninsula” has many low sulfur points, as does the northeast section, gamma (γ, calculated using Eq. (1)) is relatively low at large separation distances in that direction. Overall, the directional variograms indicate that the variance across the sulfur values depends only upon the distance separating the values, and not upon the directions in which the values are arrayed. Thus, our data satisfy the second condition outlined in Section 3.1 and the predictions from the final kriging model will have minimum variance and be unbiased.

Fig. 7. Directional variograms after the east–west trend was removed from coal sulfur content sample values (1 mi = 1.609 km).

3.4. Estimation of an empirical variogram

Local-scale variability of coal bed characteristics, such as thickness and quality, can equal or exceed regional variability. Nonetheless, as Fig. 7 demonstrates, the sulfur content of the Pittsburgh coal bed exhibits a pattern of local spatial statistical correlation. Sulfur samples within close proximity to each other tend to be similar; thus, variance (or gamma) is small at short separation distances. As the geographic distance between samples increases, variance increases and statistical spatial correlation decreases. In the context of a kriging model, a local spatial trend is said to exist if values separated by short distances are more similar than values separated by long distances. A plot of Eq. (1) is the exploratory data method used to examine the sample data for such local spatial trends. To ensure that local trends are not confounded or masked by regional trends, the technique is to examine the local relationship along a direction gradient that does not exhibit a regional trend. In the previous section, it was found that along a north-to-south direction gradient, coal sulfur content does not exhibit a regional trend. Therefore, coal sulfur content sample values located in the north-to-south direction gradient were substituted into Eq. (1) to calculate γ and create the empirical variogram (Fig. 8).
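The two steps described above, removing the east–west trend and then computing γ for pairs aligned in a chosen direction, can be sketched as follows. This is an illustrative sketch only. It assumes that Eq. (1) is the classical estimator, γ(h) = Σ(z_i − z_j)² / (2N(h)), and that the trend is a straight line in easting; neither form is confirmed in this section. The function names, the angular tolerance, and the lag binning are hypothetical choices, not the settings used for Figs. 7 and 8.

```python
import math
from itertools import combinations

def detrend_east_west(xs, zs):
    """Subtract a least-squares straight line in easting x from the values z.

    A linear trend is an assumption made for this sketch.
    Returns (residuals, intercept, slope).
    """
    n = len(xs)
    mx, mz = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / sxx
    intercept = mz - slope * mx
    residuals = [z - (intercept + slope * x) for x, z in zip(xs, zs)]
    return residuals, intercept, slope

def directional_variogram(points, azimuth_deg, tol_deg, lag_width, n_lags):
    """Classical empirical semivariogram for pairs aligned near one azimuth.

    points: list of (x, y, value); azimuth is measured clockwise from north.
    Returns a list of (lag_center, gamma or None, n_pairs), one per lag bin.
    """
    target = azimuth_deg % 180.0
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for (x1, y1, z1), (x2, y2, z2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # co-located samples carry no directional information
        # A pair has an orientation but no sense, so fold azimuths to [0, 180).
        az = math.degrees(math.atan2(dx, dy)) % 180.0
        diff = abs(az - target)
        diff = min(diff, 180.0 - diff)
        k = int(dist // lag_width)
        if diff <= tol_deg and k < n_lags:
            sums[k] += (z1 - z2) ** 2
            counts[k] += 1
    return [((k + 0.5) * lag_width,
             sums[k] / (2 * counts[k]) if counts[k] else None,
             counts[k])
            for k in range(n_lags)]
```

Calling `directional_variogram` with azimuths of 0°, 45°, 90°, and 135° on the de-trended residuals would give four curves comparable in kind to those in Fig. 7; the north–south (0°) curve corresponds to the empirical variogram described for Fig. 8.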
Fig. 8. Empirical variogram based upon 727 sulfur values for the Pittsburgh coal bed. The empirical variogram is developed from data pairs arrayed in a north–south direction, where there is no regional spatial trend. A spherical variogram model was fit to the empirical variogram (1 mi = 1.609 km).
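For readers unfamiliar with the spherical model named in the caption of Fig. 8, a minimal sketch of its usual functional form is given here. The sill (1.37% squared) and range (24.1 mi) are the values reported in Section 3.5. The nugget of 0.46% squared is derived here as the sill minus the reported sill-to-nugget difference of 0.91% squared and is not stated explicitly in the paper. The function names are hypothetical, and the correlation formula 1 − γ(h)/sill assumes a second-order stationary residual process.

```python
def spherical_variogram(h, nugget=0.46, sill=1.37, range_mi=24.1):
    """Spherical semivariogram in (% S)^2 for a separation distance h in miles."""
    if h <= 0.0:
        return 0.0       # gamma(0) is zero by definition
    if h >= range_mi:
        return sill      # flat at the sill beyond the range
    s = h / range_mi
    # Jumps to the nugget just above h = 0, then rises smoothly to the sill.
    return nugget + (sill - nugget) * (1.5 * s - 0.5 * s ** 3)

def implied_correlation(h, nugget=0.46, sill=1.37, range_mi=24.1):
    """Correlation between point values h miles apart, taken as 1 - gamma(h)/sill."""
    return 1.0 - spherical_variogram(h, nugget, sill, range_mi) / sill

if __name__ == "__main__":
    for h in (0.5, 5.0, 12.05, 24.1, 40.0):
        print(f"h = {h:5.2f} mi   gamma = {spherical_variogram(h):.3f}"
              f"   correlation = {implied_correlation(h):.3f}")
```

Under these assumed parameters, γ rises from the nugget toward the sill as separation distance approaches 24.1 mi, and the implied correlation falls to zero at and beyond that distance.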
Gamma (as calculated by Eq. (1) and plotted in Fig. 8) measures how dissimilar sample values are as a function of the distance that separates them. When the two samples in a pair have coal sulfur content values that are very close numerically, γ will be a relatively small number. The plotted value for γ was found to be relatively small when a short distance separates the coal sulfur values (near the origin on the horizontal axis in Fig. 8). As the separation distance between sample values increases (moving to the right along the horizontal axis in Fig. 8), the computed value for γ increases, indicating that the coal sulfur values in each pair are not as close in value to each other as they are when the distance separating the samples is shorter. When such a pattern is observed, the values are said to be spatially correlated.

The empirical variogram (Fig. 8) shows a clear pattern of local spatial continuity. At short separation distances, the magnitudes of the coal sulfur contents in the sample are similar. As the distances separating the locations of data pairs increase, the degree of spatial continuity tapers off and disappears at about 25 mi (40.2 km). Above a separation distance of 25 mi (40.2 km), the variance in coal sulfur content for data pairs is practically equivalent to the variance for coal sulfur content calculated across all 727 samples in the data set (= 1.36% squared).

3.5. Model variogram to represent local spatial continuity in coal sulfur content

Once it has been established that sample data exhibit local spatial continuity or correlation, the next step is to fit a functional form to the data that make up the plot of γ against separation distance. In the kriging literature, this step is referred to as “estimating the model variogram” (Kaluzny et al., 1997). The solution for kriging weights requires estimates of spatial correlation between sample points and prediction points. These estimates of spatial correlation are derived from the model variogram, which is a functional form fitted to the empirical variogram. Importantly, the model variogram has to have a form that guarantees a mathematical solution for the system of equations that determine kriging weights (Cressie, 1993). Only certain functional forms have been found to have this property. The spherical variogram model used in this analysis is among the models that guarantee a solution (Cressie, 1993). In addition, the spherical model was chosen because it appears to fit the empirical variogram better than other models.

The spherical variogram model fit to the empirical variogram has a range estimated to be 24.1 mi (38.8 km) (Fig. 8). Sulfur estimates separated by less than 24.1 mi (38.8 km) are spatially correlated. Spatial correlation of sulfur estimates is inversely related to gamma, and approaches zero as separation distance approaches 24.1 mi (38.8 km). In principle, at extremely close distances, sample sulfur values should be identical, and, therefore, variation between values should be equal to zero. When variance at zero distance is non-zero, as measured by the empirical variogram, it is termed the nugget effect (Fig. 8). A nugget effect indicates measurement error or unexplained short-scale variability in pyrite concentrations. The “sill” (= 1.37% squared, Fig. 8) is approximately the variance of the sulfur content values, as estimated from the sample data. The difference between the sill and the nugget effect (= 0.91% squared) is an indicator of the important role that spatial correlation plays in making estimates of sulfur content at other points. At extremely short distances, the spatial correlation reduces variability in coal sulfur content estimates by almost 0.91% squared. As separation distance approaches 24.1 mi (38.8 km), spatial correlation approaches zero and the variability of sulfur content values approaches 1.37% squared, or approximately the variance of the sample data.

3.6. Kriging model of sulfur in the Pittsburgh coal bed

Regional trends and local spatial trends are combined to estimate a sulfur kriging model for the Pittsburgh coal bed. In this case, the statistically efficient procedure is universal kriging, which determines kriging weights that minimize mean square prediction error (Cressie, 1993, p. 173; Goldberger, 1962). Fig. 9, a wireframe diagram of the kriged sulfur surface, clearly shows that the model captures both regional and local variation in sulfur.
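To make the combination of trend and local correlation concrete, the following sketch solves a textbook universal kriging system for a single prediction point. It is not the S+ SpatialStats implementation used in this study. It assumes a drift that is linear in easting (the paper does not state the form of the trend surface), uses the spherical model with the parameters discussed in Section 3.5 (the nugget of 0.46 is derived as 1.37 − 0.91), and runs on invented sample points. All function names are hypothetical.

```python
import math

def spherical(h, nugget=0.46, sill=1.37, range_mi=24.1):
    """Spherical semivariogram; the nugget value is derived, not reported."""
    if h <= 0.0:
        return 0.0
    if h >= range_mi:
        return sill
    s = h / range_mi
    return nugget + (sill - nugget) * (1.5 * s - 0.5 * s ** 3)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def universal_krige(samples, x0, y0, gamma=spherical):
    """Predict at (x0, y0) from samples [(x, y, z), ...] with drift 1 + x.

    Returns (prediction, kriging variance). The two extra rows force the
    weights to sum to 1 and to reproduce the easting of the target point,
    which makes the predictor unbiased under the assumed linear drift.
    """
    n = len(samples)
    a = [[gamma(math.hypot(xi - xj, yi - yj)) for (xj, yj, _) in samples] + [1.0, xi]
         for (xi, yi, _) in samples]
    a.append([1.0] * n + [0.0, 0.0])
    a.append([x for (x, _, _) in samples] + [0.0, 0.0])
    g0 = [gamma(math.hypot(x - x0, y - y0)) for (x, y, _) in samples]
    sol = solve(a, g0 + [1.0, x0])
    weights, m0, m1 = sol[:n], sol[n], sol[n + 1]
    prediction = sum(w * z for w, (_, _, z) in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, g0)) + m0 + m1 * x0
    return prediction, variance

if __name__ == "__main__":
    # Invented coordinates (mi) and sulfur values (% S), for illustration only.
    demo = [(0, 0, 1.4), (10, 0, 2.3), (0, 10, 1.2), (10, 10, 2.6), (5, 5, 1.9)]
    print(universal_krige(demo, 7.0, 3.0))
```

Because the weights are constrained to reproduce the drift terms, sample values that lie exactly on a line in easting are predicted exactly, and predicting at a sample location returns that sample's value with zero kriging variance.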
Fig. 9. Kriged sulfur surface for the Pittsburgh coal bed. The wireframe surface floats above the sulfur content sample point values shown as triangles. The large square at the origin (0,0) corresponds to the large origin square in Fig. 2 (1 mi = 1.609 km).
4. Sulfur predictions

The data available for our model were far below the level that would support detailed assessment. Therefore, it is only appropriate to estimate coal sulfur content at a large-area scale. We have applied universal block kriging (Isaaks and Srivastava, 1989; MathSoft, 2000) to estimate the average sulfur content for square areas, 2.7 mi (4.3 km) on a side. Watson et al. (2000) suggested that an area of this size has enough coal for several large-scale mines operating over a 20-year period. The block estimates can be used as a basis for identifying areas where additional drilling could be targeted to refine sulfur estimates. Also, the block estimates and the procedures used to estimate sulfur by block were the basis for estimating a cumulative distribution of remaining Pittsburgh coal by sulfur content. As discussed in Section 1, the estimates are not suitable for supporting inferences about sulfur content at a mine scale.

4.1. Sulfur predictions by coal block

Results are summarized in Figs. 10–12. Average sulfur values by block are the basis for the sulfur contours shown in Fig. 10. The sulfur contours are a prediction of the average sulfur content at block scale. Based upon a statistical model of point values, new cores drilled in a given block can have sulfur values that cover a range of around 2% S above and below predicted point values. In the case of the block estimates, high extreme and low extreme values are averaged with other more-numerous intermediate values to form a block average. Thus, block averages ordinarily will have prediction errors for the average sulfur value of the block that are narrower than prediction errors for point estimates.

Fig. 11 is a map of the standard errors for the estimates of average sulfur content at the block level. The most precise estimates (those with the smallest standard error) occur mainly in blocks with a relatively large number of data samples. The blocks with the smallest standard error (0.11–0.14% S), shown in dark blue, are located near the bottom of the Pittsburgh “peninsula”, where there is a concentration of samples. The relatively high precision is due to the dense sampling pattern. In the northeast section of the coal bed, the blocks with an estimate of average sulfur near 2% have an intermediate level of uncertainty; their standard errors range from 0.42% to 0.69% S. As an example of interpretation, a block with an estimated average of 2.2% S and a standard error of estimation of 0.5% S would have a 95% chance of having its true average sulfur content in the range between 1.2% and 3.2% S (assuming a normal distribution and a standard normal variate = 1.96).

Fig. 12 combines the estimate of sulfur content with an estimate of coal tons to create a cumulative distribution of remaining tonnage by sulfur content and upper and lower confidence limits at the 90% level for average sulfur content, block by block. Blocks at every level of predicted average sulfur content have wide variation in estimation precision (Fig. 12). Fig. 11 could provide a basis for prioritizing areas for additional core drilling to refine estimates. For example, if it is thought that coal mining costs are relatively low for a particular block, and the block has relatively low average sulfur content (Fig. 10) but wide confidence limits for sulfur content (Fig. 11), then mining companies or land brokers may find it worthwhile to undertake additional sampling to fine-tune these preliminary regional-scale estimates of sulfur content.

4.2. Error bars for the cumulative sulfur distribution

The average sulfur cumulative curve plotted in Fig. 12 is a single observed realization of a stochastic process. The sample data allow us to make inferences about the stochastic process that gave rise to these estimated values. Our model of that process is represented by an east–west trend and by the local variation captured in the spherical variogram model. The statistical model was used to simulate alternate realizations from the “known estimated” processes. This procedure has the advantage of retaining the interaction between trend and local variability. Additional realizations will generate values in a geographical pattern that will pair blocks (that have given amounts of coal) with a different ordered set of sulfur values, thereby generating different cumulative distributions for remaining Pittsburgh coal by sulfur content. A large set of these alternate realizations of the cumulative curve would provide an estimate of the mean, upper, and lower limits of the cumulative curve by average sulfur content.
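The bookkeeping behind such a set of cumulative curves can be sketched as follows. The sketch takes the realizations as given, each one a list of (block average sulfur, block tons) pairs, and does not reproduce the geostatistical simulation itself. The function names and the small example data are hypothetical.

```python
def cumulative_curve(blocks):
    """Order blocks from low to high sulfur and accumulate their tonnage.

    blocks: list of (average_sulfur, tons). Returns [(sulfur, cumulative_tons), ...].
    """
    total = 0.0
    curve = []
    for sulfur, tons in sorted(blocks):
        total += tons
        curve.append((sulfur, total))
    return curve

def tons_at_or_below(curve, cutoff):
    """Cumulative tons in blocks whose average sulfur does not exceed the cutoff."""
    tons = 0.0
    for sulfur, cumulative in curve:
        if sulfur > cutoff:
            break
        tons = cumulative
    return tons

def envelope(realizations, cutoffs):
    """For each sulfur cutoff, return (cutoff, lowest, mean, highest) cumulative
    tons across all realizations, that is, the outer hull of the curves."""
    rows = []
    for cutoff in cutoffs:
        values = [tons_at_or_below(cumulative_curve(r), cutoff) for r in realizations]
        rows.append((cutoff, min(values), sum(values) / len(values), max(values)))
    return rows

if __name__ == "__main__":
    # Two invented realizations of three blocks each: (% S, tons).
    r1 = [(1.0, 100.0), (2.0, 200.0), (3.0, 300.0)]
    r2 = [(1.5, 100.0), (2.5, 200.0), (2.0, 300.0)]
    for row in envelope([r1, r2], [1.0, 2.0, 3.0]):
        print(row)
```

With only two realizations the hull is of course a crude bound; the approach becomes informative only with a large set of realizations, as the text notes.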
Fig. 10. Contours of estimated average sulfur content (wt.% as-received basis) for blocks, 2.7 mi on a side (1 mi = 1.609 km).

Fig. 11. Uncertainty in the estimates of block average sulfur content. Blocks color-coded from lowest to highest in order of the estimated standard errors for the average sulfur content by block.
Fig. 12. Remaining coal in order by block average sulfur content, including upper and lower 90% confidence limits (1 short ton = 0.9072 metric ton).
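Confidence limits of the kind shown in Fig. 12 follow from a block estimate and its standard error in the same way as the worked example in Section 4.1. The sketch below assumes normally distributed prediction errors, as that example does; whether the limits in Fig. 12 were computed in exactly this way is not stated, and the function name is hypothetical.

```python
from statistics import NormalDist

def confidence_limits(estimate, std_error, level=0.95):
    """Two-sided limits for a block average, assuming normal prediction errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% level
    return estimate - z * std_error, estimate + z * std_error

if __name__ == "__main__":
    # Worked example from Section 4.1: 2.2% S with a standard error of 0.5% S.
    low, high = confidence_limits(2.2, 0.5, level=0.95)
    print(f"95% limits: {low:.2f} to {high:.2f} % S")
    # The same block at the 90% level used in Fig. 12.
    low, high = confidence_limits(2.2, 0.5, level=0.90)
    print(f"90% limits: {low:.2f} to {high:.2f} % S")
```

Rounded to one decimal place, the 95% limits reproduce the 1.2% to 3.2% S range quoted in the text.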
Our procedure, similar to that of Isaaks and Srivastava (1989, p. 512), is to translate the sample data locations. The translation is designed to span two block widths around original locations, which is approximately equivalent to spanning by two standard normal variates at the block level. This procedure generates a range that has about a 95% chance of covering the true distribution. Each set of new realizations is used to estimate average sulfur values at the centroids of the cells in the 2.7 mi (4.3 km) block grid. A new cumulative distribution of average sulfur content is estimated by ordering the sulfur values from low to high and pairing each sulfur value with cumulative tonnage built up from the linked block tonnages. The outer hull of the distributions forms an estimate of the upper and lower confidence limits for the cumulative distribution of remaining Pittsburgh coal by sulfur content (Fig. 13). Because the simulations capture trend and local variability, the confidence range for the cumulative distribution is expected to be narrower than the average of the confidence range of the individual block values illustrated in Fig. 12.

Fig. 13. Cumulative distribution of remaining Pittsburgh coal by average sulfur content, blocks 2.7 mi on a side, expected, upper 95% confidence level, and lower 95% confidence level cumulative distribution (1 mi = 1.609 km; 1 short ton = 0.9072 metric ton).

Fig. 14. Comparison of sulfur (wt.%, as-received basis, washed) in coal shipments with sulfur content values (wt.%, as-received basis) predicted by the statistical model.

Fig. 13 is useful for predicting the future availability of Pittsburgh coal by sulfur content. Assuming that the market will only accept coal with a sulfur content of 2.15% or less, and that Pittsburgh coal is washed prior to delivery, raw coal up to about 2.75% would be marketable. Removal of approximately 0.6% S agrees with coal sulfur values reported for shipments of cleaned Pittsburgh coal (Resource Data International, 1998). Fig. 13 indicates that there are between 1.9 and 5.5 Gt (short tons) [1.7 and 5 Gt, respectively] still remaining at sulfur contents below 2.75%. The best estimate of availability is 3.5 Gt (short tons) [3.2 Gt]. The current rate of annual production from the Pittsburgh coal bed is around 80 Mt (short tons) [73 Mt], recovery rates are about 70%, wash recovery rates for the coal component are about 90%, and about 25% is unavailable due to technical, social, and environmental restrictions (Watson et al., 2000). Therefore, about 170 Mt (short tons) [154 Mt] of resources are needed for coal deliveries of 80 Mt (short tons) [73 Mt] (80/(0.70 × 0.90 × 0.75) ≈ 169). At the current rate of production of 80 Mt (short tons) [73 Mt] per year, the remaining life for production from the Pittsburgh coal bed is bracketed between 11 and 32 years, with the best estimate being 21 years. In combination with similar cumulative curves for other eastern coal beds, questions concerning the availability of low sulfur coal in eastern coal fields could be addressed quantitatively.
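The resource and remaining-life figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the three factors (mining recovery, wash recovery, and the available fraction) combine multiplicatively, which is an inference from the quoted 170 Mt rather than a formula given in the text. The function and variable names are hypothetical.

```python
def resources_needed(deliveries_mt, mine_recovery=0.70, wash_recovery=0.90,
                     unavailable=0.25):
    """In-place resources (Mt) consumed to deliver a given tonnage of washed coal."""
    return deliveries_mt / (mine_recovery * wash_recovery * (1.0 - unavailable))

def remaining_life_years(remaining_gt, annual_resource_use_mt):
    """Years of production supported by the remaining resource (Gt to Mt, then divide)."""
    return remaining_gt * 1000.0 / annual_resource_use_mt

if __name__ == "__main__":
    needed = resources_needed(80.0)  # short tons, Mt per year
    print(f"resources needed per year: {needed:.1f} Mt")
    for label, gt in (("lower", 1.9), ("best", 3.5), ("upper", 5.5)):
        print(f"{label:5s} estimate: {remaining_life_years(gt, needed):.1f} years")
```

Rounded to whole years, the three cases agree with the 11, 21, and 32 years quoted in the text.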
Although sulfur content is indicative of marketability, many other factors such as mining cost, transport cost, and growth of customer base also would have to be considered. Nonetheless, coal availability by sulfur content would be a key input to a comprehensive analysis. New sample data should be added as they become available and the analysis repeated. Newer data would refine the results and, most likely, would narrow uncertainty ranges.
4.3. Comparison of model estimates with sulfur values in shipped Pittsburgh coal

By combining data from EIA Form 7A (Energy Information Administration, 2000) and a commercial database of coal shipments (Resource Data International, 1998), shipment data were assembled for mines extracting coal only from the Pittsburgh bed from 1989 to 1997. The shipment data report the sulfur content of the coal as received by power plants. Most of the coals were washed prior to shipment and, thus, the sulfur values for these data will be less than the raw coal values reported thus far in this paper. The data set has shipment records for 53 different mines. Because the average sulfur values for the shipments are representative of large mined blocks, they can be compared (on more or less equal statistical footing) with the estimates from our model.

The shipment data are used to make a new set of contour lines showing sulfur content for the northern extent of the unmined Pittsburgh coal bed (Fig. 14). Bearing in mind that washing removes about 0.6% S, there is similarity between the red contour values for the sulfur content of shipped (washed) coal and the black contour values of raw coal as estimated by the statistical model (Fig. 14). For example, the 2% sulfur contour for shipped coal mimics the 2% sulfur contour for raw coal but encompasses a slightly larger area extending to the west. The larger area includes raw coal above 2% in sulfur that is cleaned down to the 2% level for delivery. The 2% shipped-sulfur contour arches around the Bailey and Enlow Fork mines, two of the largest producers of Pittsburgh coal. These two mines deliver washed Pittsburgh coal containing about 1.6% sulfur and appear to mine raw coal slightly in excess of 2% sulfur. The data set for shipments contains significantly fewer observations than the data set used for statistical analysis. Therefore, the comparison is only indicative of consistency in the statistical model.
5. Conclusions

The US Geological Survey is assessing regional coal resources and providing regional-scale information about coal tonnages and coal quality, including sulfur content. The data available for such assessments are limited. The survey’s interpretation intentionally is focused on regional trends. Statistical methods offer a valuable tool to aid interpretation. In this research, geostatistical methods were applied to estimate the cumulative distribution of remaining coal by sulfur content for the Pittsburgh coal bed. This type of information, accompanied by a quantitative estimate of its uncertainty, can be an important input to public and private decision-making. Whenever possible, the predictions from kriging models should be compared against real data. Because there are many active mines on the Pittsburgh coal bed, it was possible to compile a data file of actual sulfur content values from mines located throughout the area of the Pittsburgh coal bed. We found good agreement between our modeled predictions of coal sulfur content and these actual values, indicating consistency in the geostatistical model.
References

Cressie, N., 1984. Towards resistant geostatistics. In: Verly, G., David, M., Journel, A., Marechal, A. (Eds.), Geostatistics for Natural Resources Characterization, Part 1. Reidel, Dordrecht, pp. 21–44.
Cressie, N., 1993. Statistics for Spatial Data. Wiley, New York, 900 pp.
Cross, A.T., 1954. The geology of the Pittsburgh coal [Appalachian Basin]. Proceedings, Conference on the Origin and Constitution of Coal, June 1, 1952, (Nova Scotia Department of Mines), pp. 32–111. Reprinted: The geology of the Pittsburgh coal: stratigraphy, petrology, origin and composition, and geologic interpretation of mining problems. West Virginia Geol. and Econ. Surv., Rep. of Investigations 10, pp. 31–99.
Deutsch, C.V., Journel, A.G., 1997. GSLIB: Geostatistical Software Library and User’s Guide. Oxford Univ. Press, New York, 380 pp.
Energy Information Administration, 2000. Coal Industry Annual 1998. US Department of Energy, Washington, DC, DOE/EIA-0584(98), 308 pp.
Goldberger, A., 1962. Best linear unbiased prediction in the generalized linear regression model. J. Am. Stat. Assoc. 57, 369–375.
Gomez, M., Hazen, K., 1970. Evaluating sulfur and ash distribution in coal seams by statistical response surface regression analysis. U.S. Bureau of Mines Report of Investigations 7377, 120 pp.
Hohn, M.E., Smith, C.J., Ashton, K.C., McColloch Jr., G.H., 1988. Mapping coal quality parameters for economic assessment (Abstract). AAPG Bull. 72, 965.
Isaaks, E., Srivastava, R., 1989. An Introduction to Applied Geostatistics. Oxford Univ. Press, New York, 561 pp.
Kaluzny, S., Vega, S., Cardoso, T., Shelly, A., 1997. S+ Spatial Stats User’s Manual for Windows and UNIX. Springer, New York, 327 pp.
MathSoft, 2000. S+ Spatial Stats Version 1.5 Supplement. MathSoft, Seattle, WA, 84 pp.
Murphy, T.D., Brown, K.E., 1993. Combining geostatistics and simulation to predict sulfur at a central Illinois coal mine. Min. Eng. 45, 284–287.
Myers, J.C., 1997. Geostatistical Error Management: Quantifying Uncertainty for Environmental Sampling and Mapping. Van Nostrand Reinhold, New York, 571 pp.
Olea, R.A., 1999. Geostatistics for Engineers and Earth Scientists. Kluwer Academic Publishing, Boston, MA, 328 pp.
Resource Data International, 1998. COALDAT Comprehensive. Resource Data International, Golden, CO, CD-ROM.
Ruppert, L., Tewalt, S., Bragg, L., Wallack, R., 1999. A digital resource model of the Upper Pennsylvanian Pittsburgh coal bed, Monongahela Group, northern Appalachian basin coal region, USA. Int. J. Coal Geol. 41, 3–24.
Tewalt, S.J., Ruppert, L.F., Bragg, L.J., Carlton, R.W., Brezinski, D., Wallack, R.N., Butler, D.T., in press. A digital resource model of the upper Pennsylvanian Pittsburgh coal bed, Monongahela group, Northern Appalachian Basin Coal Region, USA. US Geological Survey Professional Paper, 1625C, Chap. C, CD-ROM.
Venables, W.N., Ripley, B.D., 1997. Modern Applied Statistics with S-Plus. Springer-Verlag, New York, 548 pp.
Watson, W., Ruppert, L., Tewalt, S., Bragg, L., 2000. The upper Pennsylvanian Pittsburgh coal bed: geology and mine models. Proceedings of the 2000 SME Annual Meeting, Salt Lake City, UT, CD-ROM. Preprint 00-26, Society for Mining, Metallurgy, and Exploration, Inc., Littleton, CO, 12 pp.
Williams, E.G., Keith, M.L., 1979. Relationship between sulfur in coals and the occurrence of marine roof beds. In: Ferm, J.C., Horne, J.C., Weisenfluh, G.A., Staub, J.R. (Eds.), Carboniferous Depositional Environments in the Appalachian Region. University of South Carolina, Columbia, SC, pp. 102–109.
Yancy, A.F., Faser, T., 1921. The Distribution of the Forms of Sulfur in the Coal Bed. University of Illinois Engineering Experiment Station Bulletin vol. 125, 94 pp.