Classification of Indian power coals using K-means clustering and Self Organizing Map neural network

Fuel 90 (2011) 339–347


Yogesh P. Pandit a, Yogesh P. Badhe a,1, B.K. Sharma b, Sanjeev S. Tambe a,*, Bhaskar D. Kulkarni a

a Chemical Engineering and Process Development Division, National Chemical Laboratory (NCL), Dr. Homi Bhabha Road, Pashan, Pune 411008, India
b Central Institute of Mining and Fuel Research (CIMFR), Dhanbad, Jharkhand 828108, India

Article info

Article history: Received 23 July 2009; received in revised form 4 September 2010; accepted 9 September 2010; available online 29 September 2010.

Keywords: Coal classification; Indian power coals; K-means clustering; Self-Organizing Map

Abstract

The present study reports results of the classification of Indian coals used in thermal power stations across India. For classifying these power coals, a classical unsupervised clustering technique, namely "K-means clustering", and an artificial intelligence (AI) based nonlinear clustering formalism known as the "Self-Organizing Map (SOM)" have been used for the first time. To conduct the said classification, five coal descriptor variables, namely moisture, ash, volatile matter, carbon and gross calorific value (GCV), have been used. The classification results thereof indicate that Indian power coals from different geographical origins can be classified optimally into seven classes. It has also been found that the K-means and SOM based classification results exhibit similarity in close to 75% of the coal samples. Further, the K-means and SOM based seven coal categories have been compared with as many grades of the commonly employed Useful Heat Value (UHV) based Indian non-coking coal grading system. Here, it was observed that a number of UHV-based grades exhibit similarity with the categories identified by the K-means and SOM methods. The classification of Indian power coals as provided here can be gainfully used in selecting application-specific coals as also in their grading and pricing.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Owing to its abundant availability, coal is the most important source of energy for electric power generation in India, which utilizes in excess of 70% of its annual coal production in thermal power plants [1]. Close to three quarters of the installed 140,000 MW electricity generation capacity in India is produced from coal-fired plants. Apart from thermal power stations, the other major coal-consuming industries are steel, fertilizer, chemical, paper and cement. The better quality coal available in India is used by the metallurgical industry, such as steel plants, with the power plants consuming the inferior quality coal. The ranges (%) of various constituents of Indian coal are as follows [2]: carbon (38–60), volatile matter (1–36), water (3–43), silicon oxide (45–63), aluminum oxide (15–36), iron oxides (2–20), calcium oxide (trace–12), magnesium oxide (trace), ash (3–60), sulphur (0.3–8.3) and phosphorus (<0.5). Coal resources are found in 18 major coal-fields spread over India. According to an estimate [3,4], the total proven coal reserves of anthracite, bituminous, sub-bituminous and lignite coals in India are 92,447 million tonnes, with the share of anthracite and

* Corresponding author. Tel.: +91 020 2590 2156. E-mail address: [email protected] (S.S. Tambe).
1 Present address: Persistent Systems Ltd., Analytics Competency Center, Pune 411 051, India.

0016-2361/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.fuel.2010.09.012

bituminous coals being close to 97.5%. India's annual coal production, the third highest globally, was 365.7 million tonnes in the year 2003 (340.4 million tonnes of anthracite and bituminous coals and 25.3 million tonnes of sub-bituminous and lignite). Most of the Indian coals are of the non-coking type and are available in many of its states, viz., Bihar, Madhya Pradesh, Maharashtra, Andhra Pradesh, Orissa, Jharkhand, etc. Lignite deposits are available in Tamil Nadu, Kashmir, Rajasthan and Gujarat, while Tertiary deposits are plentiful in Assam and Jammu and Kashmir [5].

2. Coal classification

Classifying coal scientifically is of significant importance in techno-economic applications. There exist a number of coal classification systems in use today, and new schemes are still being introduced. Classification of coals serves three major objectives, namely, selection of a coal for a specific industrial application, determination of a coal's grade or price for commercial purposes, and quantity/constituents/property-based categorization for an assessment of the coal resource. Commonly, coals are classified according to their rank and type. The rank of a coal describes the degree of metamorphism undergone by it upon coalification as it matures from peat to anthracite. The rank has an important bearing on a coal's physical and chemical properties. Anthracite is at the top of the rank scale and correspondingly has higher carbon and



energy contents and a lower level of moisture. The low-rank coals such as lignites are browner, softer friable materials with a dull, earthy appearance. These have a high oxygen content (up to 30%), a relatively low carbon content (60–75% on a dry basis), and a high moisture content (30–70%). Low rank coals are typically used in Indian thermal power plants. Owing to their high moisture and low carbon percentages, the energy content of these coals is low. In another classification method, coals are classified according to the organic debris, called "macerals", from which the coal is formed. Macerals are identified microscopically by reflected light, wherein the reflective or translucent properties of a coal indicate its maceral type. A significant amount of work has been done towards classifying coals for the purpose of grading/pricing [6] and for industrial use [7,8]. There also exists an Indian coal classification scheme that is based on the Indian Standard IS: 770-1977 [9]. It represents a coal using a four-digit code. For instance, non-coking coals are classified using the proximate analysis and calorific value. In spite of being founded on some basic coal properties, the design of a four-digit code is time-consuming and tedious in industrial applications. Currently, the commonly employed coal grading system in India is based on the "useful heat value (UHV)" [10,11]. The UHV is computed on the basis of the ash and moisture percentages as follows:

UHV (kcal/kg) = 8900 − 138 (ash (%) + moisture (%))    (1)

The UHV-based system classifies non-coking Indian coals into seven grades (A–G). Grades A–C represent superior grades, while power coal is generally understood to represent grades D–G. The quality of power coal has deteriorated over the years, and power plants mainly receive grades E, F and G containing high levels of ash (35–45%) and shale [12]. Conventionally, clustering methods are employed for classifying single/multi-variable data sets. From the literature survey it is found that a classical and widely employed clustering method, namely "K-means clustering", and a relatively recent and efficient artificial intelligence (AI) based clustering method, namely the "Self-Organizing Map (SOM)", are yet to be explored for the classification of Indian power coals. Accordingly, this paper reports the results of classification of Indian power coals using the K-means and SOM clustering formalisms. For classification, this study uses five coal attributes, namely moisture, volatile matter, carbon, ash and gross calorific value (GCV). The GCV is an important indicator of a coal's heating value and is thus commonly used for selecting a coal for a specific industrial application. By performing a rigorous classification of power coals from different geographical origins in India, the present study attempts to map the country's major and industrially important natural resource in terms of its five important attributes. In what follows, a broad outline of clustering is provided, followed by the details of K-means clustering and the SOM neural network. Next, the classification results from the K-means and SOM methods are compared and discussed.
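As a concrete illustration, Eq. (1) and the grade bands of the UHV system can be expressed in code. This is a sketch for illustration only; the function names are ours, and the grade boundaries are those of the UHV-based grading discussed in the text (Table 3).

```python
# Illustrative sketch (not from the paper): Eq. (1) and the UHV-based
# A-G grading of Indian non-coking coals.

def uhv(ash_pct: float, moisture_pct: float) -> float:
    """UHV (kcal/kg) = 8900 - 138 * (ash % + moisture %), as in Eq. (1)."""
    return 8900.0 - 138.0 * (ash_pct + moisture_pct)

def uhv_grade(uhv_value: float) -> str:
    """Map a UHV value (kcal/kg) to grades A (best) through G."""
    # (lower bound, grade) pairs; boundary conventions are an assumption here
    bands = [(6200, "A"), (5600, "B"), (4940, "C"), (4200, "D"),
             (3360, "E"), (2400, "F"), (1300, "G")]
    for lower, grade in bands:
        if uhv_value > lower:
            return grade
    raise ValueError("UHV below the graded range")

# Example: a power coal with 34% ash and 7.5% moisture
print(uhv(34.0, 7.5))             # 3173.0 kcal/kg
print(uhv_grade(uhv(34.0, 7.5)))  # F
```

For a typical grade-F power coal the high ash plus moisture fraction dominates the result, which is exactly why the system needs only these two descriptors.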

3. Clustering methods

Clustering techniques aim at obtaining useful information by grouping or categorizing multi-dimensional data into clusters. Clustering in a d-dimensional Euclidean space, R^d, comprises partitioning a given data set of n elements into a number (K) of groups or clusters in such a manner that data points in the same cluster are in some sense similar and those belonging to different clusters are dissimilar in the same sense. The exact number of clusters required to group the data may or may not be known a priori. Several clustering algorithms are available in the literature for the case where the number of clusters in a given data set is known in advance (see e.g., [13–17]). These algorithms perform what is known as "supervised" clustering, wherein they learn the known classification from the training data at hand and extend the learned knowledge about the classes to new data whose classification is unknown. If the number of clusters present in a data set is unknown, then "unsupervised" clustering methods are needed for classifying the data. These methods partition their input data space into K regions based on some similarity or dissimilarity metric. For achieving such a partitioning, a measure that computes a value reflecting the similarity between two input data patterns/vectors is needed. Most similarity metrics are sensitive to the range of values in the input vectors. To overcome this problem, the elements of the individual input vectors are normalized, for instance, within the unit interval [0, 1]. In addition to organizing and categorizing multi-dimensional data, clustering algorithms are also used in data compression and model construction.

The objective of the present study, involving the classification of Indian coals, requires the usage of unsupervised clustering techniques, since the number of classes into which the corresponding data can be grouped, as also the membership of each class, are not known a priori. Accordingly, the well-known non-hierarchical clustering scheme termed the "K-means method" is used to classify the Indian power coal data set. In the last two decades, artificial neural networks (ANNs) have firmly established themselves as a popular artificial intelligence (AI) based tool for dealing with large amounts of multi-variate data. The tasks for which ANNs have been found to be particularly effective are nonlinear modeling (function approximation) and clustering/classification (see e.g., [18–20]). Among the various types of ANNs, the "Self-Organizing Map (SOM)" [21,22] has been found to be particularly well-suited for conducting unsupervised clustering. Thus, in addition to K-means clustering, the present study employs the SOM neural network for classifying Indian power coals to arrive at a unique classification scheme.

3.1. K-means clustering

The K-means is a well-known non-hierarchical clustering method and requires the user to pre-specify the number of clusters present in the data set. When the number of specified clusters is too large, there may be clusters with no training data belonging to them; that is, some of the pre-specified clusters remain empty. There exists no objective method to determine a priori the number of clusters present in the data, and therefore the requirement of pre-specifying the number of clusters is a major disadvantage of K-means clustering. Notwithstanding these drawbacks, the technique can accommodate a large sample size. Since the number of clusters is usually unknown, usage of an unsupervised clustering technique, such as the SOM, which does not require knowledge of the number of clusters, becomes essential. The K-means algorithm partitions a given set of data in a manner such that the squared-error function is minimized for a pre-specified number of clusters. The squared-error function (E) is defined as:



E = Σ_{k=1}^{K} Σ_{x∈S_k} ‖x − z_k‖²    (2)

where K is the number of specified clusters, the d-dimensional vector z_k denotes the centre of the kth cluster, and x represents a d-dimensional data vector belonging to the cluster S_k. The computational steps of the K-means algorithm, which aim to minimize the sum of squared distances between all points and the cluster centres, are described below.

Step 1: Choose K initial cluster centres z_1, z_2, ..., z_k, ..., z_K, where z_k (k ∈ {1, 2, ..., K}) is chosen randomly from among the n input data vectors {X_1, X_2, ..., X_n} being clustered; here X_i (i = 1, 2, ..., n) refers to a d-dimensional real-valued vector.

Step 2: Assign a point, X_i (i = 1, 2, ..., n), to the kth cluster if

‖X_i − z_k‖ ≤ ‖X_i − z_p‖,  p = 1, 2, ..., K and k ≠ p.

Step 3: Compute the new cluster centres as follows:

z_i^new = (1/n_i) Σ_{X_j ∈ S_i} X_j,  i = 1, 2, ..., k, ..., K    (3)

where n_i is the number of data points assigned to the cluster S_i.

Step 4: If

‖z_i^new − z_i‖ ≤ ε,  i = 1, 2, ..., k, ..., K, then terminate    (4)

otherwise continue from Step 2. If the above-described procedure does not terminate normally at Step 4, it is executed for a pre-specified maximum number of iterations.

3.2. Self-Organizing Map (SOM) neural network

The Self-Organizing Map introduced by Kohonen [21,22] is suitable and efficient for performing unsupervised clustering. The SOM can project a high-dimensional input space onto a low-dimensional topology so as to allow the number of data clusters to be visualized/determined by manual inspection. The SOM neural network, owing to its advantages coupled with the unsupervised nature of its learning algorithm, has been found to be an attractive alternative for solving classification problems that traditionally have been the domain of conventional statistical and operations research techniques (see e.g., [23–26]). Chen et al. [27] have demonstrated that the SOM is a superior clustering technique and that its relative advantage over conventional techniques increases with higher levels of relative cluster dispersion in the data. Mangiameli et al. [28] showed that the SOM performed best when compared with seven other traditionally used hierarchical clustering methods. The Self-Organizing Map is similar to the Principal Component Analysis (PCA) method, which performs dimensionality reduction and classification. The difference between the two approaches, however, is that the SOM performs a nonlinear lower-dimensional mapping while PCA is a linear mapping technique. From the topology of the patterns (samples) of a given data set, the SOM captures the nonlinear relationships existing between the pattern elements to create a low-dimensional image portraying the relationships, which can be visualized conveniently. The SOM comprises an array of units (also known as "nodes" or "neurons") arranged in the form of a grid (see Fig. 1).
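The K-means procedure of Section 3.1 (Steps 1–4) can be sketched in a short program. This is a minimal illustration, not the authors' implementation; it also includes the [0, 1] min-max normalization mentioned in Section 3.

```python
# Minimal sketch of K-means (Steps 1-4 of Section 3.1) with [0, 1]
# min-max normalization. Illustrative only; names are ours.
import random

def normalize(data):
    """Scale each column of `data` (a list of d-dim lists) to [0, 1]."""
    d = len(data[0])
    lo = [min(row[j] for row in data) for j in range(d)]
    hi = [max(row[j] for row in data) for j in range(d)]
    return [[(row[j] - lo[j]) / ((hi[j] - lo[j]) or 1.0) for j in range(d)]
            for row in data]

def kmeans(data, k, eps=1e-6, max_iter=100, seed=0):
    """Return (centres, labels) minimizing the squared error of Eq. (2)."""
    rng = random.Random(seed)
    centres = [list(x) for x in rng.sample(data, k)]        # Step 1
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centre
        labels = [min(range(k), key=lambda c: sum(
            (xi - ci) ** 2 for xi, ci in zip(x, centres[c]))) for x in data]
        # Step 3: recompute each centre as the mean of its members (Eq. (3))
        new = []
        for c in range(k):
            members = [x for x, l in zip(data, labels) if l == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centres[c])  # keep empty clusters put
        # Step 4 / Eq. (4): stop when no centre moves more than eps
        if all(sum((a - b) ** 2 for a, b in zip(old, nw)) ** 0.5 < eps
               for old, nw in zip(centres, new)):
            centres = new
            break
        centres = new
    return centres, labels

# Two well-separated 2-D blobs should be recovered as two clusters
pts = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
centres, labels = kmeans(normalize(pts), 2)
print(labels)  # points 0-2 share one label, points 3-5 the other
```

Note that, as the text observes, a cluster left with no members simply retains its old centre, which is exactly the "empty cluster" drawback of having to pre-specify K.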
A d-dimensional weight (prototype) vector is associated with each node in the grid, where d refers to the dimensionality of an input data pattern (vector). An m-dimensional grid, where m is smaller than the dimensionality d of the input data vectors (i.e., m < d), allows the SOM to be used as a dimensionality reduction technique. The objective of the SOM training algorithm executing dimensionality reduction is to obtain a suitable set of weight vectors that preserves the topology of the input space in the output (mapped) space. The algorithm trains the SOM iteratively in two stages, namely the rough and fine-tuning stages. In each training iteration, a sample vector X is chosen from the input data set and the grid node that is nearest to X (also called the "best matching unit", BMU) is determined. The BMU is that unit on the grid whose weight vector is at the minimum distance (commonly evaluated using the Euclidean metric) from X. In the next step, the weight vector of the BMU and those of its grid neighbours are moved closer to the input vector X using the Kohonen learning rule. The result of such a reorganization is that similar weight vectors are brought closer to each other while the dissimilar ones are left apart. Implementing this

Fig. 1. Schematic of the Self-Organizing Map (input space mapped onto a 2-D grid of neurons, giving the projected space).

procedure iteratively over the two training stages, using a high (in the rough training stage) and a low (in the fine-tuning stage) value of the learning rate in the Kohonen rule, forces the randomly initialized weight vectors to mimic the distribution of the input data patterns in the output space. The above-described SOM implementation maps data points lying closer to each other in the input space onto neighbouring nodes on the map, thus imbibing the 'topology preservation' property into the SOM.

3.2.1. SOM training algorithm

Let X_i, i = 1, 2, ..., n, be the d-dimensional vectors to be clustered and W_ij be the d-dimensional weight vector associated with the node at location (i, j) of a 2-dimensional grid array (see Fig. 1). The stepwise procedure for training the SOM network is given below.

Step 1 (Initialization): Choose small random values for the initial weights, W_ij(0), and fix the initial learning rate (α̂₀) and the neighbourhood.

Step 2 (Determining the BMU): Select a sample pattern, X, from the data set and determine the BMU (C_ij) at training iteration t using the minimum Euclidean distance criterion:

‖X − W_{C_ij}‖ = min_{i,j} ‖X − W_ij‖,  i = 1, 2, ..., L;  j = 1, 2, ..., L    (5)

where ‖·‖ refers to the Euclidean norm and L denotes the number of rows (as also columns) in the square 2-D SOM grid.

Step 3 (Weight updating): Update all the weights according to the Kohonen learning rule:

W_ij(t + 1) = W_ij(t) + α̂(t) [X(t) − W_ij(t)]  if (i, j) ∈ N_{C_ij}(t)
W_ij(t + 1) = W_ij(t)  otherwise    (6)

where t denotes the iteration index, N_{C_ij}(t) is the neighbourhood of the BMU unit C_ij at iteration t, and α̂(t) = α̂₀/(1 + t) is the learning rate.

Step 4: Increment the iteration index, t, by unity and decrease the magnitude of the learning rate, α̂(t), accordingly; shrink the neighbourhood, N_{C_ij}(t), of the BMU.
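The training loop described in Steps 1–4 (iterated as per Step 5) can be sketched as follows. This is a minimal illustration under assumptions of ours, with a Gaussian neighbourhood function and exponential/hyperbolic decay schedules; it is not the SOM Toolbox implementation the study actually uses.

```python
# Compact sketch of SOM training (Steps 1-4, looped) on an L x L grid.
# The Gaussian neighbourhood and the decay schedules are our assumptions.
import numpy as np

def train_som(data, L=10, n_iter=2000, alpha0=0.5, sigma0=None, seed=0):
    """Return the (L, L, d) weight array after Kohonen training."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    W = rng.random((L, L, d))                      # Step 1: random weights
    sigma0 = sigma0 or L / 2.0                     # initial neighbourhood radius
    grid = np.stack(np.meshgrid(np.arange(L), np.arange(L), indexing="ij"), -1)
    for t in range(n_iter):
        x = data[rng.integers(len(data))]          # Step 2: pick a sample
        dist = np.linalg.norm(W - x, axis=2)       # distance of x to all units
        bmu = np.unravel_index(dist.argmin(), dist.shape)   # Eq. (5)
        alpha = alpha0 / (1.0 + t / n_iter)        # decaying learning rate
        sigma = sigma0 * np.exp(-t / n_iter)       # Step 4: shrink N(t)
        # Step 3 / Eq. (6): move the BMU and its grid neighbours towards x
        g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2)
                   / (2.0 * sigma ** 2))
        W += alpha * g[..., None] * (x - W)
    return W

# Train on 5-D data normalized to [0, 1], as the coal descriptors would be
data = np.random.default_rng(1).random((79, 5))
W = train_som(data, L=10)
print(W.shape)  # (10, 10, 5)
```

Because each update is a convex combination of the old weight and the sample, the trained weights stay inside the [0, 1] range of the normalized inputs, which is what makes the later U-matrix distances comparable across nodes.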


Table 1
Proximate and ultimate analysis data of Indian coals along with experimental GCV values and classification results.

Sample no. | Class (K-means) | Class (SOM) | Moisture | Ash | Volatile matter | Carbon | GCV (kcal/kg)
1 | I | I | 7.5 | 34.0 | 26.2 | 45.77 | 4197
2 | I | I | 8.8 | 31.8 | 24.6 | 43.8 | 4319
3 | I | I | 8.3 | 32.2 | 25.0 | 45.0 | 4375
4 | I | I | 7.3 | 33.4 | 25.1 | 43.5 | 4194
5 | I | I | 7.5 | 30.6 | 25.4 | 46.6 | 4452
6 | I | I | 6.9 | 34.3 | 24.0 | 44.1 | 4199
7 | I | I | 7.3 | 33.1 | 25.8 | 44.9 | 4270
8 | I | I | 8.4 | 26.8 | 26.4 | 48.4 | 4620
9 | I | I | 7.6 | 32.4 | 22.5 | 45.42 | 4162
10 | I | I | 8.4 | 31.1 | 21.9 | 46.5 | 4435
11 | I | I | 8.2 | 30.8 | 22.0 | 47.3 | 4540
12 | I | I | 8.0 | 32.4 | 24.4 | 45.6 | 4316
13 | I | I | 6.5 | 33.4 | 26.2 | 45.7 | 4335
14 | I | I | 8.3 | 32.5 | 27.0 | 43.48 | 4104
15 | I | I | 7.1 | 34.4 | 27.7 | 46.09 | 4306
16 | I | I | 7.1 | 33.5 | 25.5 | 46.01 | 4353
17 | I | I | 9.9 | 28.2 | 27.5 | 48.51 | 4510
18 | I | I | 7.2 | 34.4 | 25.3 | 45.34 | 4282
19 | I | I | 8.2 | 31.7 | 27.3 | 46.3 | 4418
20 | II | II | 9.5 | 17.6 | 29.1 | 57.67 | 5369
21a | II | I | 7.2 | 17.4 | 30.0 | 58.75 | 5545
22a | II | I | 9.0 | 21.4 | 26.3 | 54.6 | 5151
23a | II | V | 7.4 | 22.1 | 29.2 | 54.8 | 5379
24 | II | II | 8.5 | 18.9 | 29.7 | 55.1 | 5202
25 | II | II | 9.1 | 18.2 | 25.6 | 57.5 | 5548
26 | II | II | 7.4 | 24.3 | 27.1 | 53.6 | 5096
27a | III | II | 6.7 | 24.8 | 26.8 | 51.9 | 4959
28a | III | II | 7.1 | 29.0 | 28.1 | 50.4 | 4975
29a | III | II | 6.2 | 28.1 | 27.4 | 50.95 | 4750
30a | III | II | 6.3 | 26.2 | 24.6 | 54.1 | 5051
31 | III | III | 4.6 | 29.0 | 23.3 | 53.3 | 5300
32 | III | III | 5.6 | 26.9 | 24.3 | 53.0 | 5157
33 | III | III | 6.6 | 25.3 | 26.6 | 52.7 | 5270
34 | III | III | 7.2 | 28.9 | 26.0 | 48.2 | 4975
35 | III | III | 5.4 | 32.6 | 25.9 | 49.1 | 4856
36a | VI | III | 5.5 | 18.7 | 25.5 | 61.1 | 6065
37a | III | IV | 5.1 | 29.3 | 27.3 | 51.8 | 4995
38a | IV | III | 4.4 | 35.6 | 27.6 | 45.52 | 4327
39a | IV | III | 5.5 | 35.5 | 28.7 | 45.38 | 4290
40a | IV | III | 4.8 | 35.5 | 28.4 | 46.14 | 4360
41 | IV | IV | 5.2 | 36.7 | 28.2 | 43.3 | 4113
42 | IV | IV | 5.7 | 36.6 | 27.7 | 42.64 | 4071
43 | IV | IV | 5.4 | 30.0 | 29.4 | 48.84 | 4880
44 | IV | IV | 4.3 | 33.3 | 26.7 | 48.6 | 4695
45 | IV | IV | 5.0 | 31.3 | 28.9 | 48.43 | 4780
46a | V | IV | 7.6 | 38.1 | 24.5 | 41.2 | 4008
47a | V | IV | 8.1 | 35.5 | 24.5 | 41.5 | 3882
48a | V | IV | 8.9 | 35.9 | 23.2 | 42.0 | 3907
49 | V | V | 10.0 | 34.5 | 24.7 | 41.23 | 3747
50 | V | V | 6.0 | 38.0 | 26.3 | 41.85 | 4089
51a | V | VII | 6.3 | 37.9 | 24.1 | 42.5 | 3811
52 | V | V | 5.4 | 40.7 | 23.9 | 40.4 | 3725
53 | V | V | 6.0 | 38.8 | 23.9 | 41.6 | 3970
54 | V | V | 6.0 | 42.8 | 24.0 | 37.8 | 3565
55 | V | V | 6.1 | 36.4 | 24.9 | 43.9 | 4190
56 | V | V | 6.8 | 43.2 | 22.2 | 36.6 | 3555
57a | V | VII | 5.7 | 37.5 | 23.2 | 43.1 | 4127
58 | V | V | 5.5 | 40.1 | 24.2 | 42.0 | 4011
59 | V | V | 5.8 | 41.1 | 23.9 | 40.23 | 3698
60 | V | V | 6.8 | 41.8 | 25.0 | 39.21 | 3603
61 | V | V | 4.6 | 39.7 | 25.5 | 40.2 | 3986
62 | V | V | 6.4 | 39.3 | 22.5 | 41.1 | 3839
63 | V | V | 5.6 | 42.4 | 26.1 | 38.82 | 3541
64 | V | V | 7.0 | 40.7 | 24.4 | 39.27 | 3759
65a | VI | V | 4.3 | 25.5 | 28.4 | 56.0 | 5420
66 | VI | VI | 3.6 | 23.5 | 29.8 | 57.7 | 5635
67 | VI | VI | 6.0 | 15.0 | 26.9 | 62.5 | 6260
68 | VI | VI | 7.1 | 21.5 | 24.7 | 57.8 | 5585
69 | VI | VI | 3.8 | 18.4 | 25.0 | 61.9 | 6345
70 | VI | VI | 7.2 | 16.7 | 27.8 | 61.2 | 5965
71 | VI | VI | 6.7 | 17.1 | 26.2 | 62.2 | 6070
72 | VI | VI | 7.8 | 17.1 | 24.9 | 60.5 | 5681
73a | V | VI | 5.8 | 44.7 | 22.5 | 36.99 | 3354
74a | V | VI | 5.3 | 43.2 | 22.5 | 37.8 | 3628
75 | VII | VII | 5.7 | 47.8 | 22.3 | 32.9 | 2990
76 | VII | VII | 4.9 | 60.8 | 17.4 | 23 | 2104
77 | VII | VII | 5.7 | 46.3 | 22.0 | 34.8 | 3280
78 | VII | VII | 0.8 | 45.7 | 15.7 | 39.1 | 4100
79 | VII | VII | 5.1 | 46.5 | 20.8 | 35.6 | 3311

a Indicates mismatch between the K-means and SOM classification.

Step 5: Repeat Steps 2–4 until the change in the weight magnitudes is less than the specified threshold or the maximum number of iterations (t̂_max) is reached.

It should be emphasized that the success of SOM training depends critically on the judicious selection of the two main training algorithm-specific parameters (i.e., α̂(t) and N_{C_ij}(t)), the initial values of the weight vectors (W_ij(0)), and the pre-specified maximum number of training iterations, t̂_max; these are commonly optimized using a heuristic procedure.

Fig. 2. Number of classes and the Davies–Bouldin index.

3.2.2. SOM visualization

A visual inspection of the trained SOM can provide useful insight into the density or the cluster structure of the input data, as also correlations in the data. The two widely used methods for gaining such an insight are described below.

The Unified Distance Matrix (UDM) provides important information in the form of distances between the nodes of the SOM grid. In this method, a matrix of distances (known as the "U-matrix") between the d-dimensional weight vectors of neighbouring nodes of the two-dimensional SOM is computed. The U-matrix distances can be used to unravel the structure of the clusters present in the data set under investigation. The density of the weight vectors is illustrative of the density of the input data patterns. Accordingly, the UDM, measuring distances between the weight vectors, is indicative of the said density, and a suitable representation such as grey-level or colour imaging can be devised to interpret the distances between two neighbouring grid nodes. A lighter (darker) shade of grey between two nodes of the map indicates a smaller (larger) inter-node distance. Accordingly, a lighter region enclosed by a dark-shaded boundary indicates the presence of a cluster of data points. It may be noted that in many cases data do not contain well-defined

Fig. 3. U-matrix plot with data points.


clusters. In such cases cluster boundaries need to be ascertained via manual inspection and judgement.

The other method to interpret the SOM is the Component Plane Representation (CPR), which visualizes the relative component values of its weight vectors. The CPR can be considered a "sliced" version of the SOM, where each plane shows the distribution of one specific component of the weight vectors. It allows qualitative unravelling of the inter-dependencies and similarities existing among the different variables of the clustered data set.

4. Collection of coal data

The data set used in the classification of Indian power coals comprises the constituents of the proximate and ultimate analyses, as also the corresponding experimentally determined gross calorific values (GCVs) (kcal/kg), of 79 coal samples (see Table 1). All the samples were analyzed by the Central Institute of Mining and Fuel Research (CIMFR), Dhanbad, India, and were sourced from six prominent coal producing regions of India supplying coals to the thermal power stations. Specifically, the data set comprises values (determined on an "as received" basis) of five major constituents of the coal analysis, namely moisture (%) (x1), ash (%) (x2), volatile matter (%) (x3), carbon (%) (x4) and gross calorific value (GCV) (x5).

5. Results and discussion

In the first set of simulations, the coal data set in Table 1 was subjected to unsupervised clustering using the K-means technique. In this clustering, the Davies–Bouldin (DB) index was used for determining the optimal number of clusters present in the data set. The DB index is a function of the ratio of the sum of the within-cluster variance to the between-cluster-centre distances, and it is computed as:

DB = (1/K) Σ_{i=1}^{K} max_{j≠i} [(e_i + e_j) / d_ij],  j = 1, 2, ..., K    (7)

where e_i is the average Euclidean distance of the vectors in the ith cluster to the centre of the ith cluster, and d_ij is the distance between the centres that characterize clusters i and j. While implementing K-means clustering, multiple simulations were performed by varying the number of user-specified clusters, K. Since the K-means algorithm is sensitive to the initialization of the cluster centres, the procedure was executed multiple times (25) for each pre-specified K value using a different set of data points as the initial cluster centres. The best of these runs was selected on the basis of the minimum magnitude of the DB index. Fig. 2 shows the DB index magnitudes as a function of the number of specified clusters. As noticed in Fig. 2, the DB index magnitude is lowest for the cluster number (K) equal to seven, thus indicating the presence of seven clusters in the five-dimensional coal data set. The cluster to which each coal sample belongs, as identified by the K-means method, is listed in the second column of Table 1.

In the second set of simulations, the Indian power coal data set was subjected to SOM-based classification using the SOM Toolbox [29]. The optimum size of the two-dimensional SOM grid was selected by training the SOM with different grid sizes and a pre-specified number of training iterations. The optimum grid size obtained thereby contains an array of [40 × 40] nodes. Here, the SOM algorithm was run for 20,000 training iterations (10,000 iterations each in the rough and fine training phases). The radius of learning in the rough training phase was 20, while the radius in the fine training phase was 0.01. The results of the SOM-based classification are portrayed in the form of a U-matrix plot in Fig. 3(a), and panel 3(b) shows the colour (grey) scale used to interpret the distances between the neighbouring units (i.e., weight vectors) of the SOM

Fig. 4. SOM grid with class boundaries and indexed data points (clusters are indexed in Roman).
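The DB index of Eq. (7) can be sketched for a given clustering as follows; an illustrative implementation in which the function and variable names are ours.

```python
# Sketch of the Davies-Bouldin index of Eq. (7): e_i is the mean distance
# of cluster members to their centre, d_ij the distance between centres;
# a lower DB value indicates compact, well-separated clusters.
import math

def db_index(data, labels, centres):
    """data: list of d-dim lists; labels: cluster index of each point."""
    K = len(centres)
    # e_i: average distance of the points in cluster i to its centre
    e = []
    for i in range(K):
        members = [x for x, l in zip(data, labels) if l == i]
        e.append(sum(math.dist(x, centres[i]) for x in members) / len(members))
    # DB = (1/K) * sum_i max_{j != i} (e_i + e_j) / d_ij
    return sum(max((e[i] + e[j]) / math.dist(centres[i], centres[j])
                   for j in range(K) if j != i) for i in range(K)) / K

# Two tight, well-separated clusters give a small DB value
data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(db_index(data, [0, 0, 1, 1], [[0.0, 0.5], [10.0, 10.5]]))  # ~0.0707
```

Repeating this computation for K = 2, 3, ... and picking the K with the smallest DB value is the selection procedure the study follows (Fig. 2).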

Table 2
Class-wise data ranges of five coal descriptors and the ash + moisture combination as identified by K-means and SOM clustering.

Sr. no | Attribute | Clustering method | Class-I | Class-II | Class-III | Class-IV | Class-V | Class-VI | Class-VII
1. | Moisture | K-means | 6.5–9.9 | 7.2–9.5 | 4.6–7.2 | 4.3–5.7 | 4.6–10.0 | 3.6–7.8 | 0.8–5.7
   |          | SOM     | 6.5–9.9 | 7.10–9.5 | 4.4–7.2 | 4.3–8.9 | 4.3–10.0 | 3.6–7.8 | 0.8–6.3
2. | Ash | K-means | 26.8–34.4 | 17.4–24.3 | 24.8–32.6 | 30.0–36.7 | 36.4–44.7 | 15.0–25.5 | 45.7–60.8
   |     | SOM     | 17.4–34.4 | 17.4–29.0 | 18.7–35.6 | 30.0–38.1 | 22.1–43.2 | 15.0–44.7 | 37.5–60.8
3. | Volatile matter | K-means | 21.9–29.2 | 25.6–30.0 | 23.3–28.1 | 26.7–29.4 | 22.2–26.3 | 24.7–29.8 | 15.7–22.3
   |                 | SOM     | 21.9–30.0 | 25.6–29.7 | 23.3–28.7 | 23.2–29.4 | 22.2–29.2 | 22.5–29.8 | 15.7–24.1
4. | Carbon | K-means | 43.5–48.5 | 53.6–58.8 | 48.2–54.1 | 42.64–48.8 | 36.6–43.9 | 56.0–62.5 | 23–39.1
   |        | SOM     | 43.5–58.8 | 50.4–57.5 | 45.4–61.1 | 42.64–48.8 | 36.6–56.0 | 37.0–62.5 | 23.0–42.5
5. | GCV (kcal/kg) | K-means | 4104–4620 | 5096–5545 | 4750–5300 | 4071–4880 | 3354–4190 | 5420–6345 | 2104–4100
   |               | SOM     | 4104–5151 | 5051–5548 | 4292–6065 | 3882–4880 | 3541–5420 | 3354–6345 | 2104–4100
6. | Ash (%) + moisture (%) | K-means | 33.3–44.3 | 24.6–33.8 | 29.4–39.8 | 34.3–42.4 | 41.0–54.7 | 18.6–33.3 | 46.5–66.5
   |                        | SOM     | 23.9–44.3 | 24.5–38.5 | 23.1–42.8 | 34.3–47.0 | 26.4–53.2 | 18.6–52.5 | 38.3–67.1

Table 3
Useful heat value (UHV) based grading of Indian non-coking coals.

Grade | Useful heat value (UHV)a (kcal/kg) | Ash (%) + moisture (%) at 60% RH and 40 °C | Gross calorific value (GCV) (kcal/kg) range
A | >6200 | ≤19.5 | >6454
B | 5600–6200 | 19.6–23.8 | 6049–6454
C | 4940–5600 | 23.9–28.6 | 5597–6049
D | 4200–4940 | 28.7–34.0 | 5089–5597
E | 3360–4200 | 34.1–40.0 | 4324–5089
F | 2400–3360 | 40.1–47.0 | 3865–4324
G | 1300–2400 | 47.1–55.0 | 3113–3865

a UHV = 8900 − 138.0 (ash (%) + moisture (%)).

grid. In this figure, the actual data points are also plotted as dark-coloured hexagons. In the U-matrix plot, a dark-coloured node indicates that its weight vector is at a larger distance from those of the adjoining light-coloured nodes. Although several white regions are seen in the figure, it is difficult to unambiguously identify a fixed number of clusters owing to the absence of a clearly discernible dark-shaded continuous boundary around each cluster. The absence of clearly identifiable clusters and boundaries thereof has its origin in the scatter that exists in the values of the database undergoing classification. It is, however, possible to take assistance from the optimal clustering performed by the K-means method for fixing the number of clusters in the SOM. According to the K-means method, the coal data can be optimally grouped into seven clusters. This knowledge was used to identify seven clusters in the SOM. Although a tedious task, the cluster boundaries were ascertained by manually identifying series of dark-shaded adjacent SOM neurons, each one located between two lighter neurons. These boundaries, passing through the dark-shaded neurons and separating the seven coal categories, are shown in the U-matrix plot in Fig. 4(a). The corresponding SOM-based classification of all the 79 power coal samples is listed in column three of Table 1.

Columns two and three of Table 1 compare the sample-wise classification of the coals by the K-means and SOM methods, respectively. It is noticed that 59 of the 79 coal samples have been classified identically by both methods (75% agreement). Additionally, a table listing the class-wise ranges of the five attributes has been prepared (see Table 2). As can be noticed in this table, in a number of cases the K-means and SOM-based ranges match closely. The differences in the ranges have arisen owing to the 20 samples that have been classified differently by the K-means and SOM. In Fig. 4 it is seen that eight samples, numbered 21, 22, 28, 29, 46, 48, 73 and 74, are located on the borders of the SOM grid. These samples are among the 20 samples that are classified differently by the K-means and SOM methods. The misclassification of the stated eight samples by the SOM neural network is most probably due to a limitation known as the 'boundary effect'. This effect is responsible for the undue influence of the initial random weights assigned to the network nodes, which can lead to an incorrect topological representation [30].

From the ranges of the five attributes listed in Table 2 it is observed that:

• Coals belonging to classes II and VI are of higher rank (due to their high GCV and carbon content and lower ash content) when compared with the coals in the remaining five classes.
• The coals belonging to classes I and IV possess nearly similar K-means ranges of ash, carbon and GCV; however, they possess substantially varying ranges of moisture and volatile matter.
• Among all the classes, the coals belonging to class VII are of the poorest quality owing to their lowest GCV and carbon contents and high ash percentage.


Y.P. Pandit et al. / Fuel 90 (2011) 339–347

• Each class identified by K-means and SOM is unique, since no two or more classes possess equivalent ranges of all five attributes.

Using the ranges of the five coal attributes given in Table 2, it is possible to determine the class membership of a new Indian power coal sample. The constituents of a coal and their magnitudes determine its suitability for a specific industrial usage. Accordingly, coals belonging to classes I, V and VII, possessing low GCV and high ash content, can also be utilized in the cement and brick industries, which require slow heating. Coals from classes II, III and VI are characterized by high GCV and are thus suitable for power generation via the combustion and gasification routes, although their high ash content is a drawback. Coals belonging to classes I and VII, when mixed with biomass, are good candidates for co-gasification in various industries.

For additional comparison, the Indian non-coking coal categories from the UHV-based classification/grading system [10,11] were considered. Here, it is important to note that, owing to the usage of five descriptors, the K-means and SOM based classification is more broad-based than the UHV-based grading, which utilizes only two coal descriptors, viz. ash and moisture. Table 3 lists the seven UHV-based grades (A to G) of non-coking coals as a function of ash plus moisture percentage and the corresponding UHV and GCV ranges. The UHV-based grades have been compared with the K-means and SOM based classifications by calculating the ranges of ash + moisture percentages for the seven classes identified by the two methods. These class-wise ranges are listed in the last two rows of Table 2. A comparison of the UHV-based and the K-means and SOM based classifications reveals the following.

• A category equivalent to UHV-based grade "A", involving a low (≤19.5%) ash plus moisture percentage and high GCV magnitudes (>6454 kcal/kg), does not exist in either the K-means or the SOM based classification. The reason for this absence is that grade A coals are of high quality, whereas power stations in India commonly utilize inferior quality coals, and the database used in the K-means and SOM based classification comprises such coals exclusively.
• Grade D in the UHV-based classification (ash (%) + moisture (%) range: 28.7–34.0 and GCV range: 5089–5587 kcal/kg) matches closely with class II of the K-means based (ash (%) + moisture (%) range: 24.6–33.8 and GCV range: 5089–5545 kcal/kg) and SOM based (ash (%) + moisture (%) range: 24.5–38.5 and GCV range: 5051–5548 kcal/kg) classifications.
• Grade E non-coking coals in the UHV-based classification (ash (%) + moisture (%) range: 34.1–40.0 and GCV range: 4224–5089 kcal/kg) are reasonably similar to those of class IV of the K-means based (ash (%) + moisture (%) range: 34.3–42.4 and GCV range: 4071–4880 kcal/kg) and SOM based (ash (%) + moisture (%) range: 34.3–47.0 and GCV range: 3882–4880 kcal/kg) classifications.
• Grade G in the UHV-based classification (ash (%) + moisture (%) range: 47.1–55.0 and GCV range: 3113–3865 kcal/kg) is a subset of category VII in the K-means and SOM based classification (ash (%) + moisture (%) range: 44.0–66.6 and GCV range: 2104–4100 kcal/kg).

In addition to the U-matrix, Individual Component Planes (ICPs) were obtained using the SOM Toolbox [29] with the aim of studying the interdependencies of the five coal attributes used in the classification. These planes are plots of the individual attributes of the weight vectors associated with each SOM node. In an ICP, the values of each attribute are represented using a colour code or a gray scale. Fig. 5 shows the five component planes (panels (a)–(e)) corresponding to the five coal attributes, namely moisture, ash, volatile matter, carbon and gross calorific value. The colour code used in representing the values of the individual attributes is shown as a side bar in panels 5(a)–(e).
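The grade boundaries quoted above can be expressed as a small lookup. The sketch below is illustrative only: grades A, D, E and G use the ash + moisture ranges stated in the text, whereas the bounds shown for grades B, C and F are assumptions chosen to fill the 19.5–55% span contiguously and should be verified against the Gazette notification [10].

```python
# Illustrative lookup of UHV-based non-coking coal grades by
# ash + moisture percentage.  Ranges for A, D, E and G follow the text;
# the B, C and F bounds are ASSUMED contiguous fillers, not sourced values.
GRADE_RANGES = [
    ("A", 0.0, 19.5),
    ("B", 19.6, 23.8),   # assumed
    ("C", 23.9, 28.6),   # assumed
    ("D", 28.7, 34.0),
    ("E", 34.1, 40.0),
    ("F", 40.1, 47.0),   # assumed
    ("G", 47.1, 55.0),
]

def uhv_grade(ash_plus_moisture_pct):
    """Return the UHV grade letter for a given ash + moisture percentage."""
    for grade, lo, hi in GRADE_RANGES:
        if lo <= ash_plus_moisture_pct <= hi:
            return grade
    return "ungraded"   # outside the tabulated 0-55% span

print(uhv_grade(31.0))  # → D, consistent with the grade D range above
```

Under these assumed bounds, for instance, class II coals with ash + moisture of roughly 25–34% would mostly fall into grades C and D, mirroring the comparison drawn above.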
An examination of the component planes reveals that the planes for carbon and GCV are very similar, thus indicating a strong correlation between the two attributes. It is well known that the GCV of a coal

Fig. 5. Individual component planes corresponding to five coal attributes.


is strongly dependent on its carbon content. In panels 5(d) and (e), it is clearly seen that a low (high) carbon content results in a low (high) GCV, thus supporting this prior knowledge. It is also observed that the component planes in Fig. 5 do not exhibit any other easily discernible correlations apart from that between the carbon content and the GCV.
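The carbon–GCV observation can be made concrete computationally. The numpy sketch below trains a small SOM on synthetic stand-in data (the real 79-sample set and the MATLAB SOM Toolbox [29] are not reproduced here) and then correlates the component planes; the grid size, learning schedule and synthetic attribute ranges are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the coal data: 5 attributes per sample, with
# "carbon" (column 3) and "GCV" (column 4) deliberately tied together,
# as the component planes in Fig. 5 suggest for the real data.
n = 200
carbon = rng.uniform(20, 60, n)
data = np.column_stack([
    rng.uniform(2, 15, n),               # moisture (%)
    rng.uniform(20, 50, n),              # ash (%)
    rng.uniform(15, 30, n),              # volatile matter (%)
    carbon,                              # carbon (%)
    70 * carbon + rng.normal(0, 100, n)  # GCV, correlated with carbon
])
data = (data - data.mean(0)) / data.std(0)   # z-score normalisation

# Minimal online SOM on a 10 x 10 grid (illustrative, not the paper's setup)
rows, cols, dim = 10, 10, data.shape[1]
w = rng.normal(size=(rows, cols, dim))
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), -1)

for epoch in range(30):
    lr = 0.5 * (1 - epoch / 30)                 # decaying learning rate
    sigma = 3.0 * (1 - epoch / 30) + 0.5        # shrinking neighbourhood
    for x in data[rng.permutation(n)]:
        d = ((w - x) ** 2).sum(-1)
        bmu = np.unravel_index(d.argmin(), d.shape)   # best-matching unit
        h = np.exp(-((grid - np.array(bmu)) ** 2).sum(-1) / (2 * sigma ** 2))
        w += lr * h[..., None] * (x - w)

# Each component plane is one attribute slice of the weight cube; correlating
# the flattened planes quantifies the interdependence seen visually in an ICP.
planes = w.reshape(-1, dim)
r = np.corrcoef(planes, rowvar=False)
print(round(r[3, 4], 2))  # carbon vs. GCV planes: strongly correlated
```

Correlating flattened component planes in this way is a simple numerical counterpart to the visual comparison of panels 5(d) and (e).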

6. Conclusions

This study reports, for the first time, the results of a classification of Indian coals used in thermal power stations via a classical unsupervised clustering method, namely K-means clustering, and an artificial intelligence based formalism known as the Self-Organizing Map. The classification was conducted on the basis of five coal attributes, namely moisture, ash, volatile matter, carbon content and gross calorific value. The classification results indicate that Indian power coals from different geographical origins can be grouped into seven classes. It has also been observed that the K-means and SOM based classifications agree for close to 75% of the coal samples. Additionally, the seven K-means and SOM based coal classes have been compared with as many grades of the commonly utilized Useful Heat Value (UHV) based Indian non-coking coal grading system, and a number of UHV-based grades were found to exhibit similarity with the classes identified by the K-means and SOM methods. The classification of Indian power coals, as well as the class-wise ranges of the five coal attributes provided in this study, can be gainfully used for selecting application-specific coals and for their grading and pricing. Moreover, the classification methodology exemplified here can be extended to other fuels such as crude oil.
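The two-stage procedure concluded above, using K-means to fix the number of clusters before interpreting the SOM, rests on finding an optimal cluster count. The following numpy sketch illustrates one standard way to do so, a silhouette scan over candidate cluster counts, on synthetic stand-in data; the seeding scheme, blob geometry and silhouette criterion are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(X, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[d.argmax()])          # next seed: farthest point
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == j].mean(0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def mean_silhouette(X, labels):
    """Average silhouette width; higher means tighter, better-separated clusters."""
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    n, s = len(X), []
    for i in range(n):
        same = labels == labels[i]
        if same.sum() == 1:        # singleton cluster: silhouette 0 by convention
            s.append(0.0)
            continue
        a = D[i, same & (np.arange(n) != i)].mean()
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        s.append((b - a) / max(a, b))
    return float(np.mean(s))

# Synthetic stand-in: seven well-separated blobs in the five-attribute space
# (the actual 79-sample coal data set is not reproduced here).
true_centers = np.arange(7)[:, None] * np.full(5, 10.0)
X = np.vstack([c + rng.normal(0, 0.3, (30, 5)) for c in true_centers])

# Scan candidate cluster counts and keep the one with the best silhouette.
scores = {k: mean_silhouette(X, kmeans(X, k)) for k in range(2, 10)}
best_k = max(scores, key=scores.get)
print(best_k)  # the scan recovers the seven underlying groups
```

On such clearly separated data the scan peaks at the true cluster count; for scattered real data, as noted for the coal database above, the optimum is less sharply defined and expert judgment remains necessary.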

References

[1] EIA. India: environmental issues. US Energy Information Administration; February 2004.
[2] Mishra UC. Environmental impact of coal industry and thermal power plants in India. J Environ Radioact 2004;72:35–40.
[3] EIA. Data obtained from International Energy Annual 2003. US Energy Information Administration, Washington, DC; 2005a, table posted June 13, 2005 (production).
[4] EIA. Data obtained from International Energy Annual 2003. US Energy Information Administration, Washington, DC; 2005b, table posted June 13, 2005 (reserves).
[5] Majumdar N, Saran R. A viewpoint in coking coal classification. Fuel Sci Technol 1987;6:119–24.
[6] Tumuluri SG, Srikhande KY, Rao SK, Haque R. Studies on classifying Indian coals, Part II. A new system for grading and pricing. Fuel Sci Technol 1985;4:105–14.
[7] Mukherji AK, Chatterjee CN, Ghose S. Coal resources of India – its formation, distribution and utilization. Fuel Sci Technol 1982;1:19–34.
[8] Tumuluri SG, Srikhande KY. Studies on classifying Indian coals, Part I. A new classification system. Fuel Sci Technol 1985;4:57–60.
[9] Indian Standard IS 770:1977. Classification and codification of Indian coals and lignites (second revision); 1978.
[10] Gazette Notification. Grading and pricing of coal. Government of India, Ministry of Energy, New Delhi, India; January 1984.
[11] Chaudhury A, Biswas S. Development of equivalency chart between UHV and GCV. Central Fuel Research Institute (CFRI) internal report no. TR/CFRI?3.02/2002-03; 2003.
[12] Mathur R, Chand S, Tezuka T. Optimal use of coal for power generation in India. Energy Policy 2003;31:319–31.
[13] Arabie P, Hubert LJ, De Soete G, editors. Clustering and classification. River Edge, NJ: World Scientific Publishing; 1996.
[14] Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv 1999;31(3):264–323.
[15] Omran MGH, Engelbrecht AP, Salman A. An overview of clustering methods. Intell Data Anal 2007;11(6):583–605.
[16] Kothari R, Pitts D. On finding the number of clusters. Pattern Recognit Lett 1999;20:405–16.
[17] Maulik U, Bandyopadhyay S. Genetic algorithm based clustering technique. Pattern Recognit 2000;33:1455–65.
[18] Tambe SS, Kulkarni BD, Deshpande PB. Elements of artificial neural networks with selected applications in chemical engineering and chemical and biological sciences. Louisville, KY, USA: Simulation and Advanced Controls Inc; 1996.
[19] Kirew DB, Chretien JR, Bernard P, Ros F. Application of Kohonen neural networks in classification of biologically active compounds. SAR QSAR Environ Res 1998;8(1–2):93–107.
[20] Zupan J, Gasteiger J. Neural networks in chemistry and drug design: an introduction. Weinheim: Wiley-VCH; 1999.
[21] Kohonen T. The self-organizing map. Proc IEEE 1990;78(9):1464–80.
[22] Kohonen T. Self-organizing maps. Berlin: Springer; 2000.
[23] Orwig RE, Chen H, Nunamaker JF. A graphical self-organizing approach to classifying electronic meeting output. J Am Soc Inf Sci 1997;48(2):157–70.
[24] Tokutaka H, Yoshihara K, Fujimura K, Iwamoto K, Obu-Cann K. Application of self-organizing maps (SOM) to Auger electron spectroscopy. Surf Interface Anal 1999;27:783–8.
[25] Michaelides SC, Pattichis CS, Kleovoulou G. Classification of rainfall variability by using artificial neural networks. Int J Climatol 2001;21:1401–14.
[26] Tennant WT, Hewitson BC. Intra-seasonal rainfall characteristics and their importance to the seasonal prediction problem. Int J Climatol 2002;22:1033–48.
[27] Chen SK, Mangiameli P, West D. The comparative ability of self-organizing neural networks to define cluster structure. Omega Int J Manage Sci 1995;23(3):271–9.
[28] Mangiameli P, Chen SK, West D. A comparison of SOM neural network and hierarchical clustering methods. Eur J Oper Res 1996;93:402–17.
[29] Vesanto J, Himberg J, Alhoniemi E, Kiviluoto K, Parhankangas J. Self-organizing map in Matlab: the SOM Toolbox. Helsinki: Helsinki University of Technology; 2000.
[30] Kiang MY, Kulkarni UR, St. Louis R. Circular/wrap-around self-organizing map networks: an empirical study in clustering and classification. J Oper Res Soc 2004;52:93–101.