Clustering algorithms for the design of a cellular manufacturing system - an analysis for their performance


Computers ind. Engng Vol. 19, Nos 1-4, pp. 432--436, 1990

0360-8352/90 $3.00 + 0.00 Copyright © 1990 Pergamon Press plc

Printed in Great Britain. All rights reserved

CLUSTERING ALGORITHMS FOR THE DESIGN OF A CELLULAR MANUFACTURING SYSTEM - AN ANALYSIS FOR THEIR PERFORMANCE

Tarun Gupta, Assistant Professor, Department of Industrial Engineering, Western Michigan University, Kalamazoo, MI 49008, and Hamid Seifoddini, Assistant Professor, Industrial & System Engineering Department, University of Wisconsin, Milwaukee, WI 53201

ABSTRACT

This paper presents results from an analytical study performed to determine the severity of the chaining problem, along with other performance characteristics, associated with the clustering process of four selected algorithms. The four algorithms were single linkage clustering (SLINK), average linkage clustering (ALINK), weighted average linkage clustering (WLINK), and complete linkage clustering (CLINK). A sample of fifty problems with randomly generated data sets was used to determine feasible solutions, consisting of machine cells and corresponding part families, from each of the four algorithms. A quantitative measure is proposed for evaluating the performance of the different algorithms. The study concludes that the chaining effect progressively worsens as the clustering algorithm is changed from CLINK to WLINK to ALINK to SLINK, in that order.

INTRODUCTION

Over the last two decades, group technology (GT) has emerged as an important scientific principle [Kusiak, 1987, Gupta et al. 1989] in improving the productivity of manufacturing systems. The application of GT does not depend upon the degree of automation; it may be applied in a totally automated system or even in a manual production system [Choobineh, 1988]. Cellular manufacturing, based on the philosophy of GT, recognizes that small to medium sized batches of a large variety of parts can be produced in a flow line manner. This requires identification of groups of machines which can produce part types with similar processing requirements. The process of forming machine groups and corresponding part families is known as machine-component grouping and is the first step in the design of cellular manufacturing systems (CMSs). Several approaches have been developed to identify part families and their associated machine cells. These approaches can be classified in two groups as follows:

- approaches that are based on part characteristics, and

- approaches that are based on production methods.

The part-oriented techniques analyze parts for their similarities in design features and functionalities and therefore usually do not directly influence the configuration of manufacturing cells [Choobineh, 1988]. The other approach to machine cell design is based on manufacturing data such as production methods, routing information and process plans [Gupta & Seifoddini 1989]. This approach is reported to be more effective than the part-oriented approach for designing a CMS. One of the most commonly used methods in this category is based on the process of hierarchical clustering. The procedure is heuristic in nature and requires two major components: an association measure to define relationships between entities and a clustering method to identify clusters. The analyst faces a perplexing problem when he/she is forced to choose an association measure and clustering method for the analysis [Anderberg, 1973]. Generally speaking, every association measure or clustering method is different from every other one. One serious problem with most clustering methods is the problem of chaining. SLINK has also been characterized by a chaining or string effect among groups, resulting in one very large group and several very small groups. To date, no comprehensive study has been reported that identifies factors that may be associated with the problem of chaining in the clustering process. To quantify the relative severity of chaining, an objective criterion is also proposed.

HEURISTIC CLUSTERING

One of the most intuitively appealing approaches used for seeking naturally occurring clusters is heuristic clustering [Mosier, 1989]. It consists of two major components which guide the clustering process:

a. a measure for defining the relationship between entities (e.g., machines), and

b. an algorithm for computationally converting the metric to measure the relationship between clusters (e.g., cells).

Since McAuley's [1972] work in heuristic clustering, several researchers have undertaken studies in this area [DeWitte 1980, Waghodekar & Sahu 1984, Vakharia & Wemmerlov 1986, Seifoddini & Wolfe 1986, Gupta & Seifoddini 1989]. Several measures of relationship between entities have been proposed in these studies. Most of these measures are based on the idea of the similarity coefficient first proposed by Jaccard and employed by McAuley [1972]. Details of this method are referenced by Mosier [1989]. Gupta & Seifoddini [1989] proposed a production data based similarity coefficient and developed a heuristic procedure. The procedure employs CLINK as the clustering algorithm and adopts an objective criterion to evaluate the performance of the clusters identified by the clustering process. Their study also demonstrated the appropriateness of the new similarity coefficient over existing ones. However, the choice of CLINK as the clustering algorithm was largely subjective. One serious and frequently reported problem associated with clustering algorithms is chaining. Yet the relative impact of the various algorithms is not fully known. As mentioned earlier, the primary objective of this research was to analyze the chaining problem among the four most often used clustering algorithms, namely SLINK, ALINK, WLINK and CLINK. The process of chaining is introduced in the subsequent section.

Clustering Algorithms

Single Linkage Clustering (SLINK)

McAuley employed the well known SLINK for clustering of machines and components. SLINK defines the similarity relating any two clusters as the maximum similarity for one or more machine pairs. A machine pair essentially consists of one machine from each of the two member clusters. In other words, the measure of association between two clusters K and K' is defined as:

$$S_{K,K'} = \max_{j \in K,\; j' \in K'} \{ S_{jj'} \} \qquad (1)$$

The measure of similarity could be one of several possibilities, such as distance-based measures, common parts processing, and so on. When clustering is finished, results are often presented in the form of a dendrogram, and the user selects, from among the alternative solutions, one that optimizes a chosen criterion such as total material handling cost or number of intercellular trips.
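As an illustration of Eq. (1), the following minimal sketch computes the SLINK similarity between two clusters. The function name `slink_similarity` and the 4 x 4 matrix `SIM` are hypothetical and chosen only for this example; they are not data or code from the study.

```python
from itertools import product

def slink_similarity(sim, cluster_k, cluster_k2):
    """Eq. (1): largest similarity over all machine pairs (j, j')
    with j taken from one cluster and j' from the other."""
    return max(sim[j][j2] for j, j2 in product(cluster_k, cluster_k2))

# Hypothetical symmetric similarity matrix for four machines (0..3).
SIM = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.6, 0.3],
    [0.1, 0.6, 1.0, 0.7],
    [0.2, 0.3, 0.7, 1.0],
]

# Cross-cluster pairs: (0,2)=0.1, (0,3)=0.2, (1,2)=0.6, (1,3)=0.3
print(slink_similarity(SIM, [0, 1], [2, 3]))  # 0.6
```

Only the single strongest link (machines 1 and 2) determines the result; the three weaker pairs are ignored.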


Proceedings of the 12th Annual Conference on Computers & Industrial Engineering

Complete Linkage Clustering (CLINK)

The complete-linkage method (CLINK) is related closely to SLINK. A good source of in-depth discussion is provided in Romesburg [1984]. CLINK defines the similarity between two clusters K and K' as follows:

$$S_{K,K'} = \min_{j \in K,\; j' \in K'} \{ S_{jj'} \} \qquad (2)$$

Thus, CLINK defines the similarity of two clusters to be that of the least similar entities within them. Again, the measure of similarity could be one of the several possibilities explained above.

Average Linkage Clustering (ALINK)

Seifoddini [1987] proposed the use of average linkage (ALINK) and demonstrated how it can result in an improved machine-component grouping solution over a solution obtained by using SLINK. In SLINK, each cluster is characterized by the highest similarity value between any two machines of the parent clusters; in CLINK, each cluster is characterized by the lowest similarity value that exists among the machine pairs of the parent clusters. Instead of extreme values, ALINK characterizes a new cluster by the average of the similarity coefficients of all machine pairs in the two parent clusters. A pair consists of a machine from each of the two member clusters. Thus, ALINK defines the similarity between two clusters K and K' as follows:

$$S_{K,K'} = \operatorname*{Average}_{j \in K,\; j' \in K'} \{ S_{jj'} \} \qquad (3)$$
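The contrast between the extreme-value rule of Eq. (2) and the averaging rule of Eq. (3) can be sketched as follows. The helper names and the matrix `SIM` are hypothetical. The average is taken over cross-cluster pairs only, which matches the definition stated here rather than the earlier ALINK variants.

```python
from itertools import product
from statistics import mean

def cross_pairs(sim, cluster_k, cluster_k2):
    """Similarities of all pairs with one machine from each cluster."""
    return [sim[j][j2] for j, j2 in product(cluster_k, cluster_k2)]

def clink_similarity(sim, cluster_k, cluster_k2):
    """Eq. (2): similarity of the least similar cross-cluster pair."""
    return min(cross_pairs(sim, cluster_k, cluster_k2))

def alink_similarity(sim, cluster_k, cluster_k2):
    """Eq. (3): average similarity over all cross-cluster pairs."""
    return mean(cross_pairs(sim, cluster_k, cluster_k2))

# Hypothetical symmetric similarity matrix for four machines (0..3).
SIM = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.6, 0.3],
    [0.1, 0.6, 1.0, 0.7],
    [0.2, 0.3, 0.7, 1.0],
]

# Cross-cluster pairs for {0, 1} and {2, 3}: 0.1, 0.2, 0.6, 0.3
print(clink_similarity(SIM, [0, 1], [2, 3]))  # 0.1 (weakest pair)
print(alink_similarity(SIM, [0, 1], [2, 3]))  # about 0.3 (mean of the four pairs)
```

For the same two clusters the maximum pair value is 0.6, so the three rules rank this candidate merger very differently.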

It may be noted here that ALINK as defined by earlier researchers, Seifoddini [1987] and Anderberg [1973], was somewhat different, primarily for computational simplicity. The principal difference between our definition and the earlier definition is that we have ignored the sums of within-group pairwise similarities.

Weighted Average Linkage Clustering (WLINK)

The Weighted Average Linkage (WLINK) method is an extension of ALINK. The average similarity coefficient of ALINK is weighted by the sizes of the two member clusters. Thus, WLINK defines the similarity between two clusters K and K' as follows:

$$S_{K,K'} = \frac{N_K S_K + N_{K'} S_{K'}}{N_K + N_{K'}} \qquad (4)$$
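The arithmetic of Eq. (4) can be sketched as below. The text does not define the per-cluster terms S_K and S_K' precisely, so this sketch simply takes them as given inputs, together with the two cluster sizes. The function name and the numbers are hypothetical.

```python
def wlink_similarity(n_k, s_k, n_k2, s_k2):
    """Eq. (4): size-weighted average of the two per-cluster values.

    n_k, n_k2 : sizes of the two member clusters (assumed positive)
    s_k, s_k2 : the values S_K and S_K' of Eq. (4), supplied by the caller
    """
    return (n_k * s_k + n_k2 * s_k2) / (n_k + n_k2)

# Hypothetical case: a 3-machine cluster with value 0.5 and a
# single-machine cluster with value 0.25.
print(wlink_similarity(3, 0.5, 1, 0.25))  # (1.5 + 0.25) / 4 = 0.4375
```

The larger cluster pulls the result toward its own value; with equal sizes the expression reduces to a plain average of the two values.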

Besides the clustering method, the size of a problem and the problem data were expected to influence the chaining process. To study the effect of problem size alone, the effect of problem data must be confounded. This was ensured by generating the data sets for hypothetical problems using simulation. The simulation program, written in FORTRAN 77, employed a series of random number generators.

Chaining

To explain the chaining phenomenon, a clustering method, say SLINK, is considered. Let the measure of association between entities be a correlation-like measure such as a similarity coefficient. SLINK then results in clusters that are formed by joining the single strongest link between two member clusters at each stage. Two clusters qualify to merge and form a new cluster if one or more machine pairs possess the largest similarity. As explained earlier, machine pairs are formed with one machine from each of the two member clusters. The remaining pairs may be highly dissimilar, resulting in a progressive loss of homogeneity within the cluster. This aspect tends to make the clustering method inefficient in delineating poorly separated clusters and usually results in one large cluster. The tendency to form a long, 'serpentine' cluster is called "chaining"; this property is often criticized because entities at opposite ends of a cluster may be markedly dissimilar [Anderberg 1973]. Several researchers have reported effects of this problem; however, little work appears to have been undertaken to identify its causes.
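The chaining behavior described above can be reproduced on a toy example. In this sketch, six hypothetical machines are placed on a line with slowly widening gaps, and similarity is taken as 1 / (1 + distance). Both choices are assumptions made for illustration and have nothing to do with the production data based coefficient used in the study. The same agglomerative loop is run with the maximum rule (SLINK) and the minimum rule (CLINK).

```python
from itertools import combinations, product

def agglomerate(sim, linkage, n_merges):
    """Merge the most similar pair of clusters n_merges times.

    linkage is the built-in max for SLINK (Eq. 1) or min for CLINK (Eq. 2).
    n_merges must not exceed the number of machines minus one.
    """
    clusters = [[i] for i in range(len(sim))]
    for _ in range(n_merges):
        def between(pair):
            a, b = pair
            return linkage(sim[j][j2]
                           for j, j2 in product(clusters[a], clusters[b]))
        # Pick the pair of clusters with the highest cluster similarity.
        a, b = max(combinations(range(len(clusters)), 2), key=between)
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters

def cell_sizes(clusters):
    return sorted(len(c) for c in clusters)

# Hypothetical machine positions; neighbours are only slightly further
# apart at each step, so the data have no well separated groups.
POSITIONS = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0]
SIM = [[1.0 / (1.0 + abs(x - y)) for y in POSITIONS] for x in POSITIONS]

print(cell_sizes(agglomerate(SIM, max, 3)))  # SLINK: [1, 1, 4]
print(cell_sizes(agglomerate(SIM, min, 3)))  # CLINK: [2, 2, 2]
```

After three mergers SLINK has grown one four-machine chain and left two singletons, while CLINK has formed three cells of two machines each.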


The methodology adopted in this study is discussed in the next section. The important results are presented in the subsequent section, followed by a section identifying possible scope for future work and some conclusions.

METHODOLOGY

A sample of fifty hypothetical problems was studied. The choice of sample size was largely subjective. As mentioned earlier, these problems were generated using a simulation program. The program employed several random number generators to create the necessary data files: a binary matrix file, a routing file and a production volume file. The binary matrix file consisted of machine vector data. The routing file contained information such as the part-wise operation sequence and the unit operation time for each operation, whereas part-wise production volumes were contained in the production volume file. A production data based similarity coefficient was used as the measure of association between machines [Gupta & Seifoddini 1989]. Each problem was solved using all four clustering algorithms. Each time, M - 1 solutions are generated, where M is the number of machines in the problem. To study the effect of problem data on chaining during clustering, the problem size was considered to be an important parameter. Two dimensions, namely the number of machines and the number of parts that form a grouping problem, determine its size. The clustering methods studied in this research group machines and parts concurrently; therefore the product of the two dimensions was defined as the measure of problem size. A total of (M - 1) iterations are required for an M-machine problem. Each iteration forms one new cell, which results from the merger of two machines or two groups of machines. Therefore, the number of iterations and also the number of solutions are determined by M, whose value varies from problem to problem. Results of the clustering process were studied at four predetermined levels of cell formation. Each of the four levels was a fixed percentage of the total iterations (15%, 35%, 60% and 85%). Identifying the clustering levels as a fixed percentage of the total number of iterations was necessary to normalize for M.
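How the four clustering levels translate into iteration numbers for a given M can be sketched as follows. The text does not say how fractional iteration counts were handled, so rounding to the nearest whole iteration (with a floor of one) is an assumption of this sketch, and the function name is hypothetical.

```python
LEVELS_PERCENT = (15, 35, 60, 85)  # levels stated in the text

def clustering_levels(num_machines):
    """Return (percent, iteration, cells_remaining) for each level.

    An M-machine problem needs M - 1 iterations, and each iteration
    reduces the number of cells by one. Rounding is an assumption.
    """
    total_iterations = num_machines - 1
    levels = []
    for pct in LEVELS_PERCENT:
        iteration = max(1, round(pct * total_iterations / 100))
        levels.append((pct, iteration, num_machines - iteration))
    return levels

# Hypothetical 21-machine problem: 20 iterations in total.
for pct, iteration, cells in clustering_levels(21):
    print(pct, iteration, cells)
# 15 3 18
# 35 7 14
# 60 12 9
# 85 17 4
```

Because the levels are tied to percentages of M - 1 rather than to fixed iteration numbers, problems of different sizes are compared at equivalent stages of the merging process.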
The numbers of machines in the largest and the smallest cell, computed as percentages of M, are used as indicators of the severity of chaining. Some solutions were expected to contain more than one largest or smallest cell. Thus, the percentage of total machines involved in these cells was also determined for each of the four stages of clustering. Each problem was analyzed for grouping solutions using the similarity coefficient method based on production data. Since the total number of iterations, and hence the number of alternative machine cell solutions, depends upon the problem size, four iteration levels were selected, each identified as a fixed percentage of the total iterations for a problem. The number of cells and the cell sizes for grouping solutions at the predetermined levels of clustering were recorded for each problem; cell sizes and counts of largest and smallest cells were also tabulated. In order to normalize for M, the values were represented as percentages of M.

IMPORTANT RESULTS

A paired t-test was performed for several pairs of variables. All tests were performed at the 5% significance level. First, the analysis was performed to study the relative chaining problem of the four clustering techniques using all fifty problem sets. The number of machines in the largest and the smallest cells at the various clustering stages was noted for each problem. Occasionally, more than one largest or smallest cell was found. It was observed that the number of machines in the largest cells, as a fraction of total machines, was significantly lower with ALINK than with SLINK. Similar results were found for the pairs WLINK and ALINK, and CLINK and WLINK: statistically, WLINK and CLINK resulted in a significantly smaller proportion of machines in the largest cell or cells than their counterparts, namely ALINK and WLINK respectively. Thus, on average, there was a significant increase in the proportion of total machines in the largest and the smallest cells progressively along the sequence CLINK, WLINK, ALINK, SLINK.

When a similar comparison using the t-test was performed for the number of machines per cell, as a percentage of total machines, in the largest cell as well as the smallest cell, interesting results were obtained. The differences in the mean number of machines per largest cell for the pairs SLINK and ALINK, ALINK and CLINK, SLINK and CLINK, SLINK and WLINK, and WLINK and CLINK were statistically significant. The first member in each pair resulted in the larger number of machines. The number of machines per cell in the smallest cell was significantly different for the pairs ALINK and SLINK, CLINK and ALINK, CLINK and SLINK, WLINK and SLINK, and CLINK and WLINK. Here, however, the first member in each pair resulted in the smaller cells. Clearly, SLINK results in significantly bigger largest as well as smallest cells than CLINK. Thus, it may be concluded that SLINK suffers more severely from chaining than CLINK. Therefore, the choice of clustering method can influence the chaining problem.

CONCLUSIONS

Several researchers have reported the problem of chaining with hierarchical clustering algorithms. SLINK is the most often employed algorithm, and the associated problem of chaining has been known for over two decades. However, no comprehensive study has been reported which analyzes the behavior of various clustering algorithms. Their effect on the machine-component grouping process has therefore remained largely unknown. Seifoddini [1987] concluded that ALINK may in some cases improve the quality of a grouping solution obtained from SLINK. This study has determined the relative severity of chaining among cells and machines due to four important algorithms. For a meaningful comparison, four levels of clustering were identified. Each level was defined as a fixed percentage of the maximum number of iterations for a problem. The study revealed, contrary to intuition, that CLINK has the most favorable effect on the grouping process. The extent of chaining remained significantly less severe throughout the clustering process when CLINK was used instead of SLINK.

REFERENCES

[1] Anderberg, M.R., Cluster Analysis for Applications, Academic Press, New York, 1973.

[2] Ballakur, A. and Steudel, H.J., "A Within-cell Utilization Based Heuristic for Designing Cellular Manufacturing Systems," Int. Jour. of Prod. Res., vol. 25, no. 5, 1987, pp. 639-665.

[3] Burbidge, J.L., "Production Flow Analysis," Seminar on the First Steps to Group Technology, Birniehill Institute, East Kilbride, Glasgow, 1970.

[4] Choobineh, F., "A framework for the design of cellular manufacturing systems," Int. Jour. of Prod. Res., vol. 26, no. 7, 1988, pp. 1161-1172.

[5] De Witte, "The Use of Similarity Coefficient in Production Flow Analysis," Int. Jour. of Prod. Res., 18, 1980, 503.

[6] Kusiak, A. and Chow, W.S., "Efficient solving of the group technology problem," Journal of Manufacturing Systems, No. 6, pp. 117-124, 1987.

[7] Gupta, T. and Seifoddini, H., "Production data based similarity coefficient method for the design of cellular manufacturing system," IIE International Conference at Toronto, Canada, May 1989.