A fuzzy-neural and multiple-bucket approach for estimating lot cycle time in a wafer fab with dynamic product mix


Available online at www.sciencedirect.com

Computers & Industrial Engineering 55 (2008) 423–438 www.elsevier.com/locate/caie

Toly Chen
Department of Industrial Engineering and Systems Management, Feng Chia University, 100, Wenhwa Road, Seatwen, Taichung City 407, Taiwan

Received 4 October 2007; received in revised form 8 January 2008; accepted 8 January 2008
Available online 15 January 2008

Abstract

A fuzzy-neural and multiple-bucket approach is proposed in this study for lot cycle time estimation in a wafer fab with a dynamic product mix, a case that has seldom been thoroughly investigated in past studies. The proposed methodology is composed of two parts. In the first part, the multiple-bucket approach is applied to take the future release plan of the wafer fab into account. Subsequently, the FCM-FBPN approach is applied to estimate the cycle time of every lot in the wafer fab. The buckets obtained in the first part become additional inputs to the FBPN. In this way, the fluctuation in the product mix since the release of a wafer lot can be considered in estimating the cycle time of the wafer lot. According to experimental results, the estimation accuracy of the proposed methodology was significantly better than those of many existing approaches. Other findings include that a large number of buckets was beneficial to the estimation accuracy and might not worsen the efficiency of the proposed methodology.
© 2008 Elsevier Ltd. All rights reserved.

Keywords: Fuzzy back propagation network; Fuzzy c-means; Dynamic product mix; Bucket; Cycle time; Wafer fab

E-mail address: [email protected]
0360-8352/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.cie.2008.01.004

1. Introduction

Estimating the cycle time for every lot in a wafer fabrication plant (wafer fab) is a critical task not only to the fab itself, but also to its customers. After the cycle time of each lot in a wafer fab is accurately estimated, several managerial goals (including internal due-date assignment, output projection, ordering decision support, enhancing customer relationship, and guiding subsequent operations) can be simultaneously achieved (Chen, 2003). However, estimating lot cycle time in a wafer fab is not easy because the wafer fab is a very complicated production system. Typical characteristics of a wafer fab include fluctuating demand, lots with various product types and different priorities, unbalanced capacity, lots' reentrance to the bottleneck machines, and hundreds of processing steps. Chen (2003) classified the current approaches to estimating the cycle time of a wafer lot into six categories: multiple-factor linear combination (MFLC), production simulation (PS), back propagation networks (BPN),
case based reasoning (CBR), fuzzy modeling methods, and hybrid approaches. Among the six approaches, MFLC is the easiest, quickest, and most prevalent in practical applications. The major disadvantage of MFLC is the lack of estimation accuracy (Chen, 2003). Conversely, the huge amount of data required and the lengthy simulation time are two disadvantages of PS. Nevertheless, PS is the most accurate cycle time estimation approach if the simulation model has high validity and the assumptions and conditions in constructing the simulation model are still valid during the fabrication process of a wafer lot. The latter requirement is nearly impossible to satisfy. Considering both effectiveness and efficiency, Chang and Hsieh (2003) and Chang et al. (2005) both estimated the cycle time of a wafer lot with a BPN having a single hidden layer. Compared with MFLC approaches, the average estimation accuracy measured with root mean squared error (RMSE) was considerably improved with these BPNs. For example, an improvement of about 40% in the RMSE was achieved in Chang, Hsieh, and Liao (2005). On the other hand, much less time and far less data are required to generate a cycle time estimate with a BPN than with PS. Chang, Hsieh, and Liao (2001) proposed a k-nearest-neighbors based case based reasoning (CBR) approach which outperformed the BPN approach in estimation accuracy. In one case, the advantage was up to 27%. Chang et al. (2005) modified the first step (i.e. partitioning the range of each input variable into several fuzzy intervals) of the fuzzy modeling method proposed by Wang and Mendel (1992), called the WM method, with a simple genetic algorithm (GA) and proposed the evolving fuzzy rule (EFR) approach to estimate the cycle time of a wafer lot. Their EFR approach outperformed CBR and BPN in estimation accuracy. Chen (2003) constructed a fuzzy BPN (FBPN) that incorporated expert opinions in forming inputs to the FBPN.
Chen's FBPN was a hybrid approach (fuzzy modeling and BPN) and surpassed the crisp BPN, especially with respect to efficiency. Another hybrid approach was proposed in Chang and Liao (2006) by combining SOM and WM, in which a lot was classified using a SOM before estimating the cycle time of the lot with WM. Similarly, Chen (2006) proposed the hybrid SOM-BPN approach in which a lot was also pre-classified with a SOM before estimating the lot's cycle time with a BPN. As a result, lots belonging to different categories were learned with different BPNs (but with the same topology), and only the BPN of a lot's category was applied to estimate the lot's cycle time. After constructing several fuzzy inference rules, Chen's SOM-BPN was applied to assess the probability of keeping the internal due-date based on the estimated cycle time in Chen (2008). To enhance the efficiency, Chen (2007a) constructed a k-means (kM)-FBPN system, in which a wafer lot was pre-classified by kM before estimating the cycle time of the lot by a FBPN. To embody the uncertainty of wafer lot classification, Chen (2007b) used fuzzy c-means (FCM) instead. The results of these studies showed that pre-classifying wafer lots was beneficial to estimation accuracy. Except for a few studies in which the historical data of a real wafer fab were collected, most studies in this field used simulated data. The product mix in a wafer fab is the percentage of lots of each product type in the wafer fab. It is usually dynamic and should be considered in estimating the cycle time of every wafer lot, for the following reasons. The product mix in a wafer fab continuously changes as lots are released into or output from the wafer fab, and is therefore sensitive to the release pattern. If from now on only lots of product types requiring large capacity are released into the fab, then the cycle time of every lot currently in the fab will be lengthened.
Conversely, if from now on only lots of product types requiring small capacity are released into the fab, then every lot currently in the fab can be output from the fab more quickly. These phenomena are especially evident when the wafer fab is operating with the uniform-release policy, and also hold if the release policy is aimed at maintaining a constant level of work-in-progress (CONWIP). However, in most studies using simulated data, a fixed product mix was assumed, while in studies adopting real data, the effect of the product mix was not thoroughly investigated. To investigate that effect, the future release plan of the wafer fab has to be known in advance, so as to observe the fluctuation in the product mix since the release of a wafer lot, and to consider it in estimating the cycle time of the wafer lot. For this purpose, a fuzzy-neural and multiple-bucket approach is proposed in this study. Here, a "bucket" means the capacity that is required by the lots of a product type being released within a future time interval and is discounted in some way. The idea originates from the look-ahead function proposed in Chen (2007a). However, Chen (2007a) assumed a fixed product mix, and considered only the three-bucket case. In this study, the future release plan of a wafer fab is converted into the multiple buckets of each product type, so as to be considered in estimating the cycle time of every lot being released into the wafer fab. Subsequently, the FCM-FBPN approach is applied to fulfill the task. Namely, a wafer lot is first pre-classified with FCM. FCM is applied in this study instead of other classifiers to allow for flexibility in classifying wafer lots. Then,
the FBPN of the category the wafer lot belongs to is applied to estimate the cycle time of the wafer lot. The reason for applying a FBPN is that it usually converges more quickly than a BPN does. The buckets of a product type become the additional inputs to the FBPNs of the product type. PS is also applied in this study to generate test data. Using simulated data, the effectiveness of the proposed methodology is shown and compared with those of many existing approaches.

The remainder of this paper is organized as follows. Section 2 is divided into two parts. The first part introduces the multiple-bucket approach for considering the future release plan of a wafer fab. The FCM-FBPN approach for estimating the cycle time of a wafer lot is then described in the second part. To evaluate the effectiveness of the proposed methodology, PS is applied in Section 3 to generate some test data. Based on analysis results, some discussion points are made in Section 4. Finally, the concluding remarks and some directions for future research are given in Section 5.

2. Methodology

Variables that are used in the proposed methodology are defined as follows:

(1) $R_n$: the release time of lot n.
(2) $LS_n$: the lot size of lot n.
(3) $U_n$: the average fab utilization at $R_n$. $\tilde{U}_n$ is derived by multiplying $U_n$ by its importance, which is expressed as a fuzzy value.
(4) $Q_n$: the total queue length on the processing route of lot n at $R_n$. $\tilde{Q}_n$ is derived by multiplying $Q_n$ by its importance, which is expressed as a fuzzy value.
(5) $BQ_n$: the total queue length before bottlenecks at $R_n$. $\widetilde{BQ}_n$ is derived by multiplying $BQ_n$ by its importance, which is expressed as a fuzzy value.
(6) $FQ_n$: the total queue length in the whole fab at $R_n$. $\widetilde{FQ}_n$ is derived by multiplying $FQ_n$ by its importance, which is expressed as a fuzzy value.
(7) $WIP_n$: the fab work-in-progress (WIP) at $R_n$. $\widetilde{WIP}_n$ is derived by multiplying $WIP_n$ by its importance, which is expressed as a fuzzy value.
(8) $D_n^{(i)}$: the delay of the ith recently completed lot, i = 1 ~ 3. $\tilde{D}_n^{(i)}$ is derived by multiplying $D_n^{(i)}$ by its importance, which is expressed as a fuzzy value.
(9) $B_n^{(j)}$: the jth bucket of lot n, j = 1 ~ m. $\tilde{B}_n^{(j)}$ is derived by multiplying $B_n^{(j)}$ by its importance, which is expressed as a fuzzy value.
(10) $CT_n$: the cycle time of lot n.
(11) (+), (−), (×): fuzzy addition, subtraction, and multiplication, respectively.

To simplify the calculation, all fuzzy-valued parameters are given as triangular fuzzy numbers. A triangular fuzzy number can be defined by a triplet $\tilde{A} = (a, b, c)$. The membership function is

\[ \mu_{\tilde{A}}(X) = \begin{cases} (X - a)/(b - a) & \text{if } a \le X \le b, \\ (X - c)/(b - c) & \text{if } b \le X \le c. \end{cases} \tag{1} \]

The fuzzy arithmetic for triangular fuzzy numbers is applied to deal with all calculations involved in training the FBPN. In the following, $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$ are two triangular fuzzy numbers and u is a constant.

Addition:
\[ (a_1, b_1, c_1)(+)(a_2, b_2, c_2) = (a_1 + a_2,\; b_1 + b_2,\; c_1 + c_2). \tag{2} \]

Subtraction:
\[ (a_1, b_1, c_1)(-)(a_2, b_2, c_2) = (a_1 - c_2,\; b_1 - b_2,\; c_1 - a_2). \tag{3} \]

Scalar multiplication:
\[ u \cdot (a, b, c) = (\min(ua, uc),\; ub,\; \max(ua, uc)). \tag{4} \]

Multiplication:
\[ (a_1, b_1, c_1)(\times)(a_2, b_2, c_2) \cong (\min(a_1 a_2, a_1 c_2, c_1 a_2, c_1 c_2),\; b_1 b_2,\; \max(a_1 a_2, a_1 c_2, c_1 a_2, c_1 c_2)). \tag{5} \]

An example is given in Fig. 1 to demonstrate the approximation of the fuzzy multiplication of two triangular fuzzy numbers.

Division:
\[ (a_1, b_1, c_1)(/)(a_2, b_2, c_2) \cong (\min(a_1/a_2, a_1/c_2, c_1/a_2, c_1/c_2),\; b_1/b_2,\; \max(a_1/a_2, a_1/c_2, c_1/a_2, c_1/c_2)), \quad a_2 \ge 0. \tag{6} \]

Exponential function:
\[ e^{(a, b, c)} \cong (e^a, e^b, e^c). \tag{7} \]
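The arithmetic in Eqs. (1)-(7) can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `TFN` and its method names are hypothetical, membership outside [a, c] is assumed to be 0, and division is restricted here to strictly positive divisors to avoid dividing by zero.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (a, b, c) with a <= b <= c (hypothetical helper)."""
    a: float
    b: float
    c: float

    def membership(self, x: float) -> float:
        """Eq. (1); values outside [a, c] get membership 0 (assumed)."""
        if x == self.b:
            return 1.0
        if self.a <= x < self.b:
            return (x - self.a) / (self.b - self.a)
        if self.b < x <= self.c:
            return (x - self.c) / (self.b - self.c)
        return 0.0

    def __add__(self, other):      # Eq. (2)
        return TFN(self.a + other.a, self.b + other.b, self.c + other.c)

    def __sub__(self, other):      # Eq. (3): lower bound minus upper bound, and vice versa
        return TFN(self.a - other.c, self.b - other.b, self.c - other.a)

    def scale(self, u: float):     # Eq. (4): a negative constant swaps the bounds
        return TFN(min(u * self.a, u * self.c), u * self.b, max(u * self.a, u * self.c))

    def __mul__(self, other):      # Eq. (5), triangular approximation
        corners = (self.a * other.a, self.a * other.c, self.c * other.a, self.c * other.c)
        return TFN(min(corners), self.b * other.b, max(corners))

    def __truediv__(self, other):  # Eq. (6), triangular approximation
        if other.a <= 0:
            raise ValueError("this sketch requires a strictly positive divisor")
        corners = (self.a / other.a, self.a / other.c, self.c / other.a, self.c / other.c)
        return TFN(min(corners), self.b / other.b, max(corners))

    def exp(self):                 # Eq. (7), triangular approximation
        return TFN(math.exp(self.a), math.exp(self.b), math.exp(self.c))


A, B = TFN(1, 2, 3), TFN(2, 3, 4)
print(A + B, A - B, A * B, A.scale(-2))
```

Keeping the three components in a frozen dataclass makes each operation return a new number, which mirrors the way the equations treat fuzzy numbers as values rather than mutable state.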

2.1. The multiple-bucket approach for considering the future release plan

There are many possible ways to incorporate the future release plan of a wafer fab in estimating the cycle time of a lot being released into the wafer fab. Here, the multiple-bucket approach is proposed for this purpose. A bucket of a lot is a discounted workload on the processing route of the lot within a future time interval (according to the future release plan). Chen (2007a) introduced the three-bucket case in which the discounted workload is calculated by summing the total processing times divided by the release times (see Fig. 2). The considerations in designing such a formula include:

(1) Future lots with smaller release times, i.e. nearer future lots, are more influential.
(2) Future lots with larger total processing times, i.e. heavier workloads, are more influential.

The three buckets then become additional inputs to the FBPN. After such a treatment, the accuracy of estimating the cycle time of a wafer lot was improved by 4.8% on average in Chen (2007a). To further enhance the performance, the three-bucket case can be generalized into the m-bucket case (see Fig. 3). A demonstrative example using the simulated data is shown in Fig. 4 to compare the results associated with several values of m. According to the 10-bucket case, some heavy workload will be released into the fab 350 h from now, which will considerably lengthen the cycle times of the lots that are of the same product type and are still in the wafer fab at that time. Such an observation cannot be made if the 3-bucket case is adopted instead. Although a large value of m is favored, it prolongs the learning time of the FBPN because the number of inputs increases.
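The bucket calculation described above can be sketched as follows. The function name `compute_buckets` and the data layout are hypothetical. The sketch assumes that release times are expressed as strictly positive offsets from now (in hours), that the m intervals have widths T1, ..., Tm, and that lots released beyond the last interval are ignored.

```python
from bisect import bisect_left
from itertools import accumulate


def compute_buckets(release_plan, widths):
    """Return the m discounted workloads (buckets) of one product type.

    release_plan: iterable of (release_offset, total_processing_time) pairs,
                  with release_offset measured from now and > 0 (assumed).
    widths:       interval widths [T1, ..., Tm]; bucket j covers the offsets in
                  (T1 + ... + T(j-1), T1 + ... + Tj].
    """
    edges = list(accumulate(widths))          # upper edge of every interval
    buckets = [0.0] * len(widths)
    for release_offset, processing_time in release_plan:
        if release_offset <= 0:
            raise ValueError("release offsets must be positive")
        j = bisect_left(edges, release_offset)  # first interval whose edge >= offset
        if j < len(buckets):                    # later lots are ignored
            # nearer lots and heavier lots contribute more
            buckets[j] += processing_time / release_offset
    return buckets


plan = [(10, 100), (50, 200), (150, 300), (400, 100)]
print(compute_buckets(plan, [100, 100, 100]))
```

With three 100-hour intervals, the first two lots fall into the first bucket, the third lot into the second bucket, and the last lot lies beyond the horizon and is dropped.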

Fig. 1. Approximation of the fuzzy multiplication of two triangular fuzzy numbers (membership μ versus X for A, B, and A(×)B).


Fig. 2. The three-bucket case.

Fig. 3. The m-bucket case.

Fig. 4. A demonstrative example (discounted future workload versus time, in units of 100 hours, for the 3-, 5-, 7-, and 10-bucket cases).


2.2. The FCM-FBPN approach for estimating the cycle time

Wafer lots (examples) are pre-classified into K categories with FCM before they are fed into the FBPNs. FCM performs classification by minimizing the following objective function:

\[ \min \sum_{k=1}^{K} \sum_{i=1}^{n} \mu_{i(k)}^{m} e_{i(k)}^{2}, \tag{8} \]

where K is the required number of categories; n is the number of examples; $\mu_{i(k)}$ represents the membership of example i belonging to category k; $e_{i(k)}$ measures the distance from example i to the centroid of category k; $m \in (1, \infty)$ is a parameter to increase or decrease the fuzziness. The procedure of applying FCM to classify examples is as follows:

(1) Establish an initial classification result.
(2) (Iterations) Obtain the centroid of each category as

\[ \bar{x}_{(k)} = \{ \bar{x}_{(k)j} \}, \tag{9} \]
\[ \bar{x}_{(k)j} = \sum_{i=1}^{n} \mu_{i(k)}^{m} x_{ij} \Big/ \sum_{i=1}^{n} \mu_{i(k)}^{m}, \tag{10} \]
\[ \mu_{i(k)} = 1 \Big/ \sum_{l=1}^{K} \left( e_{i(k)} / e_{i(l)} \right)^{2/(m-1)}, \tag{11} \]
\[ e_{i(k)} = \sqrt{ \sum_{\text{all } j} \left( x_{ij} - \bar{x}_{(k)j} \right)^{2} }, \tag{12} \]

where $x_{ij}$ indicates the jth parameter in $\{U_n, Q_n, BQ_n, FQ_n, WIP_n, D_n^{(1)}, D_n^{(2)}, D_n^{(3)}\}$ of example i; $\bar{x}_{(k)}$ is the centroid of category k. Note that the buckets of an example are not considered in classifying the example.
(3) Re-measure the distance of each example to the centroid of every category, and then recalculate the corresponding membership.
(4) Stop if the following condition is satisfied. Otherwise, return to step (2):

\[ \max_{k} \max_{i} \left| \mu_{i(k)}^{(t)} - \mu_{i(k)}^{(t-1)} \right| < \delta, \tag{13} \]

where $\mu_{i(k)}^{(t)}$ is the membership of example i belonging to category k after the tth iteration; $\delta$ is a real number representing the threshold of membership convergence.

Finally, the separate distance test (S test) proposed by Xie and Beni (1991) can be applied to determine the optimal number of categories K:

\[ \min S \tag{14} \]
subject to
\[ J_m = \sum_{k=1}^{K} \sum_{i=1}^{n} \mu_{i(k)}^{m} e_{i(k)}^{2}, \tag{15} \]
\[ e_{\min}^{2} = \min_{p \ne q} \left( \sum_{\text{all } j} \left( \bar{x}_{(p)j} - \bar{x}_{(q)j} \right)^{2} \right), \tag{16} \]
\[ S = \frac{J_m}{n \times e_{\min}^{2}}, \tag{17} \]
\[ K \in Z^{+}. \]

The K value minimizing S determines the optimal number of categories.
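As an illustration of steps (1)-(4) and of the S test, the following pure-Python sketch implements Eqs. (10)-(13) and (15)-(17). It is not the MATLAB implementation used in the study. The function names `fcm` and `s_index`, the random initial memberships, the iteration cap, and the small floor that keeps distances away from zero are all assumptions made for this sketch. The S index is only defined for K ≥ 2.

```python
import math
import random


def _centroids(examples, mu, K, m):
    """Eq. (10): membership-weighted mean of the examples for every category."""
    dim = len(examples[0])
    result = []
    for k in range(K):
        weights = [row[k] ** m for row in mu]
        total = sum(weights)
        result.append([sum(w * x[j] for w, x in zip(weights, examples)) / total
                       for j in range(dim)])
    return result


def fcm(examples, K, m=2.0, delta=1e-4, max_iter=200, seed=0):
    """Fuzzy c-means; returns (centroids, memberships)."""
    rng = random.Random(seed)
    # Step (1): random initial memberships, each row normalised to sum to 1 (assumed)
    mu = []
    for _ in examples:
        row = [rng.random() + 1e-9 for _ in range(K)]
        mu.append([v / sum(row) for v in row])
    for _ in range(max_iter):
        centroids = _centroids(examples, mu, K, m)            # step (2)
        new_mu = []
        for x in examples:                                    # step (3)
            e = [max(math.dist(x, c), 1e-12) for c in centroids]   # Eq. (12)
            new_mu.append([1.0 / sum((e[k] / e[l]) ** (2.0 / (m - 1.0))
                                     for l in range(K)) for k in range(K)])  # Eq. (11)
        change = max(abs(a - b) for r_new, r_old in zip(new_mu, mu)
                     for a, b in zip(r_new, r_old))
        mu = new_mu
        if change < delta:                                    # step (4), Eq. (13)
            break
    return _centroids(examples, mu, K, m), mu


def s_index(examples, centroids, mu, m=2.0):
    """Eqs. (15)-(17): S = J_m / (n * e_min^2); smaller is better."""
    j_m = sum(row[k] ** m * math.dist(x, c) ** 2
              for x, row in zip(examples, mu) for k, c in enumerate(centroids))
    e2_min = min(math.dist(p, q) ** 2
                 for a, p in enumerate(centroids) for q in centroids[a + 1:])
    return j_m / (len(examples) * e2_min)


points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centers, memberships = fcm(points, K=2)
print(centers, s_index(points, centers, memberships))
```

To choose K, one would run `fcm` for several candidate values and keep the one with the smallest `s_index`.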


After classification, examples of different categories are then learned with different FBPNs but with the same topology. The configuration of the FBPN is established first, and the procedure for determining its parameter values is described afterwards. The configuration is as follows:

(1) Inputs: 9 + m parameters associated with the nth example/lot including $LS_n$, $U_n$, $Q_n$, $BQ_n$, $FQ_n$, $WIP_n$, $D_n^{(1)}$, $D_n^{(2)}$, $D_n^{(3)}$, and $B_n^{(1)}$ ~ $B_n^{(m)}$. These parameters have to be normalized so that their values fall within [0, 1]. Then some production execution/control experts are requested to express their beliefs (in linguistic terms) about the importance of each input parameter in estimating the cycle time of a wafer lot. Linguistic assessments for an input parameter are converted into several pre-specified fuzzy numbers. The subjective importance of an input parameter is then obtained by averaging the corresponding fuzzy numbers of the linguistic replies for the input parameter by all experts. The normalized value of the input parameter is multiplied by the subjective importance obtained for it. After such a treatment, all inputs to the FBPN become fuzzy numbers.
(2) Single hidden layer: Generally one or two hidden layers are more beneficial for the convergence property of the FBPN.
(3) Number of neurons in the hidden layer: 1 to 2 × (9 + m). The computing efficiency decreases rapidly if the scale of the FBPN (including the number of the hidden-layer nodes) increases. Nevertheless, a large number of hidden-layer nodes is theoretically beneficial to the estimation accuracy. For these reasons, the optimal number of the hidden-layer nodes in the FBPN is chosen from the interval [1, 2 × (9 + m)] in the proposed methodology.
(4) Output: the (normalized) estimated cycle time of the example.
(5) Network learning rule: Delta rule.
(6) Transformation function: Sigmoid function,

\[ f(x) = \frac{1}{1 + e^{-x}}. \tag{18} \]

(7) Learning rate ($\eta$): 0.01–1.0.
(8) Batch learning.
(9) Number of epochs per replication: 25,000–75,000.
(10) Number of initial conditions/replications: 100. Because the performance of a FBPN is sensitive to the initial condition, the training or testing process will be repeated many times with different initial conditions that are randomly generated. Among the results, the best one is chosen for the subsequent analyses. In this respect, GA can be applied to enhance the efficiency.

The parameters used in the FBPN are defined as follows:

(1) $\tilde{x}_i$: the input to the ith input node.
(2) $\tilde{w}_{ij}^{h}$: the connection weight between the ith input node and the jth hidden node.
(3) $\tilde{I}_j^{h}$: the input to the jth hidden node.
(4) $\tilde{\theta}_j^{h}$: the threshold on the jth hidden node.
(5) $\tilde{h}_j$: the output from the jth hidden node.
(6) $\tilde{w}_j^{o}$: the connection weight between the jth hidden node and the output node.
(7) $\tilde{I}^{o}$: the input to the output node.
(8) $\tilde{\theta}^{o}$: the threshold on the output node.
(9) $\tilde{o}$: the network output (before defuzzification).
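The input treatment in item (1) of the configuration (normalization, averaging of the experts' linguistic replies, and multiplication by the resulting importance) can be sketched as below. The paper does not list the pre-specified fuzzy numbers, so the three-term scale `LINGUISTIC_SCALE` is purely hypothetical, as are the function names and the min-max normalization.

```python
# Hypothetical linguistic scale; the actual pre-specified fuzzy numbers are not given.
LINGUISTIC_SCALE = {
    "unimportant": (0.0, 0.0, 0.3),
    "moderate": (0.2, 0.5, 0.8),
    "important": (0.7, 1.0, 1.0),
}


def normalize(value, lowest, highest):
    """Min-max normalization into [0, 1] (assumed normalization scheme)."""
    return (value - lowest) / (highest - lowest)


def average_importance(replies):
    """Average the triangular fuzzy numbers of all experts' linguistic replies."""
    numbers = [LINGUISTIC_SCALE[reply] for reply in replies]
    return tuple(sum(t[i] for t in numbers) / len(numbers) for i in range(3))


def fuzzy_input(value, lowest, highest, replies):
    """Multiply the normalized crisp value by its fuzzy importance, as in Eq. (4)."""
    x = normalize(value, lowest, highest)
    a, b, c = average_importance(replies)
    return (min(x * a, x * c), x * b, max(x * a, x * c))


# Example: fab utilization 0.9 observed within [0.5, 1.0], rated by two experts
print(fuzzy_input(0.9, 0.5, 1.0, ["important", "moderate"]))
```

Because the normalized value is non-negative, the `min`/`max` calls never swap the bounds here; they are kept only so that the function matches the scalar multiplication rule literally.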

The procedure for determining the parameter values is now described. After pre-classification, a portion of the adopted examples in each category is fed as "training examples" into the FBPN to determine the parameter values for the category. Two phases are involved at the training stage. At first, in the forward phase, inputs are multiplied with weights, summated, and transferred to the hidden layer. Then activated signals are outputted from the hidden layer as:

\[ \tilde{h}_j = \frac{1}{1 + e^{-\tilde{n}_j^{h}}}, \tag{19} \]

where

\[ \tilde{n}_j^{h} = \tilde{I}_j^{h} \,(-)\, \tilde{\theta}_j^{h}, \tag{20} \]
\[ \tilde{I}_j^{h} = \sum_{\text{all } i} \tilde{w}_{ij}^{h} \,(\times)\, \tilde{x}_{(i)}. \tag{21} \]

The $\tilde{h}_j$'s are also transferred to the output layer with the same procedure. Finally, the output of the FBPN is generated as:

\[ \tilde{o} = \frac{1}{1 + e^{-\tilde{n}^{o}}}, \tag{22} \]

where

\[ \tilde{n}^{o} = \tilde{I}^{o} \,(-)\, \tilde{\theta}^{o}, \tag{23} \]
\[ \tilde{I}^{o} = \sum_{\text{all } j} \tilde{w}_j^{o} \,(\times)\, \tilde{h}_j. \tag{24} \]

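The extension principle can be applied to derive the equivalent crisp equations from these fuzzy equations as follows:

\[ \mu_{\tilde{I}_j^{h}}(I_j^{h}) = \sup_{I_j^{h} = \sum_{\text{all } i} w_{ij}^{h} x_i} \min\left( \mu_{\tilde{w}_{ij}^{h}}(w_{ij}^{h}),\; \mu_{\tilde{x}_i}(x_i) \right). \tag{25} \]
\[ \mu_{\tilde{n}_j^{h}}(n_j^{h}) = \sup_{n_j^{h} = I_j^{h} - \theta_j^{h}} \min\left( \mu_{\tilde{I}_j^{h}}(I_j^{h}),\; \mu_{\tilde{\theta}_j^{h}}(\theta_j^{h}) \right). \tag{26} \]
\[ \mu_{\tilde{h}_j}(h_j) = \sup_{h_j = 1/(1 + e^{-n_j^{h}})} \min\left( \mu_{\tilde{n}_j^{h}}(n_j^{h}) \right). \tag{27} \]
\[ \mu_{\tilde{I}^{o}}(I^{o}) = \sup_{I^{o} = \sum_{\text{all } j} w_j^{o} h_j} \min\left( \mu_{\tilde{w}_j^{o}}(w_j^{o}),\; \mu_{\tilde{h}_j}(h_j) \right). \tag{28} \]
\[ \mu_{\tilde{n}^{o}}(n^{o}) = \sup_{n^{o} = I^{o} - \theta^{o}} \min\left( \mu_{\tilde{I}^{o}}(I^{o}),\; \mu_{\tilde{\theta}^{o}}(\theta^{o}) \right). \tag{29} \]
\[ \mu_{\tilde{o}}(o) = \sup_{o = 1/(1 + e^{-n^{o}})} \min\left( \mu_{\tilde{n}^{o}}(n^{o}) \right). \tag{30} \]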

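In this way, these fuzzy parameters can be given in any type of fuzzy numbers, e.g. triangular fuzzy numbers, trapezoidal fuzzy numbers, bell-shaped fuzzy numbers, LR-type fuzzy numbers, etc. In particular, if all parameters are given in triangular fuzzy numbers, then according to the arithmetic for triangular fuzzy numbers these parameters can be approximated with the following equations:

\[ \tilde{I}_j^{h} = (I_{j1}^{h}, I_{j2}^{h}, I_{j3}^{h}) = \left( \sum_{\text{all } i} \min(w_{ij1}^{h} x_{(i)1}, w_{ij1}^{h} x_{(i)3}, w_{ij3}^{h} x_{(i)1}, w_{ij3}^{h} x_{(i)3}),\; \sum_{\text{all } i} w_{ij2}^{h} x_{(i)2},\; \sum_{\text{all } i} \max(w_{ij1}^{h} x_{(i)1}, w_{ij1}^{h} x_{(i)3}, w_{ij3}^{h} x_{(i)1}, w_{ij3}^{h} x_{(i)3}) \right). \tag{31} \]
\[ \tilde{n}_j^{h} = (n_{j1}^{h}, n_{j2}^{h}, n_{j3}^{h}) = (I_{j1}^{h} - \theta_{j3}^{h},\; I_{j2}^{h} - \theta_{j2}^{h},\; I_{j3}^{h} - \theta_{j1}^{h}). \tag{32} \]
\[ \tilde{h}_j = (h_{j1}, h_{j2}, h_{j3}) \cong \left( \frac{1}{1 + e^{-n_{j1}^{h}}},\; \frac{1}{1 + e^{-n_{j2}^{h}}},\; \frac{1}{1 + e^{-n_{j3}^{h}}} \right). \tag{33} \]
\[ \tilde{I}^{o} = (I_1^{o}, I_2^{o}, I_3^{o}) = \left( \sum_{\text{all } j} \min(w_{j1}^{o} h_{j1}, w_{j1}^{o} h_{j3}, w_{j3}^{o} h_{j1}, w_{j3}^{o} h_{j3}),\; \sum_{\text{all } j} w_{j2}^{o} h_{j2},\; \sum_{\text{all } j} \max(w_{j1}^{o} h_{j1}, w_{j1}^{o} h_{j3}, w_{j3}^{o} h_{j1}, w_{j3}^{o} h_{j3}) \right). \tag{34} \]
\[ \tilde{n}^{o} = (n_1^{o}, n_2^{o}, n_3^{o}) = (I_1^{o} - \theta_3^{o},\; I_2^{o} - \theta_2^{o},\; I_3^{o} - \theta_1^{o}). \tag{35} \]
\[ \tilde{o} = (o_1, o_2, o_3) \cong \left( \frac{1}{1 + e^{-n_1^{o}}},\; \frac{1}{1 + e^{-n_2^{o}}},\; \frac{1}{1 + e^{-n_3^{o}}} \right). \tag{36} \]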

To improve the practical applicability of the FBPN and to facilitate the comparisons with conventional techniques, the fuzzy-valued output $\tilde{o}$ is defuzzified according to Wrather and Yu's formula:

\[ d(\tilde{o}) = \int_{0}^{1} E(o^{\alpha}) \, d\alpha, \tag{37} \]

where $E(o^{\alpha})$ is the mean value of the elements in the $\alpha$-cut of $\tilde{o}$. Then the output o is compared with the normalized actual cycle time a, for which the RMSE is calculated:

\[ RMSE = \sqrt{ \frac{ \sum_{\text{all trained examples}} (o - a)^{2} }{ \text{number of trained examples} } }. \tag{38} \]

Subsequently in the backward phase, the deviation between o and a is propagated backward, and the error terms of neurons in the output and hidden layers can be calculated, respectively, as:

\[ \delta^{o} = o(1 - o)(a - o). \tag{39} \]
\[ \tilde{\delta}_j^{h} = \tilde{h}_j \,(\times)\, (1 - \tilde{h}_j) \,(\times)\, \tilde{w}_j^{o} \delta^{o}. \tag{40} \]
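The forward phase in triangular form, followed by defuzzification and the RMSE, can be sketched as below. This is not the VB.NET program developed in the study. All function names are hypothetical, fuzzy numbers are plain 3-tuples, and `defuzzify` assumes that the mean value of an α-cut is its midpoint, in which case the integral in Eq. (37) reduces to (o1 + 2·o2 + o3)/4 for a triangular output.

```python
import math


def tfn_mul(p, q):
    """Fuzzy multiplication, triangular approximation of Eq. (5)."""
    corners = (p[0] * q[0], p[0] * q[2], p[2] * q[0], p[2] * q[2])
    return (min(corners), p[1] * q[1], max(corners))


def tfn_sub(p, q):
    """Fuzzy subtraction, Eq. (3)."""
    return (p[0] - q[2], p[1] - q[1], p[2] - q[0])


def tfn_sum(terms):
    """Componentwise fuzzy addition of several numbers, Eq. (2)."""
    return tuple(sum(t[i] for t in terms) for i in range(3))


def tfn_sigmoid(n):
    """Componentwise sigmoid, as in Eqs. (33) and (36)."""
    return tuple(1.0 / (1.0 + math.exp(-v)) for v in n)


def forward(x, w_h, theta_h, w_o, theta_o):
    """Forward phase, Eqs. (31)-(36).

    x[i]: fuzzy input i; w_h[i][j]: input-to-hidden weight; theta_h[j]: hidden
    threshold; w_o[j]: hidden-to-output weight; theta_o: output threshold.
    """
    hidden = []
    for j in range(len(theta_h)):
        i_h = tfn_sum([tfn_mul(w_h[i][j], x[i]) for i in range(len(x))])   # Eq. (31)
        hidden.append(tfn_sigmoid(tfn_sub(i_h, theta_h[j])))               # Eqs. (32)-(33)
    i_o = tfn_sum([tfn_mul(w_o[j], hidden[j]) for j in range(len(hidden))])  # Eq. (34)
    return tfn_sigmoid(tfn_sub(i_o, theta_o))                              # Eqs. (35)-(36)


def defuzzify(o):
    """Eq. (37) for a triangular output, assuming E(o^alpha) is the alpha-cut midpoint."""
    return (o[0] + 2.0 * o[1] + o[2]) / 4.0


def rmse(outputs, actuals):
    """Eq. (38) on defuzzified outputs and normalized actual cycle times."""
    return math.sqrt(sum((o - a) ** 2 for o, a in zip(outputs, actuals)) / len(outputs))


x = [(0.2, 0.3, 0.4), (0.5, 0.6, 0.7)]
w_h = [[(-0.5, 0.1, 0.6)], [(0.2, 0.4, 0.9)]]     # two inputs, one hidden node
out = forward(x, w_h, [(-0.1, 0.0, 0.1)], [(0.5, 1.0, 1.5)], (-0.2, 0.0, 0.2))
print(out, defuzzify(out))
```

When every fuzzy number is degenerate (a = b = c), the sketch collapses to an ordinary crisp sigmoid network, which is a convenient sanity check.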

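Based on them, adjustments that should be made to the connection weights and thresholds can be obtained as:

\[ \Delta \tilde{w}_j^{o} = \eta \delta^{o} \tilde{h}_j. \tag{41} \]
\[ \Delta \tilde{w}_{ij}^{h} = \eta \tilde{\delta}_j^{h} \,(\times)\, \tilde{x}_i. \tag{42} \]
\[ \Delta \theta^{o} = -\eta \delta^{o}. \tag{43} \]
\[ \Delta \tilde{\theta}_j^{h} = -\eta \tilde{\delta}_j^{h}. \tag{44} \]

To accelerate convergence, a momentum can be added to the learning expressions. For example,

\[ \Delta \tilde{w}_j^{o} = \eta \delta^{o} \tilde{h}_j + \alpha \left( \tilde{w}_j^{o}(t) - \tilde{w}_j^{o}(t - 1) \right). \tag{45} \]

The extension principle is again applied to derive the equivalent crisp equations for these fuzzy equations:

\[ \mu_{\tilde{\delta}_j^{h}}(\delta_j^{h}) = \sup_{\delta_j^{h} = h_j (1 - h_j) w_j^{o} \delta^{o}} \min\left( \mu_{\tilde{h}_j}(h_j),\; \mu_{\tilde{w}_j^{o}}(w_j^{o}) \right). \tag{46} \]
\[ \mu_{\Delta \tilde{w}_j^{o}}(\Delta w_j^{o}) = \sup_{\Delta w_j^{o} = \eta \delta^{o} h_j} \min\left( \mu_{\tilde{h}_j}(h_j) \right). \tag{47} \]
\[ \mu_{\Delta \tilde{w}_{ij}^{h}}(\Delta w_{ij}^{h}) = \sup_{\Delta w_{ij}^{h} = \eta \delta_j^{h} x_i} \min\left( \mu_{\tilde{\delta}_j^{h}}(\delta_j^{h}),\; \mu_{\tilde{x}_i}(x_i) \right). \tag{48} \]
\[ \mu_{\Delta \tilde{\theta}_j^{h}}(\Delta \theta_j^{h}) = \sup_{\Delta \theta_j^{h} = -\eta \delta_j^{h}} \min\left( \mu_{\tilde{\delta}_j^{h}}(\delta_j^{h}) \right). \tag{49} \]

If all parameters are given in triangular fuzzy numbers, then these equations become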


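\[ \begin{aligned} \tilde{\delta}_j^{h} = (\delta_{j1}^{h}, \delta_{j2}^{h}, \delta_{j3}^{h}) \cong \big( & \min( \min(h_{j1}(1 - h_{j3}) w_{j1}^{o},\; h_{j3}(1 - h_{j1}) w_{j1}^{o}) \delta^{o},\; \max(h_{j3}(1 - h_{j1}) w_{j3}^{o},\; h_{j1}(1 - h_{j3}) w_{j3}^{o}) \delta^{o} ), \\ & h_{j2}(1 - h_{j2}) w_{j2}^{o} \delta^{o}, \\ & \max( \min(h_{j1}(1 - h_{j3}) w_{j1}^{o},\; h_{j3}(1 - h_{j1}) w_{j1}^{o}) \delta^{o},\; \max(h_{j3}(1 - h_{j1}) w_{j3}^{o},\; h_{j1}(1 - h_{j3}) w_{j3}^{o}) \delta^{o} ) \big). \end{aligned} \tag{50} \]
\[ \Delta \tilde{w}_j^{o} = (\Delta w_{j1}^{o}, \Delta w_{j2}^{o}, \Delta w_{j3}^{o}) = \eta \left( \min(\delta^{o} h_{j1}, \delta^{o} h_{j3}),\; \delta^{o} h_{j2},\; \max(\delta^{o} h_{j1}, \delta^{o} h_{j3}) \right). \tag{51} \]
\[ \Delta \tilde{w}_{ij}^{h} = (\Delta w_{ij1}^{h}, \Delta w_{ij2}^{h}, \Delta w_{ij3}^{h}) \cong \eta \left( \min(\delta_{j1}^{h} x_{i1}, \delta_{j1}^{h} x_{i3}, \delta_{j3}^{h} x_{i1}, \delta_{j3}^{h} x_{i3}),\; \delta_{j2}^{h} x_{i2},\; \max(\delta_{j1}^{h} x_{i1}, \delta_{j1}^{h} x_{i3}, \delta_{j3}^{h} x_{i1}, \delta_{j3}^{h} x_{i3}) \right). \tag{52} \]
\[ \Delta \tilde{\theta}_j^{h} = (\Delta \theta_{j1}^{h}, \Delta \theta_{j2}^{h}, \Delta \theta_{j3}^{h}) = (-\eta \delta_{j3}^{h},\; -\eta \delta_{j2}^{h},\; -\eta \delta_{j1}^{h}). \tag{53} \]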
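A literal transcription of the triangular backward-phase formulas (Eqs. (39), (50), (51), and (53)) is sketched below to show how the bounds are ordered. The function names are hypothetical, fuzzy numbers are plain 3-tuples, and the sketch assumes the threshold updates carry a negative sign, consistent with the reversed component order in Eq. (53).

```python
def output_delta(o, a):
    """Eq. (39): crisp error term of the output node (o is the defuzzified output)."""
    return o * (1.0 - o) * (a - o)


def hidden_delta(h, w_o, delta_o):
    """Eq. (50): triangular error term of one hidden node.

    h = (h1, h2, h3) is the hidden output, w_o = (w1, w2, w3) its output weight.
    """
    h1, h2, h3 = h
    w1, w2, w3 = w_o
    low = min(h1 * (1 - h3) * w1, h3 * (1 - h1) * w1) * delta_o
    high = max(h3 * (1 - h1) * w3, h1 * (1 - h3) * w3) * delta_o
    # delta_o may be negative, so the two candidates are ordered explicitly
    return (min(low, high), h2 * (1 - h2) * w2 * delta_o, max(low, high))


def output_weight_update(h, delta_o, eta):
    """Eq. (51): adjustment of a hidden-to-output weight."""
    return (eta * min(delta_o * h[0], delta_o * h[2]),
            eta * delta_o * h[1],
            eta * max(delta_o * h[0], delta_o * h[2]))


def hidden_threshold_update(delta_h, eta):
    """Eq. (53): negation reverses the order of the lower and upper bounds."""
    return (-eta * delta_h[2], -eta * delta_h[1], -eta * delta_h[0])


d_o = output_delta(0.5, 0.7)
d_h = hidden_delta((0.4, 0.5, 0.6), (0.8, 1.0, 1.2), d_o)
print(d_o, d_h, output_weight_update((0.4, 0.5, 0.6), d_o, 0.1),
      hidden_threshold_update(d_h, 0.1))
```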

Theoretically, network learning stops when the RMSE falls below a pre-specified level, or the improvement in the RMSE becomes negligible with more epochs, or a large number of epochs have already been run. Then test examples are fed into the FBPN to evaluate the accuracy of the network, which is also measured with the RMSE. However, the accumulation of fuzziness during the training process continuously increases the lower bound, the upper bound, and the spread of the fuzzy-valued output $\tilde{o}$ (and those of many other fuzzy parameters), and might prevent the RMSE (calculated with the defuzzified output o) from converging to its minimal value. Conversely, the centers of some fuzzy parameters become smaller and smaller because of network learning. It is possible that a fuzzy parameter becomes invalid in the sense that its lower bound is higher than its center. To deal with this problem, the lower and upper bounds of all fuzzy numbers in the FBPN will no longer be modified if the following index converges to a minimal value:

\[ \alpha \sqrt{ \frac{ \sum_{\text{all examples}} \min\left( (o_1 - a)^{2}, (o_3 - a)^{2} \right) }{ \text{number of examples} } } + (1 - \alpha) \sqrt{ \frac{ \sum_{\text{all examples}} \max\left( (o_1 - a)^{2}, (o_3 - a)^{2} \right) }{ \text{number of examples} } }, \quad 0 < \alpha < 1. \tag{54} \]

Finally, the FBPN can be applied to estimate the cycle time of a new lot.
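The index in Eq. (54) weighs how far the nearer and the farther bound of each fuzzy output lie from the actual value. A minimal sketch, with a hypothetical function name `bound_index` and outputs stored as 3-tuples (o1, o2, o3), could look like this:

```python
import math


def bound_index(outputs, actuals, alpha=0.5):
    """Eq. (54): weighted RMSE of the nearer and the farther output bound.

    outputs: fuzzy outputs (o1, o2, o3); actuals: normalized actual cycle times;
    alpha:   weight with 0 < alpha < 1 (0.5 is an arbitrary default for this sketch).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(outputs)
    nearer = sum(min((o[0] - a) ** 2, (o[2] - a) ** 2) for o, a in zip(outputs, actuals))
    farther = sum(max((o[0] - a) ** 2, (o[2] - a) ** 2) for o, a in zip(outputs, actuals))
    return alpha * math.sqrt(nearer / n) + (1.0 - alpha) * math.sqrt(farther / n)


print(bound_index([(0.4, 0.5, 0.7)], [0.5]))
```

In a training loop, one would track this value over the epochs and freeze the lower and upper bounds once it stops decreasing, while continuing to adjust the centers.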
In addition, the fuzzy-valued output $\tilde{o} = (o_1, o_2, o_3)$ of the FBPN can be thought of as providing a weighted interval estimate for the actual cycle time a, and it becomes possible to further reduce the RMSE with such a weighted interval estimate to the following value:

\[ RMSE = \sqrt{ \sum \min\left( (o_1 - a)^{2}, (o_3 - a)^{2} \right) \Big/ \text{number of examples} }. \tag{55} \]

When a new lot is released into the fab, the eight parameters (except $LS_n$) associated with the new lot are recorded and compared with those of each category center. Then the FBPN with the parameters of the nearest category center is applied to estimate the cycle time of the new lot. In this study, FCM is implemented using MATLAB V7, while a VB.NET program has been developed to construct the FBPN.

3. Production simulation for generating test data

In practical situations, the historical data of each lot are only partially available in the wafer fab. Further, some information on the previous lots, such as $Q_n$, $BQ_n$, and $FQ_n$, is not easy to collect on the shop floor. Therefore, a simulation model is often built to simulate the manufacturing process of a real wafer fab (Vig & Dooley, 1991; Barman, 1998; Chang et al., 2001; Hung & Chang, 2001; Chang & Hsieh, 2003; Chen, 2003). Then, such information can be derived from the shop floor status collected from the simulation model (Chang et al., 2001). To generate test data, a simulation program coded using Microsoft Visual Basic 6.0 is constructed to simulate a wafer fabrication environment with the following assumptions:

(1) Lots are uniformly released into the wafer fab.
(2) The distributions of the interarrival times of machine downs are exponential.
(3) The distribution of the time required to repair a machine is uniform.
(4) The percentages of lots with different product types in the fab are random. Namely, this study investigates the dynamic product-mix case.
(5) The percentages of lots with different priorities released into the fab are loosely controlled.
(6) The priority of a lot cannot be changed during fabrication.
(7) Lots are sequenced on each machine first by their priorities, then by the first-in-first-out (FIFO) policy. Such a sequencing policy is a common practice in many wafer fabs.
(8) A lot has equal chances to be processed on each alternative machine/head available at a step.
(9) A lot cannot proceed to the next step until the fabrication on its every wafer has been finished.
(10) No preemption is allowed.
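The sequencing rule in assumption (7), priority first and FIFO within a priority, can be sketched with a binary heap. This is not the Visual Basic simulation program used in the study. The class name `MachineQueue` and the numeric ranks are hypothetical; the three priority names follow the simulation description (normal, hot, super hot).

```python
import heapq
import itertools

# Smaller rank is served first (assumed encoding of the three priorities).
PRIORITY_RANK = {"super hot": 0, "hot": 1, "normal": 2}


class MachineQueue:
    """Queue in front of one machine: priority first, then first-in-first-out."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # monotonically increasing arrival stamp

    def push(self, lot_id, priority):
        # The arrival stamp breaks ties between lots of equal priority (FIFO).
        heapq.heappush(self._heap, (PRIORITY_RANK[priority], next(self._arrival), lot_id))

    def pop(self):
        """Return the next lot to be processed."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = MachineQueue()
for lot, priority in [("L1", "normal"), ("L2", "hot"), ("L3", "normal"),
                      ("L4", "super hot"), ("L5", "hot")]:
    queue.push(lot, priority)
order = [queue.pop() for _ in range(len(queue))]
print(order)
```

Since no preemption is allowed (assumption (10)), such a queue would only be consulted when a machine becomes idle.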

The basic configuration of the simulated wafer fab is simplified from a real-world wafer fab which is located in the Science Park of Hsin-Chu, Taiwan, ROC. Assumptions (1)–(3) and (7)–(9) are commonly adopted in related research (e.g. Chang et al., 2001; Chang & Hsieh, 2003; Chen, 2003; Chang et al., 2005), while assumptions (5) and (6) are made to simplify the situation. There are four products (labeled as A–D) in the simulated wafer fab. The simulated wafer fab has a monthly capacity of 20,000 pieces of wafers and is expected to be fully utilized (utilization = 100%). To simulate the dynamic product-mix case, the following procedure is established:

(1) Consider the next week.
(2) Generate a random number between [0, 1] for each product type.
(3) Add up all these random numbers.
(4) Divide each random number by the sum. The result represents the percentage of the corresponding product type that will be released into the fab during that week.
(5) Convert each percentage into a number of wafer lots with sizes uniformly distributed from 21 to 25.
(6) Generate a random sequence of releasing the wafer lots of all product types.
(7) Arrange the release time of each wafer lot.
(8) Stop if the simulation horizon has been exceeded. Otherwise, return to step (1).
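One pass of steps (2)-(7) for a single week can be sketched as below. The function name `generate_week`, the number of lots per week passed in as a parameter, and the largest-remainder rounding used to turn percentages into whole lots are assumptions of this sketch; the procedure above does not specify how the conversion in step (5) is rounded.

```python
import random


def generate_week(products, lots_per_week, start_time, interval, rng):
    """Return [(release_time, product, lot_size), ...] for one simulated week."""
    draws = [rng.random() for _ in products]                 # step (2)
    total = sum(draws)                                       # step (3)
    shares = [d / total for d in draws]                      # step (4)
    # Step (5): whole lots per product; lots lost to truncation go to the
    # products with the largest fractional remainders (assumed rounding rule).
    counts = [int(s * lots_per_week) for s in shares]
    by_remainder = sorted(range(len(products)),
                          key=lambda i: shares[i] * lots_per_week - counts[i],
                          reverse=True)
    for i in by_remainder[: lots_per_week - sum(counts)]:
        counts[i] += 1
    lots = [(p, rng.randint(21, 25)) for p, c in zip(products, counts) for _ in range(c)]
    rng.shuffle(lots)                                        # step (6)
    # Step (7): lots are released one by one at a constant interval
    return [(start_time + k * interval, p, size) for k, (p, size) in enumerate(lots)]


rng = random.Random(1)
week = generate_week(["A", "B", "C", "D"], 197, 0.0, 0.85, rng)
print(week[:3])
```

Steps (1) and (8) then amount to calling the function once per week, advancing `start_time` each time, until the simulation horizon is reached.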

Fig. 5 illustrates the dynamic product mix in the simulated wafer fab. Lots are released into the wafer fab one by one every 0.85 h. Three types of priorities (normal, hot, and super hot) are randomly assigned to lots. The percentages of lots with these priorities released into the fab are restricted to be approximately 60%, 30%, and 10%, respectively. Each product has 150–200 steps and 6–9 reentrances to the most severe bottleneck machine. The singular production characteristic "reentry" of the semiconductor industry is clearly reflected in the example. It also shows how difficult it is for production planning and scheduling staff to provide an accurate due date for a product with such a complicated routing. In total, 102 machines (including alternative machines) are provided to process single-wafer or batch operations in the fab. Thirty replications of the simulation are successively run. The time required for each simulation replication is about 15 min on a PC with 256MB RAM and an Athlon™ 64 Processor 3000+ CPU. A horizon of 24 months is simulated. The maximal cycle time is less than 3 months. Therefore, 4 months and an initial WIP status (obtained from a pilot simulation run) seemed to be sufficient to drive the simulation into a steady state. The statistical data were collected starting at the end of the fourth month. For each replication, data of 30 lots are collected and classified by their product types and priorities. In total, data of 900 lots can be collected as training and testing examples. Among them, 2/3 (600 lots, including all product types and priorities) are used to train the network, and the other 1/3 (300 lots) are reserved for testing. A trace report was generated for every simulation run to verify the simulation model. The simulated average cycle times have also been compared with the actual values to validate the simulation model.
Subsequently, in another simulation experiment a fixed product mix was assumed, so as to collect data for comparing the two assumptions. Taking product type A, normal priority, full size (25 wafers per lot) as an example, the time series plot of 35 simulated successive cycle times is shown in Fig. 6. As can be observed, the pattern of the cycle time is unstable and highly non-stationary. The traditional approach of relying on human judgment is very inaccurate and prone to failure when the shop status is totally different, even for the same product. Further, the results under the fixed product-mix assumption were compared with those under the dynamic product-mix assumption (also in Fig. 6). The results are summarized in Table 1. The cycle time average and standard deviation under the dynamic product-mix assumption are both greater than those under the fixed product-mix assumption in this case.

Fig. 5. The dynamic product mix. (Plot of % in product mix for product types A, B, C, and D against time in hours.)

Fig. 6. The cycle time fluctuation. (Plot of cycle time in hours against lot #, lots 1 to 35, under the fixed and dynamic product mixes.)

Table 1 Comparison between the two product-mix assumptions (product A, normal priority, full size)

                       Cycle time average (h)   Cycle time standard deviation (h)
Fixed product mix      1687                     86
Dynamic product mix    1856                     97
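As a quick check on Table 1, the relative differences between the two assumptions can be computed directly from the tabulated values. The snippet below is only a worked illustration; the coefficient of variation (standard deviation divided by mean) is a summary added here and is not reported in the study.

```python
# Values taken from Table 1 (product A, normal priority, full size).
fixed = {"mean_h": 1687, "std_h": 86}
dynamic = {"mean_h": 1856, "std_h": 97}

# Relative increase of the dynamic-mix figures over the fixed-mix figures.
mean_increase_pct = 100 * (dynamic["mean_h"] - fixed["mean_h"]) / fixed["mean_h"]
std_increase_pct = 100 * (dynamic["std_h"] - fixed["std_h"]) / fixed["std_h"]

# Coefficient of variation under each assumption (not in the original table).
cv_fixed_pct = 100 * fixed["std_h"] / fixed["mean_h"]
cv_dynamic_pct = 100 * dynamic["std_h"] / dynamic["mean_h"]

print(f"mean: +{mean_increase_pct:.1f}%, std: +{std_increase_pct:.1f}%")
print(f"CV fixed: {cv_fixed_pct:.1f}%, CV dynamic: {cv_dynamic_pct:.1f}%")
```

For this case the average cycle time is about 10% higher and the standard deviation about 13% higher under the dynamic product mix.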

4. Experimental results and discussion

To evaluate the effectiveness of the proposed methodology and to compare it with six existing approaches (MFLC, BPN, FBPN, CBR, EFR, and SOM-WM), all seven methods were applied to twelve test cases containing the data of lots with two product types (A and B) and all priorities. Analyzing only the data of two product types was considered sufficient because all product types in a dynamic product-mix environment were treated equally. The minimal RMSEs achieved by applying the seven approaches to different cases were recorded and compared in Table 2. Note that the minimal RMSEs have been converted back to un-normalized values to be more meaningful in practice.

Table 2 Comparisons of the RMSEs of various approaches

RMSE (h)        MFLC   BPN        FBPN       CBR        EFR        SOM-WM     The proposed methodology
A (normal)      68     65 (4%)    63 (7%)    63 (7%)    59 (13%)   59 (13%)   35 (49%)
A (hot)         42     41 (4%)    36 (14%)   36 (14%)   27 (36%)   26 (39%)   18 (57%)
A (super hot)   33     32 (5%)    30 (9%)    32 (5%)    24 (27%)   24 (27%)   9 (73%)
B (normal)      95     90 (5%)    90 (5%)    93 (2%)    65 (32%)   62 (35%)   26 (73%)
B (hot)         66     65 (2%)    65 (2%)    66 (0%)    38 (43%)   33 (50%)   32 (52%)
B (super hot)   23     21 (7%)    20 (13%)   21 (7%)    17 (27%)   17 (27%)   8 (65%)

In the BPN or FBPN, there was one hidden layer with 9 to 18 nodes, depending on the results of a preliminary analysis for establishing the best configuration. In the proposed methodology, wafer lots were first classified with FCM. The optimal number of categories was determined with the S test (Xie & Beni, 1991). Subsequently, examples of different categories were learned with different FBPNs of the same topology. The convergence condition was that either the improvement in the RMSE became less than 0.001 with one more epoch, or 75,000 epochs had already been run. To consider the future release plan, the 9-bucket approach was applied, in which each bucket contained the discounted workloads within a future time interval of 100 h. An example of the 9 buckets of three successive lots in the experiment is shown in Fig. 7. MFLC was adopted as the comparison basis, and the percentage of improvement in the minimal RMSE achieved by each of the other approaches is enclosed in parentheses following the performance measure. The optimal value of parameter k in the CBR approach was the value that minimized the RMSE (Chang et al., 2005). According to the experimental results, the following points are made:

(1) The cycle times of lots with higher priorities were easier to estimate.
(2) From the effectiveness viewpoint, the estimation accuracy (measured with the RMSE) of the proposed methodology was significantly better than those of the other approaches in all cases, achieving a 49–73% (62% on average) reduction in the RMSE over the comparison basis, MFLC. The average advantages over the three approaches without example classification (BPN, FBPN, and CBR) were 57%, 54%, and 56%, respectively.
(3) When the proposed methodology was compared with the two approaches that also differentiated examples (EFR and SOM-WM), the superiority in estimation accuracy was also evident (32% and 30% on average, respectively), which might be due to the incorporation of the future release plan with the 9-bucket approach. Take the case of product type A, normal priority as an example (see Fig. 8). The proposed methodology seemed to provide a very good fit for the experimental data under the dynamic product-mix wafer fabrication environment.
(4) The performances of EFR and SOM-WM were very close. In fact, these two approaches are quite similar in nature (Chen, 2007a).

Fig. 7. The 9 buckets of three successive lots (product type A, normal priority) in the experiment. (Plot of discounted future workload against bucket #, buckets 1 to 9, for lots #1, #2, and #3.)
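The percentages in parentheses in Table 2 are RMSE reductions relative to MFLC. A minimal sketch of that calculation for two of the columns is given below; the helper name `pct_reduction` and the rounding to whole percentages are choices made here for illustration.

```python
# RMSE values (h) from Table 2, in the row order
# A (normal), A (hot), A (super hot), B (normal), B (hot), B (super hot).
rmse = {
    "MFLC": [68, 42, 33, 95, 66, 23],
    "FBPN": [63, 36, 30, 90, 65, 20],
    "Proposed": [35, 18, 9, 26, 32, 8],
}


def pct_reduction(baseline, candidate):
    """Percentage reduction in RMSE of candidate relative to baseline, per case."""
    return [round(100 * (b - c) / b) for b, c in zip(baseline, candidate)]


fbpn_improvement = pct_reduction(rmse["MFLC"], rmse["FBPN"])
proposed_improvement = pct_reduction(rmse["MFLC"], rmse["Proposed"])

# Mean of the six rounded percentages for the proposed methodology.
mean_proposed = sum(proposed_improvement) / len(proposed_improvement)

print(fbpn_improvement)
print(proposed_improvement, mean_proposed)
```

The six rounded reductions for the proposed methodology average 61.5, which corresponds to the 62% average quoted in point (2).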

Fig. 8. The good fit provided by the proposed methodology (product type A, normal priority). (Plot of cycle time in hours against lot #, lots 1 to 35, showing actual values and estimates.)

Fig. 9. The effect of the number of buckets (product type A, normal priority). (Plot of RMSE in hours against the number of buckets.)
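To make the bucket idea concrete, the sketch below groups the workloads of lots scheduled for release after a given lot into m consecutive 100 h intervals, as described for the 9-bucket setting. It is a hypothetical illustration only: the discounting rule used in the study is defined elsewhere in the paper and is not reproduced here, so the geometric factor `discount ** k`, the function name, and the (release time, workload) input format are all assumptions.

```python
def discounted_buckets(release_plan, t_release, n_buckets=9,
                       width_h=100.0, discount=0.9):
    """Sum future workloads into n_buckets intervals of width_h hours.

    release_plan: iterable of (release_time_h, workload) pairs (assumed format).
    Bucket k (0-based) covers offsets in [k * width_h, (k + 1) * width_h)
    after t_release. The geometric discount is an illustrative assumption.
    """
    buckets = [0.0] * n_buckets
    for t, workload in release_plan:
        offset = t - t_release
        # Ignore lots released at or before t_release, or beyond the last bucket.
        if offset <= 0 or offset >= n_buckets * width_h:
            continue
        k = int(offset // width_h)
        buckets[k] += workload * discount ** k
    return buckets


# Hypothetical plan: (release time in hours, workload).
plan = [(0.0, 7.0), (50.0, 10.0), (150.0, 20.0), (950.0, 5.0)]
buckets = discounted_buckets(plan, t_release=0.0)
```

With more buckets the same look-ahead window is described at a finer resolution or extended further into the future, which is the trade-off examined in Fig. 9.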

(5) As the lot priority increases, the superiority of the proposed methodology becomes more evident. The reason might be that lots with high priorities are usually few, and therefore 9 buckets would be sufficient to model their workloads precisely.
(6) The effects of various numbers of buckets (m) on the estimation accuracy were also investigated. The results are shown in Fig. 9. Though a large value of m is beneficial to the estimation accuracy, the marginal improvement gradually diminishes as m increases.
(7) The proposed methodology has two characteristics: classifying wafer lots and considering the future release plan. The effects of these two characteristics are analyzed in detail in Table 3, in which three approaches (FBPN, FCM-FBPN without buckets, and the proposed approach) are compared. The average effect of example classification is 29%, while the effect of considering the future release plan ranges from 5% to 47%, which is much greater than that obtained in Chen (200a) that used only 3 buckets and assumed a fixed product mix.
(8) In terms of efficiency, theoretically the more buckets there are, the more nodes there are in both the input and hidden layers, and the training time might be significantly lengthened. However, in the experiment the difference was small. Take the case of product type A, normal priority as an example. The training time with only 1 bucket was 7 min and 47 s, while that with 9 buckets rose only to 8 min and 16 s (about 6% longer). Therefore, using many buckets did not significantly worsen the efficiency of the proposed methodology.

Table 3 The effects of the two characteristics

RMSE (h)        FBPN   FCM-FBPN without buckets (I)   The proposed methodology (II)   Effects of classifying examples (I)   Effects of considering the future release plan (II–I)
A (normal)      63     54 (14%)                       35 (44%)                        14%                                   30%
A (hot)         36     24 (33%)                       18 (50%)                        33%                                   17%
A (super hot)   30     23 (23%)                       9 (70%)                         23%                                   47%
B (normal)      90     61 (32%)                       26 (71%)                        32%                                   39%
B (hot)         65     35 (46%)                       32 (51%)                        46%                                   5%
B (super hot)   20     15 (25%)                       8 (60%)                         25%                                   35%
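The decomposition in Table 3 can be reproduced from the three RMSE columns. The sketch below assumes, consistently with the tabulated values, that both effects are RMSE reductions relative to the plain FBPN and that the effect of the future release plan is the difference between the two rounded percentages; the variable names are introduced here for illustration.

```python
# RMSE values (h) from Table 3, in the row order
# A (normal), A (hot), A (super hot), B (normal), B (hot), B (super hot).
fbpn = [63, 36, 30, 90, 65, 20]
fcm_fbpn = [54, 24, 23, 61, 35, 15]  # FCM-FBPN without buckets (I)
proposed = [35, 18, 9, 26, 32, 8]    # the proposed methodology (II)


def pct_reduction(baseline, candidate):
    """Percentage reduction in RMSE of candidate relative to baseline, per case."""
    return [round(100 * (b - c) / b) for b, c in zip(baseline, candidate)]


effect_classification = pct_reduction(fbpn, fcm_fbpn)  # column (I)
effect_total = pct_reduction(fbpn, proposed)           # column (II)
effect_release_plan = [ii - i for i, ii in
                       zip(effect_classification, effect_total)]  # (II - I)

mean_classification = sum(effect_classification) / len(effect_classification)

print(effect_classification, round(mean_classification))
print(effect_release_plan)
```

The mean of column (I) is about 29%, and column (II–I) spans 5% to 47%, matching the figures quoted in point (7).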

5. Conclusions

A fuzzy-neural and multiple-bucket approach is proposed in this study for lot cycle time estimation in a wafer fab with dynamic product mix, which has seldom been thoroughly investigated in past studies. The proposed methodology is composed of two parts. In the first part, the multiple-bucket approach is applied to consider the future release plan of the wafer fab, which is very important to the modeling of the dynamic product mix in the wafer fab. In the second part, the FCM-FBPN approach is applied to estimate the cycle time of every lot in the wafer fab. The buckets obtained in the first part become additional inputs to the FBPN in the second part. In this way, the fluctuation in the product mix since the release of a wafer lot can be considered in estimating the cycle time of the wafer lot. To demonstrate the applicability of the proposed methodology, production simulation is also applied in this study to generate test data. According to the experimental results:

(1) From the effectiveness viewpoint, the estimation accuracy of the proposed methodology was significantly better than those of many existing approaches.
(2) A large number of buckets is beneficial to the estimation accuracy.
(3) On the other hand, using many buckets does not worsen the efficiency of the proposed methodology.

However, to further evaluate the effectiveness and efficiency of the proposed methodology, it has to be applied to fab models of different scales, especially a full-scale actual wafer fab. Besides, there are many other ways of incorporating the future release plan of a wafer fab. These constitute some directions for future research.

References

Barman, S. (1998). The impact of priority rule combinations on lateness and tardiness. IIE Transactions, 30, 495–504.
Chang, P.-C., & Hsieh, J.-C. (2003). A neural networks approach for due-date assignment in a wafer fabrication factory. International Journal of Industrial Engineering, 10(1), 55–61.
Chang, P.-C., Hsieh, J.-C., & Liao, T. W. (2001). A case-based reasoning approach for due date assignment in a wafer fabrication factory. Proceedings of the International Conference on Case-Based Reasoning (ICCBR 2001), Vancouver, British Columbia, Canada.
Chang, P.-C., Hsieh, J.-C., & Liao, T. W. (2005). Evolving fuzzy rules for due-date assignment problem in semiconductor manufacturing factory. Journal of Intelligent Manufacturing, 16, 549–557.
Chen, T. (2003). A fuzzy back propagation network for output time prediction in a wafer fab. Applied Soft Computing, 2(3), 211–222.
Chen, T. (2006). A hybrid SOM-BPN approach to lot output time prediction in a wafer fab. Neural Processing Letters, 24(3), 271–288.
Chen, T. (2007a). An intelligent hybrid system for wafer lot output time prediction. Advanced Engineering Informatics, 21, 55–65.
Chen, T. (2007b). Predicting wafer lot output time with a hybrid FCM-FBPN approach. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 37(4), 784–793.
Chen, T. (2008). An intelligent mechanism for lot output time prediction and achievability evaluation in a wafer fab. Computers and Industrial Engineering, 54, 77–94.
Chang, P.-C., & Liao, T. W. (2006). Combining SOM and fuzzy rule base for flow time prediction in semiconductor manufacturing factory. Applied Soft Computing, 6, 198–206.
Hung, Y.-F., & Chang, C.-B. (2001). Dispatching rules using flow time predictions for semiconductor wafer fabrications. Proceedings of the 5th Annual International Conference on Industrial Engineering Theory, Applications and Practice, Taiwan, 2001.
Vig, M. M., & Dooley, K. J. (1991). Dynamic rules for due-date assignment. International Journal of Production Research, 29(7), 1361–1377.
Wang, L.-X., & Mendel, J. M. (1992). Generating fuzzy rules by learning from examples. IEEE Transactions on Systems, Man, and Cybernetics, 22(6), 1414–1427.
Xie, X. L., & Beni, G. (1991). A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 841–847.