Application of neural networks and genetic algorithms to the screening for high quality chips



Applied Soft Computing 9 (2009) 824–832

Contents lists available at ScienceDirect

Applied Soft Computing journal homepage: www.elsevier.com/locate/asoc

Chenn-Jung Huang a,*, You-Jia Chen a, Chi-Feng Wu b, Yi-An Huang b

a Department of Computer Science and Information Engineering, National Dong Hwa University, No. 123 Hua-Hsi Rd., Hualien 970, Taiwan
b Testing Factory, Philips Semiconductor Kaohsiung, Taiwan

ARTICLE INFO

Article history: Received 15 April 2005; received in revised form 24 September 2008; accepted 13 October 2008; available online 5 November 2008

Keywords: Defect detection; Median filter; Single-linkage clustering; Cellular neural networks; Genetic algorithm; Multilayer perceptron

ABSTRACT

During electrical testing, each die on a wafer must be tested to determine whether it functions as originally designed. For a clustered defect on a wafer, for example scratches, stains, or localized failure patterns, the defective dies in the flawed area may not all be detected during the electrical testing stage. To prevent the defective dies from proceeding to final assembly, the testing factory must assign workers to identify patterns in the layout of defective dies and label other potential defects. Although a previously developed defect detection program enables full automation of the testing process in a testing factory, numerous defective dies in recognized clusters are missed, and in certain circumstances some clusters are not captured at all. This work thus proposes two automatic wafer-scale defect cluster identifiers, which utilize neural networks and genetic algorithms to detect the defect clusters, and compares them with the method presented in our earlier work. The experimental results confirm that both of the proposed algorithms are more effective in identifying defect clusters than the defect detection program presently used by the testing factory. © 2008 Elsevier B.V. All rights reserved.

1. Introduction

The integrated circuit manufacturing process begins with the production of a wafer, usually a thin, round slice of silicon semiconductor. Fig. 1 depicts hundreds of dies formed on a single wafer during the fabrication process. Some defect clusters, including scratches, stains, or localized failure patterns created by the fabrication process, occasionally appear on a wafer. Figs. 2–4 show three examples of visible defects, normally caused by equipment malfunctions or human mistakes such as carelessness. The invisible defects, which account for 85% of the total cases, result from two types of faults introduced during the wafer fabrication process. The first type is leakage current caused by an isolation problem or a parametric shift, for example a lower threshold voltage or a thinner oxide layer. The second type is a shorter effective channel length caused by lithography, implantation or etching problems. The invisible defects frequently form localized patterns, with some passing dies in or close to the bad area on the wafer maps. Although an automatic, computer-driven electrical test system checks whether each die on the wafer is functional, the

* Corresponding author. Tel.: +886 38227106x1520; fax: +886 38237408. E-mail address: [email protected] (C.-J. Huang). 1568-4946/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.asoc.2008.10.002

electrical tests fail to detect all of the faulty dies in the flawed areas, and the chips might not operate normally after shipping to the customer. When superior quality IC products are required, for example in the automotive industry, all the dies in a defect cluster that pass electrical tests should also be marked as defective during the probe testing stage to ensure high quality chips; otherwise the testing factory may receive customer complaints regarding poor quality IC products. The prober company was asked to develop a tool to detect the defect clusters but failed to do so. Consequently, the testing factory assigns yield analysis engineers to visually check wafers and hand-mark the defective dies in, or close to, these flawed regions. Although this manual checking prevents numerous defective dies from continuing to assembly, it does not detect localized failure patterns resulting from the fabrication processes because they are invisible to the naked eye.

To tackle the defect identification and marking problem described above, this work applies neural network techniques and genetic algorithms (GAs) [1,2] to detect defect clusters on a wafer. The motivation for using neural networks and genetic algorithms is that they have been successfully applied in numerous areas, for example image processing, pattern recognition, and machine learning. Both techniques have been proven highly effective for object extraction in the literature

[1–10]. Moreover, numerous solutions exist on VLSI chips that allow neural networks and genetic algorithms to be computed in hardware, and high-speed, low-cost neural network and genetic algorithm chips have recently been introduced, enabling hardware implementation of both techniques [11–13]. The experimental results in this work confirm that the proposed algorithms effectively detect defect clusters, although in certain circumstances the use of both algorithms may introduce an additional yield loss. Notably, the proposed algorithms facilitate the application of automated wafer testing during the wafer-probing stage. Such testing would be performed early in the manufacturing process and avoid the cost of assembling and testing defective dies.

The remainder of this paper is organized as follows. Section 2 briefly surveys related work. A defect cluster detection algorithm using image processing techniques [14] is briefly reviewed in Section 3. Section 4 describes a scheme which utilizes a self-supervised multilayer perceptron to replace the core of the algorithm in [14], while a cellular neural network (CNN) and genetic algorithms approach is proposed in Section 5. Section 6 presents the simulation results, which compare the two proposed algorithms with our earlier work [14]. Finally, conclusions are presented in Section 7.

Fig. 1. Photo of a wafer sample.

Fig. 2. A scratch on the surface of a wafer.

Fig. 3. A scratch under passivation.

Fig. 4. A scratch on insulator.

2. Related work

Several algorithms have been proposed for dealing with defect identification and classification problems to improve wafer yield [3,4,14–26]. Among them, the back-propagation neural network, adaptive resonance theory network 1 (ART1), hyperellipsoid clustering network, learning vector quantization neural networks, and Hough transformation were respectively employed by Zeng et al. [15], Chen and Liu [3], Kameyama et al. [4], Chang et al. [16], and White et al. [17] for defect pattern classification. Baluja and Maxion [18] classified five types of plasma-etch faults using expectation in a neural network-based system to filter noise and irrelevant features, while Taubenlatt et al. [19] and Wang [20] employed spatial filtering to separate the defect features from repeating geometric patterns and irrelevant features. Nikoonahad et al. [21] extracted examples of the so-called point-spread function from the 2D signal for detecting small anomalies. Maruo et al. [22] employed the Hough transform for recognizing defects in the shape of straight lines or circles, while Vacca et al. [23] designed an advanced line-measurement algorithm (ALM) for detecting errors in photomask critical dimension (CD) at the 75-nm level. Notably, none of the abovementioned defect classification algorithms can effectively detect defect clusters of arbitrary shape and automatically mark potentially bad dies during wafer probing, as required in wafer manufacturing. Although Huang et al. [14] successfully employed a median filter and clustering approach for tackling the defect identification and marking problem, it has been shown that some

defective dies in recognized clusters, or even entire clusters, are not captured in certain circumstances.

3. Defect cluster detection algorithm

As shown in Fig. 5, the proposed defect cluster detection algorithm comprises five process stages, summarized below:

(1) Read in the wafer map file and convert it into a binary matrix.
(2) Separate the irrelevant defective dies from wafer defect clusters.
(3) Perform single-linkage clustering on the binary matrix.
(4) Build linked lists for the defect clusters.
(5) Proceed through the linked lists and mark all the eight-adjacent neighbors as defective dies to compensate for the unexpected erosion introduced during the irrelevant feature removal stage.

Fig. 5. Five process stages of the proposed defect cluster detection algorithm.

3.1. Wafer map conversion

The basic data format used by the prober is the Semiconductor Equipment Communication Standard (SECS) format designed by the Semiconductor Equipment and Materials Institute (SEMI). The wafer map of the prober includes the SEMI-defined formatting information (for example lists, single and multibyte integers, and so on) and the map and header data. The header data is used to determine the wafer orientation during probing. This data comprises a list of individual reference points used for comparing the physical and logical wafers (the wafer map data file), the die size, the total number of rows and columns, and the number of dies requiring processing on the wafer. The data section of the wafer map records the die count in the row, the direction of the x-axis travel between dies, and the actual binary codes of the dies.

The tool presented here reads in the wafer map file reported by the prober and transforms it into a bounded matrix with m by n entries, where m denotes the total number of rows on the wafer and n represents the total number of columns. To simplify the processing, the tool converts the binary codes stored in the wafer map into a binary value for each matrix entry, where 0 represents a good die and 1 a defective die. This simplification does not differentiate the defective dies in the same way as the probe instrument: a wafer map can contain different kinds of defects, and the prober assigns different binary codes to defective dies for these different defects. Although the tool presented here uses this simplified data structure, its output wafer map file still follows the SECS standard.

3.2. Irrelevant feature removal

The automation of irrelevant feature removal is essential to maintain a high yield in semiconductor manufacturing, especially when the recognition of the invisible defects, which account for 85% of the total cases, is urgently demanded during the electrical testing stage. Our earlier work [14] examined the m by n binary matrix obtained from the wafer map conversion stage. Relative to the clustered defects, the isolated defective dies on these samples appeared as salt-and-pepper type noise. The median filter [27] was therefore employed in [14] to separate the defect clusters of the wafer from its isolated defective dies, owing to the ability of the filter to remove salt-and-pepper type noise. An even simpler median filter design was adopted because each matrix entry contains only a 1 (defective) or a 0 (defect-free). This design merely counted the total number of 1s that appear in the 3 × 3 neighborhood. The pixel value was then set to 1 if the number of 1s was five or greater, and to 0 otherwise. After applying the median filter, all 1s remaining in the matrix were treated as defective dies in a defect cluster.

The major problem of the median filter approach adopted in [14] is that some defective dies in recognized clusters, or even entire clusters, are not captured in certain circumstances. This work thus proposes two novel approaches to tackle this problem. Unlike the median filter technique, the two proposed algorithms attempt to extract the defect clusters, including scratches, stains, and localized failure patterns, especially invisible defects, instead of filtering out the isolated defective dies. The first approach presented in this work uses self-supervised multilayer perceptrons [1,28–32], which possess the self-organizing capability to effectively extract the defect clusters from the m by n binary matrix obtained from the wafer map conversion stage.
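For reference, the simplified binary median filter of [14] described at the start of this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `binary_median_filter` is hypothetical, the center die is assumed to be counted as part of its own 3 × 3 neighborhood, and positions outside the wafer are assumed to count as good dies (0).

```python
import numpy as np

def binary_median_filter(wafer):
    """Keep a die as defective (1) only if its 3 x 3 window holds five or more 1s."""
    wafer = np.asarray(wafer, dtype=np.uint8)
    m, n = wafer.shape
    # Assumption: dies outside the wafer boundary are treated as good (0).
    padded = np.pad(wafer, 1, mode="constant", constant_values=0)
    counts = np.zeros((m, n), dtype=np.int32)
    for dr in range(3):
        for dc in range(3):
            # Accumulate the window sum, center die included.
            counts += padded[dr:dr + m, dc:dc + n]
    return (counts >= 5).astype(np.uint8)

# A 3 x 3 defect block plus one isolated defective die in the corner.
example = np.zeros((6, 6), dtype=np.uint8)
example[1:4, 1:4] = 1
example[5, 5] = 1
filtered = binary_median_filter(example)
```

In this example the isolated die is removed, but the four corners of the block are removed as well, which is the erosion that the later marking stage is meant to compensate for.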
The second approach adopts an unsupervised neural network architecture, the so-called cellular neural network [2], with its parameters set by genetic algorithms, to extract the defect clusters from the noisy m by n binary wafer map. Sections 4 and 5 describe in detail how the two proposed algorithms are applied to recognize the defect clusters in this work.

3.3. Single-linkage clustering

After the tool removes all isolated defective dies, a region-based segmentation step groups regions of the image with similar features. Our earlier work [14] employed the single-linkage method to identify the defective dies belonging to different defect clusters. The method defines the distance between two defect clusters as the shortest distance between two defective dies, one taken from each cluster. Suppose that the wafer contains N defective dies at the initial state and each die forms an individual cluster. Any two clusters are merged if the distance between them is below two dies. The merging process proceeds sequentially until the distance between the closest neighbors of different clusters exceeds one die.
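The merging rule above can be sketched as follows. The paper does not state which distance metric is used, so this sketch assumes the distance between two dies is the Chebyshev distance in die units; under that assumption "below two dies" means the dies are eight-adjacent, and single-linkage merging reduces to finding eight-connected components. The function name `single_linkage_clusters` is hypothetical.

```python
from collections import deque

def single_linkage_clusters(wafer):
    """Group defective dies (value 1) into clusters of eight-adjacent dies.

    Returns a list of clusters, each a list of (row, column) tuples.
    """
    m = len(wafer)
    n = len(wafer[0]) if m else 0
    seen = [[False] * n for _ in range(m)]
    clusters = []
    for r in range(m):
        for c in range(n):
            if wafer[r][c] != 1 or seen[r][c]:
                continue
            # Start a new cluster and absorb every die within distance one.
            seen[r][c] = True
            queue = deque([(r, c)])
            members = []
            while queue:
                cr, cc = queue.popleft()
                members.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < m and 0 <= nc < n
                                and wafer[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            clusters.append(members)
    return clusters

example = [
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
clusters = single_linkage_clusters(example)
```

Here the two dies in the first row form one cluster and the two diagonal dies form another, because the closest dies of the two groups are two dies apart.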


3.4. Building linked lists

After applying the single-linkage method, the algorithm constructs linked lists representing the members of the different defect clusters on the wafer. Each list element contains three fields:

• x-die, which denotes the matrix row index of the defective die;
• y-die, which denotes the matrix column index;
• next-die, which points to the next element in the linked list.

The algorithm creates a list by linking all of the dies in a cluster. Each linked list represents one cluster, and the length of the list equals the number of defective dies in the cluster. The x-die and y-die fields are used to calculate the distance between two clusters, taken between the two closest dies, one from each cluster. Additionally, any two clusters can be linked by setting the next-die field of the last defective die of one list to point to the first defective die of the other list. These data provide test engineers with the information they require to either back-trace or identify the possible causes of the defect clusters. Notably, the linked lists must be scrutinized to discard extremely small clusters before proceeding to the next processing stage, since individual defective dies that happen to lie close together would otherwise be erroneously identified as miniature clusters.

3.5. Defective die marking

Some defective dies in the defect clusters can still pass the electrical test (wafer probe). Accordingly, this work proposes examining all the elements in the linked lists and marking their eight-adjacent neighbors as defective dies even if these neighbors pass the electrical test [33]. The last step marks an isolated interior die (a good die surrounded by eight defective dies) as a defective die. The tool then generates an output file for use in the next processing stage.

4. Defect identification using self-supervised multilayer perceptrons

Multilayer perceptrons have been successfully applied in pattern recognition and image processing in the literature [1,3,4,28–32]. Thus, this work attempts to replace the core of the algorithm presented in Section 3, which is the median filter, with a suitable architecture of multilayer perceptrons, while keeping the other process stages, including clustering, linking, and marking, unchanged.

4.1. Self-supervised multilayer perceptrons

The multilayer perceptron has adjustable parameters that are updated via a supervised learning rule. Previous studies have proposed several object or feature extraction algorithms using neural networks. These kinds of neural network shape the input–output mappings based on particular training data sets, and thus this approach is only valid for defects of a similar nature. However, it is infeasible to provide training samples here because the shapes and sizes of defect clusters are generally highly variable. This work thus integrates a self-supervised multilayer perceptron with the clustering and linking techniques described in Section 3 for recognizing defect clusters of diverse shapes and sizes [1,3,4,28–32]. The distinguishing feature of the self-supervised multilayer perceptron is that it does not require any a priori target output value for supervised learning.

Fig. 6. A self-supervised multilayer perceptron with one hidden layer.

4.1.1. Network model

Fig. 6 illustrates the architectural graph of a modified version of the traditional multilayer perceptron with a single hidden layer. Rather than comparing the actual and desired outputs in each training cycle, the input of the self-supervised multilayer perceptron is fed with the output generated at the end of the previous training cycle. The feedback connection between the output layer and the input layer performs a form of self-supervised learning that self-organizes and identifies the structure in the input data without a priori teaching.

Four self-supervised multilayer perceptrons operate simultaneously, each associating three links with each node in the input-to-hidden and hidden-to-output connections. Fig. 7 illustrates the four connection modes used in the four self-supervised multilayer perceptrons, respectively. The inclusion of three neighboring elements in a row within a 3 × 3 window ensures that all the elongated images in that direction can be recognized, and the four combinations illustrated in Fig. 7 can then capture long and narrow images in all directions. The output of node k in the output layer can be expressed as

$$O_k = f\big(n_k^{(O)}\big) = f\Big(\sum_j w_{jk}^{(O)} \cdot f\Big(\sum_i w_{ij}^{(H)} \cdot x_i\Big)\Big), \tag{1}$$

where $w_{ij}^{(H)}$ represents the weight associated with the link connecting node i in the input layer and node j in the hidden layer, $x_i$ is the input signal handed over from node i in the input layer, $w_{jk}^{(O)}$ is the weight associated with the link connecting node j in the hidden layer and node k in the output layer, and f(·) is a differentiable nonlinear activation function, for example a logistic function

$$f(n) = \frac{1}{1 + e^{-(n - 1.5)}}. \tag{2}$$

Because the initial weights are all set to 1 to enable the self-supervised multilayer perceptron to filter out the isolated dies, a bias is included in Eq. (2) to reflect that three links are connected to each node: with three unit-weight links, the activation exceeds 0.5 only when the summed input exceeds 1.5.

4.2. Backpropagation learning rule

The error measure in the self-supervised multilayer perceptron is defined as

$$E = \frac{\sum_{k=1}^{y} \min(O_k, 1 - O_k)}{y}, \tag{3}$$


Fig. 7. Distribution of links from the previous layer to the next layer.

where y represents the number of nodes in the output layer, and $O_k$ denotes the output of node k in the output layer. The term $\min(O_k, 1 - O_k)$ in Eq. (3) calculates the distance from $O_k$ to 0 or 1, whichever is closer. As $O_k$ switches from 0 to 1 or from 1 to 0, the value of $\min(O_k, 1 - O_k)$ approaches 0.5. On the other hand, the value of $\min(O_k, 1 - O_k)$ decreases as $O_k$ approaches its final value. Then, based on the gradient-descent method [34], the correction of the synaptic weight $w_{jk}^{(O)}$ in the hidden-to-output connections is given by

$$\Delta w_{jk}^{(O)} = \begin{cases} -\dfrac{\eta \cdot O_k (1 - O_k) H_j}{y} & \text{if } 0 \le O_k \le 0.5, \\[2ex] \dfrac{\eta \cdot O_k (1 - O_k) H_j}{y} & \text{if } 0.5 < O_k \le 1, \end{cases} \tag{4}$$

where η denotes the learning-rate parameter of the backpropagation algorithm, and $H_j$ represents the output of node j in the hidden layer. The sign in each case follows from descending the gradient of Eq. (3), whose derivative with respect to $O_k$ is 1/y for $O_k \le 0.5$ and −1/y otherwise. For the weight update in the input-to-hidden connections, the chain rule is used together with the gradient-descent method, as follows:

$$\Delta w_{ij}^{(H)} = \begin{cases} -\dfrac{\eta \sum_k \big(O_k (1 - O_k)\, w_{jk}^{(O)}\big) H_j (1 - H_j)\, x_i}{y} & \text{if } 0 \le O_k \le 0.5, \\[2ex] \dfrac{\eta \sum_k \big(O_k (1 - O_k)\, w_{jk}^{(O)}\big) H_j (1 - H_j)\, x_i}{y} & \text{if } 0.5 < O_k \le 1. \end{cases} \tag{5}$$

The training cycles continue until some stopping criterion is met, for example the error measure falling below a predefined threshold, or the number of epochs reaching some maximum limit. A binary code must still be assigned to the output of the self-supervised multilayer perceptron upon completion of the training process. The mapping of the output of the kth node in the output layer to a binary code, $b_k$, is defined as

$$b_k = \begin{cases} 0 & \text{if } 0 \le O_k \le 0.5, \\ 1 & \text{if } 0.5 < O_k \le 1. \end{cases} \tag{6}$$
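One training loop built from Eqs. (1)–(6) for a single connection mode might be sketched as follows. This is an illustrative reading of the equations rather than the authors' implementation. The names `logistic`, `shift` and `train_direction`, the zero padding at the wafer boundary, the error threshold `tol` and the epoch limit are all assumptions. The sign of each update is applied per output node inside the sum of Eq. (5), as standard backpropagation requires.

```python
import numpy as np

def logistic(n, bias=1.5):
    """Eq. (2): logistic activation shifted by the bias of 1.5."""
    return 1.0 / (1.0 + np.exp(-(n - bias)))

def shift(a, dr, dc):
    """Return b with b[r, c] = a[r + dr, c + dc]; zero outside the wafer (assumption)."""
    m, n = a.shape
    out = np.zeros_like(a)
    out[max(-dr, 0):m + min(-dr, 0), max(-dc, 0):n + min(-dc, 0)] = \
        a[max(dr, 0):m + min(dr, 0), max(dc, 0):n + min(dc, 0)]
    return out

def train_direction(wafer, direction, eta=0.2, max_epochs=20, tol=0.05):
    """Train one perceptron whose nodes link to three neighbors along `direction`."""
    dr, dc = direction
    offsets = [(-dr, -dc), (0, 0), (dr, dc)]
    x = np.asarray(wafer, dtype=float)
    w_h = np.ones((3,) + x.shape)   # input-to-hidden weights, initialized to 1
    w_o = np.ones((3,) + x.shape)   # hidden-to-output weights, initialized to 1
    n_out = x.size                  # y in Eq. (3)
    for _ in range(max_epochs):
        xs = [shift(x, a, b) for a, b in offsets]
        hidden = logistic(sum(w * s for w, s in zip(w_h, xs)))
        hs = [shift(hidden, a, b) for a, b in offsets]
        out = logistic(sum(w * s for w, s in zip(w_o, hs)))        # Eq. (1)
        if np.minimum(out, 1.0 - out).mean() < tol:                # Eq. (3)
            break
        sign = np.where(out <= 0.5, -1.0, 1.0)
        delta_o = sign * out * (1.0 - out) / n_out
        # Propagate the output deltas back to each hidden node (chain rule).
        back = sum(shift(delta_o * w, -a, -b) for w, (a, b) in zip(w_o, offsets))
        delta_h = back * hidden * (1.0 - hidden)
        for t in range(3):
            w_o[t] += eta * delta_o * hs[t]                        # Eq. (4)
            w_h[t] += eta * delta_h * xs[t]                        # Eq. (5)
        x = out   # feed the output back as the next input
    return (out > 0.5).astype(np.uint8)                            # Eq. (6)

# Horizontal connection mode applied to a wafer with no defective dies.
clean = train_direction(np.zeros((5, 5)), (0, 1))
```

The four connection modes of Fig. 7 would correspond to calling `train_direction` with four different `direction` offsets, for example (0, 1), (1, 0), (1, 1) and (1, -1), although the exact mapping to the figure is an assumption.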

The outputs of the four self-supervised multilayer perceptrons are merged into a single binary output matrix. The merging task is straightforward because 1 denotes a bad die and 0 denotes a good die: a simple binary-OR operation is applied over the four binary output matrices.

This study observes that the network output stabilizes after several training cycles, even though the error measure might not be sufficiently close to 0. In other words, it appears feasible to stop the training process whenever the output of each node in the output layer approaches its final binary code. Accordingly, this work maps the actual output to a binary code at the end of each epoch, and compares the mapping with the one generated during the previous epoch. If they are identical for several epochs, the network is presumed to have converged and the training cycles are halted. The experimental results confirm that this approach significantly accelerates the detection process.

5. Defect identification using cellular neural networks and genetic algorithms

A cellular neural network is an unsupervised neural network that is composed of a massive aggregate of analog circuit components called cells. Setting up a CNN requires a proper selection of the circuit parameters of the cells. The dynamics of a CNN are determined by this set of circuit parameters, which are collectively called the cloning template. Since CNNs have attracted the attention of the scientific community and proven useful for solving a variety of tasks, including image processing and pattern recognition problems, we replace the core of the algorithm presented in Section 3 with a CNN, and the genetic algorithm is

employed to settle the cloning template for the CNN owing to its success in optimization and machine learning problems [2,6,35–37].

5.1. Cellular neural network

Consider a cellular neural network with M × N cells arranged in a rectangular array as shown in Fig. 8, where the state $x_{ij}$ of each cell $C_{ij}$, i = 1, ..., M, j = 1, ..., N, satisfies the following differential equation [5]:

$$C \frac{dx_{ij}(t)}{dt} = -\frac{x_{ij}(t)}{R_x} + \sum_{C_{kl} \in N_r(i,j)} A_{ij,kl} \cdot y_{kl}(t) + I_{ij}, \quad 1 \le i \le M,\ 1 \le j \le N, \tag{7}$$

where $R_x$, C and $I_{ij}$ represent the resistance, capacitance and bias of the cell $C_{ij}$, respectively, $y_{kl}$ denotes the output of $C_{kl}$, $A_{ij,kl}$ is the conductance of the link between the cells $C_{ij}$ and $C_{kl}$, and $N_r(i,j)$ is defined as the r-neighborhood within radius r of $C_{ij}$:

$$N_r(i,j) = \{\, C_{kl} \mid \max(|k - i|, |l - j|) \le r,\ 1 \le k \le M,\ 1 \le l \le N \,\}. \tag{8}$$

Fig. 8. A two-dimensional 4 × 4 cellular neural network.

Note that the output $y_{ij}$ of the cell $C_{ij}$ is a piecewise-linear function, as shown in Fig. 9. It can be expressed as

$$y_{ij}(t) = \frac{|x_{ij}(t) + 1| - |x_{ij}(t) - 1|}{2}, \tag{9}$$

with the constraint $|x_{ij}(0)| \le 1$.

Fig. 9. The characteristic of the nonlinear input–output mapping.

In order to apply the CNN to the problem of separating isolated defective dies from the defect clusters, Eq. (7) is further approximated by a difference equation as follows:

$$\begin{aligned} x_{ij}(n+1) &= x_{ij}(n) + \frac{h}{C}\Big(-\frac{x_{ij}(n)}{R_x} + \sum_{C_{kl} \in N_r(i,j)} A_{ij,kl}\, y_{kl}(n) + I_{ij}\Big) \\ &= x_{ij}(n) + \frac{h}{C}\Big(-\frac{x_{ij}(n)}{R_x} + T * y_{ij}(n) + I_{ij}\Big) \end{aligned} \tag{10}$$

for 1 ≤ i ≤ M, 1 ≤ j ≤ N, where $A_{ij,kl}$ is assumed to be position invariant, h is a unit time step, (*) denotes a convolution operator, and T represents a cloning template

$$T = \begin{bmatrix} T_{-r,-r} & \cdots & T_{-r,r} \\ \vdots & \ddots & \vdots \\ T_{r,-r} & \cdots & T_{r,r} \end{bmatrix}. \tag{11}$$

Note that the convolution operator (*) is defined as

$$T * x_{ij} = \sum_{C_{kl} \in N_r(i,j)} T_{k-i,\,l-j} \cdot x_{kl}. \tag{12}$$

The choice of cloning template T determines the performance of a CNN, and should be adjusted properly for each specific operation it is applied to. As GAs have been used successfully for parameter identification in complex models such as neural networks [6], and can be implemented on parallel-processing machines to speed up their operation massively, we use the GA to decide the contents of the cloning template T automatically.

5.2. Determination of cloning template with genetic algorithms

Genetic algorithms have been shown to be well suited to hard optimization tasks, such as NP-complete problems. In GAs, the optimum solution is evolved from generation to generation without the stringent mathematical formulation required by gradient-type optimization techniques. We therefore use the GAs to assign proper values to the cloning template T iteratively until the output of the network stabilizes.

5.2.1. Basics of genetic algorithms

A GA begins its operation by randomly creating its initial population, and then starts its search for the optimum from the population. Each point in a parameter space is encoded into a binary string called a chromosome, and each point is associated with a fitness value that is usually equal to the objective function evaluated at the point. A GA normally consists of three important operators: selection, crossover, and mutation. The selection operator determines which parent chromosomes participate in producing offspring for the next generation. Members with higher fitness values are more likely to be selected as the parents of a solution in the next generation. The crossover operator tries to combine the best features from different design solutions. When simple crossover of chromosomes with N genes is considered, there can be anything from 1 to N − 1 point crossover, with 1-point crossover being common, where a single locus is chosen at random and all bits after the point are swapped. The last commonly used operator is mutation, which is used to produce variations in the population. It can be performed either during selection or crossover.

5.2.2. Selection of cloning template

Similar to the approach taken in Section 4, we operate four CNNs concurrently to recognize long and narrow images in all directions. The four corresponding cloning templates, each of which includes three neighboring elements in a row within a 3 × 3 window, are initialized as follows:

$$T_1 = \begin{bmatrix} 0 & 1.0 & 0 \\ 0 & 1.0 & 0 \\ 0 & 1.0 & 0 \end{bmatrix}, \quad T_2 = \begin{bmatrix} 0 & 0 & 0 \\ 1.0 & 1.0 & 1.0 \\ 0 & 0 & 0 \end{bmatrix}, \quad T_3 = \begin{bmatrix} 1.0 & 0 & 0 \\ 0 & 1.0 & 0 \\ 0 & 0 & 1.0 \end{bmatrix}, \quad T_4 = \begin{bmatrix} 0 & 0 & 1.0 \\ 0 & 1.0 & 0 \\ 1.0 & 0 & 0 \end{bmatrix}. \tag{13}$$
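The discrete-time update of Eq. (10), run with the initial templates of Eq. (13) before any tuning by the GA, can be sketched as follows. Several details are assumptions made for illustration and are not stated in the paper: defective dies are mapped to an initial state of +1 and good dies to −1, positions outside the wafer are treated as good dies, the bias is zero, $R_x$ = 1, the step h/C is 0.2, and a die is reported as defective when its final output is positive. The function names are hypothetical.

```python
import numpy as np

# Initial cloning templates of Eq. (13): vertical, horizontal and two diagonals.
TEMPLATES = [
    np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=float),  # T1
    np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=float),  # T2
    np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float),  # T3
    np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=float),  # T4
]

def cell_output(x):
    """Eq. (9): piecewise-linear output, saturating at -1 and +1."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

def apply_template(template, y, pad_value=-1.0):
    """Eq. (12) for r = 1; cells outside the wafer output pad_value (assumption)."""
    m, n = y.shape
    padded = np.pad(y, 1, mode="constant", constant_values=pad_value)
    total = np.zeros_like(y)
    for a in range(3):
        for b in range(3):
            total += template[a, b] * padded[a:a + m, b:b + n]
    return total

def run_cnn(wafer, template, bias=0.0, step=0.2, r_x=1.0, iterations=50):
    """Iterate Eq. (10) and return a binary map (1 = defective)."""
    x = 2.0 * np.asarray(wafer, dtype=float) - 1.0   # assumed mapping of {0, 1} to {-1, +1}
    for _ in range(iterations):
        y = cell_output(x)
        x = x + step * (-x / r_x + apply_template(template, y) + bias)
    return (cell_output(x) > 0.0).astype(np.uint8)

def detect_with_cnn(wafer, templates=TEMPLATES):
    """Run one CNN per template and merge the four outputs with a binary OR."""
    merged = np.zeros(np.asarray(wafer).shape, dtype=np.uint8)
    for template in templates:
        merged |= run_cnn(wafer, template)
    return merged

# A vertical line of defective dies plus one isolated defective die.
example = np.zeros((5, 5), dtype=int)
example[:, 3] = 1
example[0, 0] = 1
detected = detect_with_cnn(example)
```

With these settings each cell follows the majority of the three cells selected by its template, so the vertical line survives through T1 while the isolated die decays under all four templates.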

binary codes to the network output is identical to the way we deal with in Eq. (6). The outputs of the four CNNs are also merged into a binary output matrix as we do in Section 4. 6. Experimental results Philips Semiconductor provided 386 actual wafer samples with and without defect clusters for testing in this study. These wafer maps were investigated using the median filter, the self-supervised multilayer perceptrons, and the CNN and GA combinations, respectively. The learning-rate parameter h/C used in Eqs. (4) and (5) is set to 0.2 throughout the experiments, while the mutation probability = 0.005, and crossover probability = 1 during the optimum parameters of cloning templates selection process. This work calculates the average false-negative rate and falsepositive rate of the three algorithms on the 386 wafer samples, where the false-negative rate indicates the rate at which the system fails to detect an error pattern, and the false-positive rate represents the probability that a pattern is wrongly detected as an error. Figs. 10–19 display the experimental results of applying three techniques on four wafer map samples. Each figure is divided into three subfigures, the subfigure on the left illustrates the original wafer map, that in the middle displays the result following one of the two techniques is used, and the subfigure on the right shows the result of the marking process. Notably, the two wafer samples given in Figs. 10–15 present two examples of invisible defects caused by leakage current and shorter effective channel length, respectively; whereas the wafer sample given in

In [38], measures of fuzziness such as index of area coverage and entropy are used to effectively extract object features. Entropy of a fuzzy set defined by De Luca and Termini [39] is given by

eðAÞ ¼ 

1 X ½m ðaÞ log2 mA ðaÞ þ ð1  mA ðaÞÞ log2 ð1 jXj a 2 A A

 mA ðaÞÞ;

(14)

where mA(a) denotes the membership function of a fuzzy set A, jXj represents the cardinality of the universal set X. Index of area coverage of a fuzzy set is defined as [38] P Aðx; yÞ P  uðAÞ ¼ (15) P ; Aðx; yÞ  maxy maxx y x Aðx; yÞ where A() denotes a fuzzy representation of an image, the leading term in the denominator gives the longest expansion of an image in the y direction, and the second term represents the longest expansion in the x direction. Entropy of an image provides the degree of ambiguity in deciding whether a pixel would be treated as black or white, and index of area coverage reflects the amount of fuzziness in shape and spatial domain of an image. Accordingly, the minimization of the two ambiguity measures can be employed as the components of the fitness function of the GAs to determine the proper of values of the four cloning templates:

G ðAÞ ¼

1

eðAÞ  uðAÞ

:

Fig. 10. Defect cluster detection with median filter, sample 1: original wafer map (a), median filter output (b), and the detected defect clusters (c).

Fig. 11. Defect cluster detection with multilayer perceptrons, sample 1: original wafer map (a), multilayer perceptron output (b), and the detected defect clusters (c).

(16)

To apply the GAs herein, we generate nine random strings of 0 s and 1 s each of length six corresponding to the nine parameters of each cloning template. Nine strings are concatenated to form a chromosomal representation of the parameter set. Ten chromosomes are generated for each cloning template to form the initial pool. The applications of the GAs with fuzzy fitness function G() as given in Eq. (16) proceed for 50 iterations. The assignment of the

Fig. 12. Defect cluster detection with CNN/GA, sample 1: original wafer map (a), CNN/GA output (b), and the detected defect clusters (c).

C.-J. Huang et al. / Applied Soft Computing 9 (2009) 824–832

831

Fig. 13. Defect cluster detection with median filter, sample 2: original wafer map (a), median filter output (b), and the detected defect clusters (c).

Fig. 18. Defect cluster detection with CNN/GA, sample 3: original wafer map (a), CNN/GA output (b), and the detected defect clusters (c).

Fig. 14. Defect cluster detection with multilayer perceptrons, sample 2: original wafer map (a), multilayer perceptron output (b), and the detected defect clusters (c).

Fig. 19. Defect cluster detection on a wafer with no defect cluster: original wafer map (a), output of the three algorithms (b), and the empty detected defect map (c).

Figs. 16–18 exhibit scratches on the wafer surface caused by equipment malfunctions or human carelessness. Figs. 10–15 indicate that the CNN and GA combinations can detect more fragments of the fractured defects than either the self-supervised multilayer perceptrons or the median filter approach. Furthermore, marking the eight-adjacent neighbors and the isolated interior dies enlarges the scope of the defects and reinforces their solidity.
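The marking of eight-adjacent neighbors mentioned above can be illustrated with a short Python sketch. It assumes the wafer map is indexed as a rectangular grid of (row, column) positions, which is a simplification, and the function name is hypothetical. The filling of isolated interior dies is not covered.

```python
def mark_eight_neighbors(cluster, n_rows, n_cols):
    """Return the cluster plus every die touching it in any of the 8 directions.

    cluster is a set of (row, col) positions already assigned to a defect cluster.
    """
    marked = set(cluster)
    for row, col in cluster:
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                n_row, n_col = row + d_row, col + d_col
                # Keep only positions that fall inside the grid.
                if 0 <= n_row < n_rows and 0 <= n_col < n_cols:
                    marked.add((n_row, n_col))
    return marked
```

For a single clustered die at the center of a 3 × 3 grid, all nine positions end up marked, which shows how this step widens each detected cluster by one die in every direction.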

Fig. 15. Defect cluster detection with CNN/GA, sample 2: original wafer map (a), CNN/GA output (b), and the detected defect clusters (c).

Fig. 16. Defect cluster detection with median filter, sample 3: original wafer map (a), median filter output (b), and the detected defect clusters (c).

Fig. 17. Defect cluster detection with multilayer perceptrons, sample 3: original wafer map (a), multilayer perceptron output (b), and the detected defect clusters (c).

Table 1. Error probabilities of three algorithms.

| Algorithm | False-negative rate | False-positive rate |
|---|---|---|
| Median filter | 4.3% | 1.1% |
| Self-supervised multilayer perceptrons | 1.5% | 2.4% |
| CNN/GA combinations | 0.8% | 2.3% |
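To make the two error rates in Table 1 concrete, the sketch below computes them from sets of die identifiers. The definitions used are assumptions, since the exact denominators are not spelled out here: the false-negative rate is taken as the fraction of truly bad dies that were not flagged, and the false-positive rate as the fraction of good dies that were flagged (the yield loss). The function name is hypothetical.

```python
def error_rates(actual_bad, flagged_bad, all_dies):
    """Return (false_negative_rate, false_positive_rate) under ASSUMED definitions."""
    actual_bad = set(actual_bad)
    flagged_bad = set(flagged_bad)
    good = set(all_dies) - actual_bad

    missed = actual_bad - flagged_bad      # bad dies that were not flagged
    over_flagged = flagged_bad & good      # good dies flagged as bad

    fn_rate = len(missed) / len(actual_bad) if actual_bad else 0.0
    fp_rate = len(over_flagged) / len(good) if good else 0.0
    return fn_rate, fp_rate
```

For example, with ten dies of which four are bad, flagging three of the bad dies plus one good die gives a false-negative rate of 1/4 and a false-positive rate of 1/6.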

The third sample illustrated in Figs. 16–18 provides an example where the self-supervised multilayer perceptrons and the CNN/GA approach capture some defect clusters that the median filter fails to recognize. Fig. 19 illustrates a typical result for each of the three algorithms when applied to a wafer sample with no clustered defects. The three algorithms treat a cluster as noise when it is too small, and thus do not mistake isolated defective dies for defect clusters. As illustrated above, the CNN/GA combinations perform better at recognizing bad and suspect dies, and at labeling dies located in flawed areas, than the self-supervised multilayer perceptrons and the median filter approach.

Table 1 compares the false-negative and false-positive rates for the three algorithms. The table shows that both the CNN/GA combinations and the self-supervised multilayer perceptrons achieve a significantly lower false-negative rate than the median filter approach (0.8% and 1.5% versus 4.3%), at the cost of a higher false-positive rate, that is, roughly 1.2% additional yield loss compared to the median filter algorithm.

7. Conclusion

Although defect clusters on the wafers are easily recognized, support testers require efficient software to identify the defect clusters. This work presents two algorithms, the self-supervised multilayer perceptron approach and the cellular neural network and genetic algorithm combinations, which coordinate with clustering and linking techniques to detect the defect clusters on the wafer. The tool runs either of the algorithms immediately following wafer probing because failed dies marked on the wafer map file help locate the flawed areas. Incorporating these machine learning techniques during the testing stage eliminates manual operation and enables more automated testing. Both algorithms are compared with that


presented in our earlier work, which is currently used by a wafer testing factory. Test results confirm that both algorithms detect the majority of the defect clusters. Both approaches can obtain more fragments in each individual defect cluster and can even detect some defects that our earlier work failed to capture. However, the proposed algorithms misjudge, on average, an additional 1.2% of good dies as bad dies compared to our earlier work. The increased yield loss is acceptable to the testing factory because customer complaints regarding bad IC products are intolerable, and superior quality chips are required in certain industries. Future research will focus on incorporating other intelligent tools, such as neuro-fuzzy approaches or support vector machines, into the proposed algorithms to reduce the yield loss.

Acknowledgments

The authors would like to thank the National Science Council of the Republic of China, Taiwan for financially supporting this research under contract numbers NSC 96-2628-E-259-022-MY3 and NSC 97-2218-E-259-005. Our gratitude also goes to the Academic Paper Editing Clinic, NHLUE.

References

[1] A. Ghosh, N. Pal, S. Pal, Self-organization for object extraction using a multilayer neural network and fuzziness measures, IEEE Transactions on Fuzzy Systems 1 (1) (1993) 54–68.
[2] S. Pal, D. Bhandari, Genetic algorithms with fuzzy fitness function for object extraction using cellular networks, Fuzzy Sets and Systems 65 (1994) 129–139.
[3] F.-L. Chen, S.-F. Liu, A neural-network approach to recognize defect spatial pattern in semiconductor fabrication, IEEE Transactions on Semiconductor Manufacturing 13 (3) (2000) 366–373.
[4] K. Kameyama, Y. Kosugi, T. Okahashi, M. Izumita, Automatic defect classification in visual inspection of semiconductors using neural networks, IEICE Transactions on Information & Systems E81-D (11) (1998) 1261–1271.
[5] C.-Y. Wu, C.-H. Cheng, A learnable cellular neural network structure with ratio memory for image processing, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 49 (12) (2002) 1713–1723.
[6] E. Ohta, Y. Mitsukura, M. Fukumi, A method to extract liver tumors in CT images using genetic algorithms and neural networks, in: 2001 IEEE International Conference on Systems, Man, and Cybernetics, 2001, 801–805.
[7] A. Ghosh, N.R. Pal, S.K. Pal, Modeling of component failure in neural networks for robustness evaluation: an application to object extraction, IEEE Transactions on Neural Networks 6 (3) (1995) 648–656.
[8] J. Kim, T. Chen, Combining static and dynamic features using neural networks and edge fusion for video object extraction, IEE Proceedings-Vision, Image and Signal Processing 150 (3) (2003) 160–167.
[9] S.-W. Hwang, E.Y. Kim, S.H. Park, H.-J. Kim, Object extraction and tracking using genetic algorithms, in: 2001 International Conference on Image Processing, vol. 2, 2001, 383–386.
[10] X.-P. Yang, T. Yang, L.-B. Yang, Extracting focused object from defocused background using cellular neural networks, in: Third IEEE International Workshop on Cellular Neural Networks and their Applications, 1994, 451–455.
[11] S.M.R. Hasan, S.-C. Chen, A new VLSI architecture for perceptron network, 1996 IEEE TENCON. Digital Signal Processing Applications 1 (26–29) (1996) 352–357.
[12] Y.-H. Choi, D.-J. Chung, VLSI processor of parallel genetic algorithm, in: The Second IEEE Asia Pacific Conference on ASIC, 2000, 143–146.
[13] C.-Y. Wu, W.-C. Yen, A new compact neuron-bipolar junction transistor (nBJT) cellular neural network (CNN) structure with programmable large neighborhood symmetric templates for image processing, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 48 (1) (2001) 12–27.
[14] C.-J. Huang, C.-C. Wang, C.-F. Wu, Image processing techniques for wafer defect cluster identification, IEEE Design & Test of Computers 19 (2) (2002) 44–48.
[15] Z. Zeng, S. Dai, P. Mu, Wafer defects detecting and classifying system based on machine vision, in: The Eighth International Conference on Electronic Measurement and Instruments, 2007, 4-520–4-523.
[16] C.-Y. Chang, C.-H. Chang, C.-H. Li, M.-D. Jeng, Learning vector quantization neural networks for LED wafer defect inspection, in: Second International Conference on Innovative Computing, Information and Control, vol. 11, no. 3, 2007, 229–232.
[17] K.P. White, B. Kundu, C.M. Mastrangelo, Classification of defect clusters on semiconductor wafers via the Hough transformation, IEEE Transactions on Semiconductor Manufacturing 21 (2) (2008) 272–278.
[18] S. Baluja, R. Maxion, Artificial neural network based detection and diagnosis of plasma-etch anomalies, Journal of Intelligent Systems 7 (1/2) (1997) 57–82.
[19] M. Taubenlatt, J. Batchelder, Patterned wafer inspection using spatial filtering for cluster environment, Applied Optics 31 (17) (1992) 3354–3362.
[20] C.-H. Wang, Recognition of semiconductor defect patterns using spectral clustering, in: IEEE International Conference on Industrial Engineering and Engineering Management, 2007, 587–591.
[21] M. Nikoonahad, C. Wayman, S. Biellak, Defect detection algorithm for wafer inspection based on laser scanning, IEEE Transactions on Semiconductor Manufacturing 10 (4) (1997) 459–468.
[22] K. Maruo, T. Shibata, T. Yamaguchi, M. Ichikawa, T. Ohmi, Automatic defect pattern detection on LSI wafers using image processing techniques, IEICE Transactions on Electronics E82-C (6) (1999) 1003–1012.
[23] A. Vacca, B. Eynon, S. Yeomans, Improving wafer yields at low k1 with advanced photomask defect detection, Solid State Technology (1998) 185–190.
[24] C. Hess, L. Weiland, Extraction of wafer-level defect density distribution to improve yield prediction, IEEE Transactions on Semiconductor Manufacturing 12 (2) (1999) 175–183.
[25] A. Shapiro, Automatic classification of wafer defects: status and industry needs, IEEE Transactions on Components, Packaging, and Manufacturing Technology Part C 20 (2) (1997) 164–167.
[26] J.R.D. Debord, N. Sridhar, Yield learning and process optimization on 65-nm CMOS technology accelerated by the use of short flow test die, IEEE Transactions on Semiconductor Manufacturing 20 (3) (2007) 201–207.
[27] B. Jahne, Digital Image Processing: Concepts, Algorithms, and Scientific Applications, Springer-Verlag, Berlin, 1997.
[28] M.D. Emmerson, R.I. Damper, Determining and improving the fault tolerance of multilayer perceptrons in a pattern-recognition application, IEEE Transactions on Neural Networks 4 (5) (1993) 788–793.
[29] S. Pal, S. Mitra, P. Mitra, Rough-fuzzy MLP: modular evolution, rule generation, and evaluation, IEEE Transactions on Knowledge and Data Engineering 15 (1) (2003) 54–68.
[30] K. Woods, K.W. Bowyer, Generating ROC curves for artificial neural networks, IEEE Transactions on Medical Imaging 16 (3) (1997) 329–337.
[31] S.-C.B. Lo, H. Li, Y. Wang, L. Kinnard, M.T. Freedman, A multiple circular path convolution neural network system for detection of mammographic masses, IEEE Transactions on Medical Imaging 21 (2) (2002) 150–158.
[32] M.R. Azimi-Sadjadi, D. Yao, Q. Huang, G.J. Dobeck, Underwater target classification using wavelet packets and neural networks, IEEE Transactions on Neural Networks 11 (3) (2000) 784–794.
[33] G. Baxes, Digital Image Processing: Principles and Applications, John Wiley & Sons, New York, 1994.
[34] J. Principe, N. Euliano, W. Lefebvre, Neural and Adaptive Systems: Fundamentals Through Simulations, John Wiley & Sons Inc., 2000.
[35] J.-M. Rouet, J.-J. Jacq, C. Roux, Genetic algorithms for a robust 3-D MR-CT registration, IEEE Transactions on Information Technology in Biomedicine 4 (2) (2000) 126–136.
[36] H. Yao, L. Tian, A genetic-algorithm-based selective principal component analysis (GA-SPCA) method for high-dimensional data feature extraction, IEEE Transactions on Geoscience and Remote Sensing 41 (6) (2003) 1469–1478.
[37] Z.-J. Lee, S.-F. Su, C.-C. Chuang, K.-H. Liu, Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment, Applied Soft Computing 8 (1) (2008) 55–78.
[38] S. Pal, A. Ghosh, Fuzzy geometry in image analysis, Fuzzy Sets and Systems 48 (1992) 23–40.
[39] A. De Luca, S. Termini, A definition of a non-probabilistic entropy in the setting of fuzzy sets theory, Information and Control 20 (1972) 301–312.