Neural net analysis of integrated circuit yield dependence on CMOS process control parameters




Microelectronics Reliability 43 (2003) 117–121 www.elsevier.com/locate/microrel

M. Karilahti *

Optoelectronics Laboratory, Helsinki University of Technology, P.O. Box 3000, FIN-02015 TKK Espoo, Finland

Received 14 September 2001; received in revised form 6 June 2002

Abstract

In this practical case study, the self-organizing map (SOM) neural net method is applied to analyze a CMOS process problem in which the device under study is a heartbeat rate monitor integrated circuit. The wafer yield is analyzed against the process control monitoring (PCM) parameter measurement values. The SOM efficiently reduces the dimensionality of the parameter space and helps in visualizing the relations between parameters. This makes it possible to identify the PCM parameters most likely to affect the yield. These were found to be the NMOS transistor drain current and the aluminium sheet resistance. © 2002 Elsevier Science Ltd. All rights reserved.

1. Introduction

The device yield factor is extremely important to semiconductor fabrication facilities (fabs), as it translates directly into savings or costs. Various neural nets, including self-organizing maps (SOM) [1], have successfully been applied in the field of integrated circuit (IC) design modeling for yield optimization [2], spatial IC and wafer failure pattern analysis [3,4], quality control [5], semiconductor process modeling [6–8], and functional yield and process control monitoring (PCM) analysis [9–11]. The main objective here is to analyze the yield dependence on various electrically and optically performed PCM measurements by using the SOM to identify the main factors behind the low yield of a heartbeat rate monitor IC device processed in a semiconductor fab. The high-dimensional parameter data probably contains non-linear dependencies, so ordinary linear regression methods will not be sufficient. SOMs have already proven useful for analyzing yield and PCM data [9–11].

2. Simplified principle of the self-organizing map

Fig. 1 shows the structure of a 3 × 4 SOM, which has 12 neurons in each of the parameter (component) planes. The n-dimensional parameter space is represented by 12 map vectors $m_i \in \{A, B, C, D, E, F, G, H, I, J, K, L\}$, where the components of each of the vectors A, ..., L are indexed from 1 to n, giving a total map size of 3 × 4 × n neurons. Each of these neurons initially contains a random value. The map is trained with q measurements, each of which is n-dimensional, with each dimension representing one measured parameter; each measurement forms a training data vector x. Training makes the map represent the measured data set more accurately, i.e. for every x there will always be a single map vector, one of $m_i$, whose distance $\|m_i - x\|$ is the minimum (best match). The training is done by repeatedly sequencing through the set of measurement data vectors from 1 to q, finding the best matching vector $m_i$ from the map, and modifying its vector component values towards those of the training vector. Additionally, a small neighborhood around the best matching vector $m_i$ on the map is also modified towards the sample vector. All this is usually formulated, for each i within the neighborhood, as the equation:

$$m_i(t+1) = m_i(t) + \alpha_i(t)\,[x(t) - m_i(t)] \qquad (1)$$

for one sequence, where $\alpha_i$ is the neighborhood kernel (or weight function, sometimes simplified to the learning step size), which decreases from 1 down to 0 both with increasing discrete time index t and with increasing neighborhood distance on the map from the best matching vector $m_i$ (e.g., in Fig. 1, with E being the best matching vector, H is nearer in its neighborhood than K). After a sufficient number of training runs using the measurement vectors, the resulting map is assessed. One method of evaluating the quality of the map is to compute the average quantization error (expectation value) over the input samples, $E\{\|x - m_i(x)\|\}$, where $m_i(x)$ is the best matching map vector for each measurement data vector x. The idea and principle of the SOM are treated extensively in Kohonen's book [1].

Fig. 1. Component planes of a 3 × 4 SOM, composed of an n-dimensional vector space, representing a total of 3 × 4 × n neurons.

* Corresponding author. Tel. +358-9-4111-0257 (direct)/+358-9-4511 (switchboard); fax: +358-9-4513-128. E-mail address: mika.karilahti@hut.fi (M. Karilahti).

0026-2714/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0026-2714(02)00277-9
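The update rule of Eq. (1) and the average quantization error can be illustrated with a minimal NumPy sketch. This is not the SOM_PAK implementation used in the study: the function names, the rectangular grid (the study used a hexagonal topology), the initialization within the data range, and the default values of `alpha0` and `radius0` are all assumptions made for illustration. The kernel is a simple bubble: every map vector within the current radius of the best match receives the same step size, and both the step size and the radius shrink linearly to zero.

```python
import numpy as np

def train_som(data, rows, cols, n_steps, alpha0=0.5, radius0=2.0, seed=0):
    """Sequential SOM training per Eq. (1) with a bubble neighborhood.

    Assumptions (not from the paper): rectangular grid, random
    initialization inside the data range, linear decay of both the
    step size and the neighborhood radius.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    lo, hi = data.min(axis=0), data.max(axis=0)
    m = rng.uniform(lo, hi, size=(rows * cols, n))   # map vectors m_i
    # Grid position of every map vector, used for neighborhood distances.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)],
                    dtype=float)
    for t in range(n_steps):
        x = data[t % len(data)]            # cycle through measurements 1..q
        frac = 1.0 - t / n_steps           # linear shrink from 1 towards 0
        alpha = alpha0 * frac
        radius = radius0 * frac
        # Best matching vector: smallest Euclidean distance to x.
        best = np.argmin(np.linalg.norm(m - x, axis=1))
        # Bubble kernel: same alpha for every node inside the radius.
        inside = np.linalg.norm(grid - grid[best], axis=1) <= radius
        m[inside] += alpha * (x - m[inside])
    return m

def average_quantization_error(data, m):
    """Mean distance from each sample to its best matching map vector."""
    d = np.linalg.norm(data[:, None, :] - m[None, :, :], axis=2)
    return d.min(axis=1).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    samples = rng.normal(size=(200, 5))    # synthetic stand-in for PCM rows
    som = train_som(samples, rows=3, cols=4, n_steps=5000)
    print(average_quantization_error(samples, som))
```

The sketch assumes complete data; handling the missing-value markers mentioned later in the paper would require skipping the marked components in the distance and update steps.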

3. Analysis

The PCM data was collected from 17 production lots from a BiCMOS process over five months, and combined with the wafer yield and lot number information. Usually only a small number of the produced wafers are tested for PCM because of customer delivery urgency. The data consists of 202 measured wafers, each with five test dice, for a total of 1010 rows by 63 parameters (61 PCM parameters plus the wafer yield and the lot number). The PCM parameters include, e.g., transistor threshold and breakdown voltages, thin-film sheet resistances, gate linewidths, and leakage currents. The measurements were first checked for missing data and outliers, which were replaced by markers instructing the SOM software [12] to ignore them. The SOM method was then applied to reduce the high-dimensional data and make the parameter dependencies visible.

The training vectors should ideally be normalized so that no single parameter plane carries too much weight when training the map. However, normalization requires caution and some background knowledge about the parameters, whose units range from nA to µm to V and kΩ. For example, the current gain factor (about 500) should probably be considered at least as important as the transistor threshold voltage (about 0.55 V). Other parameters exhibit a similar variation in range. After careful consideration, each parameter was scaled by a simple division into the range of −20 to 0, or 0 to +20, depending on the parameter polarity. The yield was scaled to range from 0 to 100 to give it more emphasis and control over the map convergence. The lot numbers, carried along, were scaled to range from 0 to 0.1 in order to create a low-weight tag that later allows lot region identification.

Deciding on the map size is by no means a simple choice. Ref. [13] lists several studies using various map sizes and component plane dimensions. As an example, the number of training vectors used in Ref. [14] was 208, while the map size was chosen as 10 × 10 neurons with eight planes.
Intuitively, a large number of parameter planes can make it difficult for the map to attain independently trained category regions on the different planes, e.g. if the complex parameter space contains several co-dependent parameter planes, where the vectors are forced into the same region because of their dependence on a parameter that is directing the map formation. Additionally, it is possible that some planes contain passenger nodes, which do not actually contribute to the map region classification. Here, the SOM has to satisfy two competing requirements: (1) the map should be small and still converge to represent the data set accurately, and (2) the map should be large enough to reveal subtle relations between the parameters. While not directly translatable from the previous studies, the map size was chosen as 17 by 22, a total of 374 neurons.

It is customary to leave a randomly chosen 10% of the training vectors out of the training material and use them for testing the map modeling performance. Here, however, the main objective is to provide the semiconductor engineer with the most probable parameter candidates for the low yield. It should therefore be reasonably safe to use all the measurement vectors for training the map, as the number of samples is already low. The map was initialized with random values.

Table 1. Algorithm for obtaining the best map (bubble neighborhood, hexa topology, linear alpha shrink, random initialization)

| Order | Training cycles | Alpha (neighborhood shrink) | Initial radius (neurons) | Average quantization error |
|---|---|---|---|---|
| 1 | 0 | N/A | N/A | 30.247436 |
| 2 | 10,000 | 0.9 | 10 | 5.057104 |
| 3 | 100,000 | 0.3 | 5 | 4.257122 |
| 4 | 100,000 | 0.2 | 0 | 2.718484 |
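The scaling scheme described in this section can be sketched as follows. The paper states only that each parameter was scaled "by a simple division" into the range −20 to 0 or 0 to +20, the yield into 0 to 100, and the lot number into 0 to 0.1; it does not give the divisors. The sketch therefore assumes that each column is divided so that its largest magnitude maps onto the target value, and it uses NaN as a stand-in for the missing-data markers. The function names `scale_by_division` and `build_training_matrix` are hypothetical.

```python
import numpy as np

def scale_by_division(column, target):
    """Divide a column by a constant so its largest magnitude equals `target`.

    Division preserves the sign, so a positive-only parameter lands in
    0..target and a negative-only parameter in -target..0.
    NaN entries (assumed missing-data markers) are left untouched.
    """
    col = np.asarray(column, dtype=float)
    if np.isnan(col).all():
        return col.copy()                  # nothing to scale
    peak = np.nanmax(np.abs(col))          # ignores NaN markers
    if peak == 0.0:
        return col.copy()
    return col / (peak / target)

def build_training_matrix(pcm, wafer_yield, lot_number):
    """Stack scaled PCM planes, the yield plane, and the lot-number tag."""
    pcm = np.asarray(pcm, dtype=float)
    planes = [scale_by_division(pcm[:, k], 20.0) for k in range(pcm.shape[1])]
    planes.append(scale_by_division(wafer_yield, 100.0))   # emphasized plane
    planes.append(scale_by_division(lot_number, 0.1))      # low-weight tag
    return np.column_stack(planes)

if __name__ == "__main__":
    # Three illustrative PCM columns: gain (~500), threshold (~0.55 V),
    # and a negative-polarity parameter; values are made up.
    pcm = np.array([[500.0, 0.55, -3.0],
                    [450.0, np.nan, -6.0]])
    print(build_training_matrix(pcm, [40.0, 80.0], [1.0, 17.0]))
```

With this choice the gain of about 500 and the threshold voltage of about 0.55 V end up on the same 0 to 20 scale, which is the stated purpose of the normalization, while the yield plane spans a five times larger range and the lot-number plane a 200 times smaller one.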

4. Results

After the training of the SOM, the average quantization error of the map per sample vector was found to be 2.72. Several algorithm variations were tested, and the one described in Table 1 was found to give the best performance. The quantization error depends on the map size, the initial values, and the training data. Table 2 lists the results of experiments on maps of various sizes. Additionally, Table 3 lists all the parameters used and the quantization errors of their parameter planes per sample. It is interesting to note that the average quantization error computed from the separate planes adds up to 2.03 under the assumption that the planes are independent and using squared summing, i.e. the square root of the sum of the squared per-plane errors.

Table 2. Map size and the average quantization error per sample vector

| Map size (neurons) | Average quantization error |
|---|---|
| 4 × 6 = 24 | 6.695118 |
| 6 × 6 = 36 | 5.841517 |
| 10 × 10 = 100 | 4.379018 |
| 10 × 12 = 120 | 4.087817 |
| 5 × 24 = 120 | 4.126108 |
| 17 × 22 = 374 | 2.718484 |
| 20 × 20 = 400 | 2.658164 |
| 10 × 40 = 400 | 2.706104 |
| 33 × 33 = 999 | 1.324180 |
| 10 × 100 = 1000 | 1.468767 |
| 20 × 50 = 1000 | 1.423114 |
| 40 × 40 = 1600 | 0.866286 |
| 100 × 100 = 10,000 | 0.000005 |

Table 3. Parameter (component) plane number, short name, measuring unit, and the average quantization error of the plane per sample vector (computed after vector scalings and map training)

| Plane number | Parameter short name | Unit | Average quantization error of the plane per sample |
|---|---|---|---|
| 1 | NPNs Beta | – | 0.141791 |
| 2 | NPNs Vbe | V | 0.045573 |
| 3 | PNPv Beta | – | 0.299748 |
| 4 | PNPv Vbe | V | 0.020411 |
| 5 | PNPv Va | V | 0.480155 |
| 6 | PNPv1 Beta | – | 0.026124 |
| 7 | PNPv1 Vbe | V | 0.000938 |
| 8 | PNPv1 Va | V | 0.047988 |
| 9 | Zener V | V | 0.045522 |
| 10 | N Vt0 | V | 0.167227 |
| 11 | N gm | µS | 0.181811 |
| 12 | N dL | µm | 0.211140 |
| 13 | N Id | µA | 0.178111 |
| 14 | N BrkV | V | 0.038615 |
| 15 | P Vt0 | V | 0.206808 |
| 16 | P gm | µS | 0.181845 |
| 17 | P dL | µm | 0.181000 |
| 18 | P Id | µA | 0.192159 |
| 19 | P BrkV | V | 0.109901 |
| 20 | N+ sheet | ohm/sq | 0.488481 |
| 21 | P+ sheet | ohm/sq | 0.179430 |
| 22 | P sheet | ohm/sq | 0.085157 |
| 23 | Osc | MHz | 0.183008 |
| 24 | PI sheet | ohm/sq | 0.095096 |
| 25 | NI sheet | ohm/sq | 0.139447 |
| 26 | Ga sheet | ohm/sq | 0.153025 |
| 27 | Ga W | µm | 0.160813 |
| 28 | Al/N+ contact | ohm | 0.335363 |
| 29 | Al leakSt | nA | 0.200124 |
| 30 | Ga leakSub | nA | 0.141285 |
| 31 | Ga leakSt | nA | 0.132956 |
| 32 | Ga leak | nA | 0.244282 |
| 33 | Ga/P Cap | pF | 0.070831 |
| 34 | GaOx leak | nA | 0.023394 |
| 35 | Al ResSt | ohm | 0.215163 |
| 36 | N sheet | ohm/sq | 0.314740 |
| 37 | Al/P+ contact | ohm | 0.432942 |
| 38 | Al/Met2 contact | ohm | 0.190028 |
| 39 | Al/Met2 Cap | pF | 0.135301 |
| 40 | Al/Met2C [1/1000Q] | – | 0.320286 |
| 41 | Al/Met2 Cap leak | nA | 0.037974 |
| 42 | Al/Met2 Cap BrkV | V | 0.118056 |
| 43 | Film leak | nA | 0.167579 |
| 44 | Film sheet 10k | ohm/sq | 0.376814 |
| 45 | Film dL 10k | µm | 0.649783 |
| 46 | Nj Vt0 | V | 0.160367 |
| 47 | Nj gm | µS | 0.161704 |
| 48 | Nj dL | µm | 0.715292 |
| 49 | Nj Id | µA | 0.156107 |
| 50 | Nj G | V | 0.086774 |
| 51 | Pj Vt0 | V | 0.202926 |
| 52 | Pj Id | µA | 0.180422 |
| 53 | Pj G | V | 0.175302 |
| 54 | Pj gm | µS | 0.174625 |
| 55 | Pj dL | µm | 0.865074 |
| 56 | BI sheet | ohm/sq | 0.041280 |
| 57 | BI pin sheet | ohm/sq | 0.239567 |
| 58 | NH gm | µS | 0.154364 |
| 59 | NH Leff | µm | 0.127523 |
| 60 | NH id | mA | 0.095527 |
| 61 | NH BrkV | V | 0.136662 |
| 62 | Yield per wafer | Dice | 0.384938 |
| 63 | Lot number | – | 0.000181 |

The aim in this study is to use the SOM in classification of PCM parameters, to make it possible to identify yield-affecting factors, which the semiconductor engineer can experiment on and verify in the fab, e.g. by running experiment lots with parameter variations. Therefore, the map metrics are deliberately kept simple. In order to better understand the relations between process parameter values, the map planes could be vectorized and run through a new SOM to see which planes become located close together [15]. This might reveal process steps that cause co-dependence between certain parameters.

To find out which parameters have a critical effect on the yield, all 63 map planes were visually inspected and compared to the yield plane Nr. 62. Fig. 2 shows (a) the yield (Yield per wafer in Table 3) from plane Nr. 62, (b) the NMOS transistor drain current (Nj Id in Table 3) from plane Nr. 49, (c) the aluminium sheet resistance (Al ResSt in Table 3) from plane Nr. 35, (d) the metal layer 1 to metal layer 2 contact resistance (Al/Met2 contact in Table 3) from plane Nr. 38, and (e) the lot numbers from plane Nr. 63. In Fig. 2 the dark gray areas indicate a small parameter value, and light gray a large value.

Fig. 2. Component planes of the trained 17 × 22 SOM, where dark gray relates to a small parameter value and light gray to a large value. (a) Device yield, (b) NMOS transistor drain current, (c) aluminium sheet resistance, (d) metal layer 1 to metal layer 2 contact resistance and (e) production lot numbers.

The yield in Fig. 2(a) is very low in the upper left corner and high in the upper right and lower left regions. The parameter most likely affecting the yield is the NMOS transistor drain current in Fig. 2(b), where the low current is located in the same region as the low yield in Fig. 2(a). Most of the discovered parameters have to do with NMOS transistors and would need further investigation. The fab uses N-type wafers, and therefore the NMOS transistors have to be placed in a p-well, a process step that might introduce additional impurities and bias the operational characteristics of the NMOS transistors compared to the PMOS transistors.

The map plane of aluminium sheet resistance in Fig. 2(c) matches the good yield area when the resistance value is in the intermediate range, i.e. the yield is not at its best if the aluminium sheet resistance is too small or too large. This result would not necessarily have been discovered by re-applying the map to automatically categorize the SOM planes [15]. Also, the very low yield area in Fig. 2(a) seems to be related to the very high aluminium sheet resistance area in Fig. 2(c), an inverse relation that is not detectable automatically. The metal layer 1 to metal layer 2 contact resistance SOM plane in Fig. 2(d) is shown only for reference, as no relation to the yield can be seen there. Fig. 2(e) contains the lot numbers and, interestingly enough, the low yield apparently is a problem for the latest production lots, i.e. those with larger lot numbers, especially in the upper left corner.
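The study compared the component planes to the yield plane by visual inspection. As a complementary first screening, which is not the author's method, one could rank the planes by their linear correlation with the yield plane across the map nodes. The sketch below assumes a trained map stored as an array of shape (number of map nodes, number of planes), for example 374 × 63 here; the function name and arguments are hypothetical. Note that array indices are zero-based, so plane Nr. 62 of Table 3 corresponds to index 61.

```python
import numpy as np

def rank_planes_by_yield_correlation(map_vectors, yield_plane, names=None):
    """Rank component planes by |Pearson r| against the yield plane.

    map_vectors: array of shape (n_nodes, n_planes) from a trained SOM.
    yield_plane: zero-based column index of the yield plane.
    Returns a list of (label, r) sorted by decreasing |r|.
    """
    m = np.asarray(map_vectors, dtype=float)
    y = m[:, yield_plane]
    results = []
    for k in range(m.shape[1]):
        if k == yield_plane:
            continue
        p = m[:, k]
        # A constant plane has no defined correlation; report it as 0.
        if p.std() == 0.0 or y.std() == 0.0:
            r = 0.0
        else:
            r = float(np.corrcoef(p, y)[0, 1])
        label = names[k] if names is not None else k
        results.append((label, r))
    results.sort(key=lambda item: abs(item[1]), reverse=True)
    return results

if __name__ == "__main__":
    a = np.linspace(-1.0, 1.0, 21)
    # Plane 0 varies linearly with the yield, plane 1 has an
    # intermediate-range (quadratic) relation, plane 2 is the yield.
    demo = np.column_stack([a, a ** 2, 2.0 * a])
    for label, r in rank_planes_by_yield_correlation(demo, yield_plane=2):
        print(label, round(r, 3))
```

A linear measure of this kind would flag a monotonic relation such as the one between drain current and yield, but it scores an intermediate-range relation like the one reported for the aluminium sheet resistance close to zero, which is one reason visual inspection of the planes remains useful.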

5. Conclusion

The results show that the SOM type of neural net can be applied effectively to identify semiconductor process parameter relations. In this case the most likely causes of low yield turned out to be the NMOS transistor drain current and the aluminium sheet resistance. Supplied with this information, the semiconductor engineer can plan lot run tests with process parameter variations to verify the findings. The SOM could apparently also be utilized when transferring a process from one fab to another, where its use would speed up process qualification by identifying the process parameters requiring attention. Similarly, the SOM could be used for solving process problems for ICs in the pre-production stage, provided that sufficient data already exists for the analysis. Even though the SOM has not yet been widely adopted for everyday use in the semiconductor industry, this practical case study shows that the method is extremely efficient for analyzing certain semiconductor problems.

Acknowledgements

The author would like to thank Professor Turkka Tuomi and Professor Olli Simula at the Helsinki University of Technology for their help in preparing this article.

References

[1] Kohonen T. Self-organizing maps. Berlin: Springer; 1995 [second extended edition 1997].
[2] Ilumoka AA. A modular neural network approach to microelectronic circuit yield optimization. Microelectron Reliab 1998;38:571–80.
[3] Collica RS, Card JP, Martin W. SRAM bitmap shape recognition and sorting using neural networks. IEEE Trans Semicond Manuf 1995;8:326–32.
[4] Chen F-E, Liu S-F. A neural-network approach to recognize defect spatial pattern in semiconductor fabrication. IEEE Trans Semicond Manuf 2000;13:366–73.
[5] Stützle T. A neural network approach to quality control charts. In: Proceedings of the International Workshop on Artificial Neural Networks (IWANN'95), Torremolinos, Spain. In: Mira J, Sandoval F, editors. Lecture Notes in Computer Science, vol. 930. Springer Verlag; 1995. p. 1135–41.
[6] Rietman EA, Lory ER. Use of neural networks in modeling semiconductor manufacturing processes: an example for plasma etch modeling. IEEE Trans Semicond Manuf 1993;6:343–7.
[7] Bose C, Lord H. Neural network models in wafer fabrication. In: SPIE Proceedings of Applications of Artificial Neural Networks 1965. 1993. p. 521–30.
[8] Marks KM, Goser K. Analysis of VLSI process data based on self-organizing feature maps. In: Proceedings of NeuroNimes, France, 15–17 November 1988. p. 337–48.
[9] Gardner M, Bieker J. Data mining solves tough semiconductor manufacturing problems. KDD 2000, Boston, 2000. p. 376–83.
[10] Ludwig L, Epperlein U, Kuge HH, Federl P, Koppenhöfer B, Rosenstiel W. Classification of 'fingerprints' of process control monitoring-data with self-organizing maps. In: Proceedings of EANN '97, Stockholm, Sweden. p. 107–11.
[11] Ludwig L, Pelz E, Kessler M, Sinderhauf W, Koppenhoefer B, Rosenstiel W. Prediction of functional yield of chips in semiconductor industry applications. In: Proceedings of EANN '98, Gibraltar, UK. Turku, Finland, 1998. p. 157–61.
[12] SOM_PAK software, SOM Programming Team of the Helsinki University of Technology, Laboratory of Computer and Information Science. Available at: ftp://www.cis.hut.fi/pub/som_pak/.
[13] Myklebust G, Solheim JG. Neural Networks, 1995. In: Proceedings, IEEE International Conference, vol. 2, 1995. p. 1054–9.
[14] Luo X, Singh C, Patton AD. Loss-of-load state identification using self-organizing map. In: Power Engineering Society Summer Meeting, 1999, vol. 2. London: IEEE; 1999. p. 670–5.
[15] Vesanto J, Ahola J. Hunting for correlations in data using the self-organizing map. In: International ICSC Symposium on Advances in Intelligent Data Analysis, Proceedings of International (ICSC) Congress on Computational Intelligence Methods and Applications (CIMA'99), Rochester, New York, USA, 22–25 June. ICSC Academic Press; 1999. p. 279–85.