A model for IC yield analysis and process control



Solid-State Electronics Vol. 29, No. 12, pp. 1253-1265, 1986. Printed in Great Britain.
0038-1101/86 $3.00 + 0.00 Pergamon Journals Ltd

A MODEL FOR IC YIELD ANALYSIS AND PROCESS CONTROL

LEON L. PESOTCHINSKY
Vitaphore. Consulting Group, P.O. Box 700070, San Jose, CA 95170, U.S.A.

(Received 8 December 1984; in revised form 23 July 1985)

Abstract: A probabilistic model for the analysis of IC wafer yield is proposed. The model aims to separate the various causes of yield degradation, in particular the effects of gross wafer variability related to processing conditions, systematic errors, and random spatial yield patterns. The results can also be used for estimating parameters of various yield vs chip area models, of which the “Gross Defect Area” model was chosen as an example. The model can be used for comparing data on different lots of the same product, for routine process control, for data reduction and spatial analysis, and for various other research applications.

I. INTRODUCTION

It is well known that, in addition to errors in the processing of integrated circuits, the circuits are subject to a variety of local defects which may be due to particles in resist, dust on a mask, or a multitude of other faults and hazards. It is hard to classify the error types, but without going into much detail we may group the defects as caused by one or more of the following: (1) process parameters under the operator’s control which affect all the dice on a wafer in a similar manner (this group may include deviations of the process parameters from their target values, changes in the environmental conditions in the fab areas, etc.; such causes frequently result in a random pattern of functional dice with some device parameters outside the specifications); (2) same as above, but this time affecting only specific parts of a wafer (misalignment-caused errors, etc.); (3) parameters not under the operator’s control (such as design flaws, defects on masks, etc.); in the following we will frequently use the notion of a “systematic error” to describe the defects which cause a consistent yield loss at specific locations (such as, say, mask “killer” defects); (4) random local defects.

For process control purposes, defects of the first two groups are the most important. Defects of the third group are of greater concern to product and design engineers, and the last group may indicate yield limitations as functions of chip size, complexity, etc. It was this last subject which prompted extensive research and discussion in the area of spatial yield modeling and prediction (see [2-23]); a good portion of the disagreement between different authors can be attributed to a failure to separate the causes of yield loss. Introduction of the “area usage” factor was an important step towards the solution of this problem (see [4] and others), and some fine points in implementing this notion may be resolved with the help of the model proposed in this paper.

To introduce the model we consider four “wafers” in the upper part of Fig. 1 (1 and 0 denote, respectively, yielding and non-yielding die locations). One can notice that locations 1, 3, 4, 12, 13 never yielded, and thus there is good reason to assume that these failures were due to some systematic defect mode, most likely not under the operator’s direct control. Each of the other locations yielded 2 times out of 4, but this should not be taken as an indication that the process was homogeneous and that the frequency of random defects was about the same. Indeed, one can see that the yield of wafers 1 and 3 was twice as large as that of wafers 2 and 4, which may indicate a difference in process conditions. To show that this type of argument may be valid even in a more realistic situation with large numbers of dice per wafer, we can imagine that each location represents a composite yield for a large group of dice, not necessarily neighboring each other (e.g. location 1 may stand for all non-yielding “peripheral” chips, etc.).

The above discussion demonstrates that we need a method (or a model) which, on the basis of some readily available data (such as IC failure maps), would quantitatively separate the effects of “process conditions” (or process control parameters) on wafer yield. In particular, in the previous example we would like to be able to find whether or not process conditions for wafers 1 and 3 were indeed significantly better than those for wafers 2 and 4. (We will clarify the notion of “process conditions” as used in this paper in the next section.) In applications we would be interested in comparing lots or batches rather than individual wafers. A related problem arises with the application of a “moving window” method for the yield vs IC area analysis [13, 16, 22].
The chip sizes are “magnified” by considering 2, 3, 4, etc. adjacent chips as one “super chip” with an area 2, 3, 4, etc. times larger than that of the original chip. The super chips “yield” only if all the component chips do, and have the same circuit density as the initial chip (the latter being a very convenient feature of this method, as noted in [21]). If we consider a 2 x 2 window for the wafers from Fig. 1, we may easily see that none of the possible 2 x 2 super chips yields on any of the four wafers. At the same time, each of the locations 5, 6, 10, and 11 yields with some positive probability, and the four of them form a 2 x 2 chip. Thus such a chip should have a positive yield probability despite the fact that the yield data suggest zero yield. This observation implies that we can modify the moving window method to evaluate the probability of events which could occur but have not been observed, either because of a small sample size or for some other reason. Naturally, in a large sample we are bound to observe virtually all the “probable” events. However, in our applications it seems hardly feasible to accumulate a large and homogeneous sample of wafers because many process steps are subject to time-related changes (mask cleaning or replacement, different shifts, machines, operators, etc.). Thus we have to settle for smaller sample sizes where the problem persists, also because such samples are the only ones available for ongoing process control. As will be shown later, the proposed method not only deals with this problem, but also eliminates from the final result the yield variability “between the wafers”.

[Fig. 1. Simplified example. Panels: four wafer maps (upper part); wafer stacking yield map; ordering of the die locations; process (wafer) yield probabilities p_i (wafer 1: 1.0; wafer 2: 0.4; wafer 3: 1.0; wafer 4: 0.4); map of the location yield probabilities.]

The idea of the method is to separate the sources of defects of the first group (under the operator’s control and similarly affecting all the dice) from those causing the other three groups of defects. This is done by assigning to each wafer a coefficient which characterizes the gross wafer variability related to “processing conditions” (called the “process yield probability”), and to each die location a coefficient called the “location yield probability”. It will then be easy to detect systematic defects (defined earlier) by looking at the zero-yield locations (this could also be done by building a wafer stacking map). The process/location defects (such as the ones caused by misalignment which affects only some specific locations), like defects of the second group in general, should be identified on the basis of summarized yield loss patterns, using prior experience with these kinds of problems.

The changes in the process yield probabilities are believed to indicate changes in the individual wafer processing conditions, such as position in a wafer boat, accidental damage, etc. At the same time these values can be viewed as sample measurements for lots or batches of wafers, and thus may serve for the purposes of a lot-to-lot comparison. The location yield probabilities presented in the form of yield maps can be used for spatial yield modeling, and they provide “cleaner” data since wafer-to-wafer variability has been mostly taken care of by the process (wafer) yield probabilities.

In the next section the method and the corresponding model are described in more detail. In Sections III and IV applications to spatial yield modeling are considered, and some examples are given. Numerical and statistical aspects of the method are presented in the Appendix.

II. MODEL DESCRIPTION AND DISCUSSION

In the following, M denotes the number of wafers, and N stands for the number of chips on a wafer. M does not have to be a large number for the presented method, and meaningful results may be obtained for M as small as four. Let p_i denote the process yield probability for wafer number i (we will sometimes stress this by calling p_i the “wafer yield probability”), and let q_j stand for the jth location yield probability. We expect that under “ideal” processing conditions we will have p_i = 1 and, conversely, p_i = 0 if the ith wafer was subjected to a gross operator’s error. Similarly, q_j = 0 indicates a location of a systematic error (mask “killer” defect, etc.), and q_j = 1 for the locations expected to yield all the time provided that the process conditions are “optimal” (all p_i equal 1). We assume that the process and the location contribute to the yield independently, so that the jth chip will yield on the ith wafer with probability P_ij = p_i q_j. This seems to be a justifiable assumption since by definition p_i is the same for all the dice on the ith wafer, and all the other causes of yield degradation are attributed to the locations. As was mentioned earlier, the latter may include some process-related errors (e.g. caused by occasional misalignment affecting some specific locations), but we expect the patterns of such defects to be well known, so that the causes may be identified even though they are not “covered” by the process parameters p_i.

The lower part of Fig. 1 presents the results of the data analysis, obtained as described later with the help of the Maximum Likelihood estimator. As expected, the process yield probabilities for wafers 1, 3 and 2, 4 are quite different (p_1 = p_3 = 1 and p_2 = p_4 = 0.4), as are the location yield probabilities: the analysis suggests that locations 8, 9, 14, and 17 would have yielded all the time if the process conditions were “ideal” for all wafers (that is, if all p_i equal 1). The remaining locations have smaller yield probabilities (5/8); this may be explained by observing that they did not yield on one or the other of the two “ideally” processed wafers 1 and 3. The yield probability for a 2 x 2 super chip at locations 5, 6, 10, and 11 is computed as (5/8)^4 = 0.153. The expected wafer yield (EY) under “ideal” (p_i = 1) conditions can be computed by adding the q_j values; in this case it is 5/8 x 8 + 1 x 4 = 9. (Remember that in the computation of the expected yield we assume that the systematic errors remain throughout the process; of course, elimination of these errors further increases the value of the expected wafer yield.) The q_j map shows that the left half of the wafer tends to yield less than the right (again ignoring the systematic zero-yield locations); this cannot be observed on the wafer stacking yield map. Thus the q_j map helps to identify a problem which otherwise could hardly be detected even by a careful analysis of each individual wafer.

This example shows the scope of questions which can be addressed with the help of the proposed probabilistic model. Of course, one should remember that the example is greatly simplified and serves just for illustrative purposes.

It may be argued that a large sample of wafers may provide similar information using only primitive analysis. However, this may be true only if the sample is divided into relatively homogeneous subsamples so that within each of them the p_i remain about the same. (This is required because the yield frequencies observed from stacking maps cannot be used as reliable estimates of location yield probabilities if wafer-to-wafer variability related to processing conditions has not been excluded.) Then each element of the wafer stacking map for a subsample will be approximately proportional to the corresponding q_j, and the relative average yield for the subsample serves as its p_i value.
But such a method requires a more elaborate analysis (though not necessarily numerical) of each wafer, and sample sizes too big to correspond to a “uniform” process. In addition, it is nearly impossible to automate such an informal routine, whereas the procedure of obtaining estimates for the proposed model, yield maps, yield-area relationships, etc., has been automated and incorporated into the standard IC failure map analysis. From a statistical point of view the difference between these two methods is explained by the fact that in the proposed model all M wafers are used to estimate the q_j values, whereas in the other method this has to be done separately for each of the subsamples, that is, on the basis of a smaller data set.

The p_i and q_j values may be estimated using the Maximum Likelihood method (see [1]). The likelihood function for the jth die location on the ith wafer is defined as

$$L_{ij} = P_{ij}^{\theta_{ij}} (1 - P_{ij})^{1 - \theta_{ij}},$$

where θ_ij is a yield indicator (θ_ij = 1 if the chip yields and θ_ij = 0 otherwise), and P_ij = p_i q_j. Multiplying L_ij over all the locations (j = 1, 2, ..., N) and wafers (i = 1, 2, ..., M) we obtain the likelihood function for our problem:

$$L = \prod_{i=1}^{M} \prod_{j=1}^{N} P_{ij}^{\theta_{ij}} (1 - P_{ij})^{1 - \theta_{ij}} = \prod_{i=1}^{M} \prod_{j=1}^{N} (p_i q_j)^{\theta_{ij}} (1 - p_i q_j)^{1 - \theta_{ij}}. \qquad (1)$$

This is a function of the parameters (probabilities) p_i, q_j, and of the observed yield indicators θ_ij. If the p_i and q_j were known, then the likelihood function would be just the probability of the observed yield configuration. Thus when these values are not known we may estimate them by the values which maximize the likelihood function. This process is described in the Appendix along with some alternative estimators.
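The maximization of (1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the estimator described in the Appendix: it uses a simple coordinate-wise search (each p_i or q_j is updated in turn by bisection on the derivative of log L), and all function names are hypothetical. Only the products p_i q_j enter the likelihood, so the sketch rescales the result so that the largest p_i equals 1; this normalization is an assumption. The wafer maps are a hypothetical reconstruction chosen to match the description of Fig. 1 (five never-yielding locations, every other location yielding on two of the four wafers, wafers 1 and 3 yielding twice as much as wafers 2 and 4); they are not the maps from the figure.

```python
import math


def _argmax_1d(successes, partners_of_failures):
    """Maximize s*log(x) + sum(log(1 - x*v)) over 0..1 by bisection on the slope."""
    if successes == 0:
        return 0.0  # a wafer or location that never yields gets probability 0

    def slope(x):
        return successes / x - sum(v / (1.0 - x * v) for v in partners_of_failures)

    # If the slope is still non-negative at x = 1, the maximum sits on the bound.
    if not any(v >= 1.0 for v in partners_of_failures) and slope(1.0) >= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_yield_model(theta, sweeps=100):
    """Estimate p_i (wafers) and q_j (locations) from a 0/1 matrix theta[i][j]."""
    m, n = len(theta), len(theta[0])
    p, q = [1.0] * m, [0.0] * n
    for _ in range(sweeps):
        for j in range(n):
            s = sum(theta[i][j] for i in range(m))
            q[j] = _argmax_1d(s, [p[i] for i in range(m) if not theta[i][j]])
        for i in range(m):
            s = sum(theta[i])
            p[i] = _argmax_1d(s, [q[j] for j in range(n) if not theta[i][j]])
    top = max(p)
    if top > 0.0:  # assumed normalization: best wafer has p = 1
        p = [x / top for x in p]
        q = [x * top for x in q]
    return p, q


def expected_wafer_yield(q):
    """Expected yield under 'ideal' conditions (all p_i = 1): sum of the q_j."""
    return sum(q)


def superchip_probability(q, locations):
    """Yield probability of a super chip made of the given location indices."""
    return math.prod(q[j] for j in locations)


def wafer(yielding, n=17):
    """Build one wafer row from the set of yielding locations (numbered from 1)."""
    return [1 if j + 1 in yielding else 0 for j in range(n)]


# Hypothetical maps consistent with the text; locations 1, 3, 4, 12, 13 never yield.
ALWAYS = {8, 9, 14, 17}
theta = [
    wafer(ALWAYS | {2, 5, 10, 15}),
    wafer({2, 6, 10, 16}),
    wafer(ALWAYS | {6, 7, 11, 16}),
    wafer({5, 7, 11, 15}),
]
p, q = fit_yield_model(theta)
print([round(x, 3) for x in p])
print(round(expected_wafer_yield(q), 3))
print(round(superchip_probability(q, [4, 5, 9, 10]), 3))  # locations 5, 6, 10, 11
```

For this constructed data set the sketch should converge to approximately p = (1, 0.4, 1, 0.4), q = 5/8 for the eight intermediate locations, an expected wafer yield of 9, and a 2 x 2 super-chip probability of (5/8)^4, about 0.153, in line with the values quoted for Fig. 1.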

III. YIELD VS AREA ANALYSIS

A comprehensive analysis of the most popular yield-area models was carried out in [17] on data from specially designed circuits. The study established that models with an area usage factor provide a much better data fit than models without this parameter, and that the “Gross Defect Area” (GDA) model, based on the area usage factor and on a Poisson distribution of local defects, was much better than most of the models and not worse than the remaining few. This model assumes that

$$\mathrm{Yield} = Y_{AUF} \times \exp(-DA), \qquad (2)$$

where D is the (average) number of defects per unit area, A is the “critical active” die area, and Y_AUF stands for the value of the area usage factor. There are some warranted reservations about using any of the spatial yield models for yield prediction purposes (e.g. in [21]); at best, they may provide a rough estimate for die areas close to those included in the analysis, and for similar circuit densities. However, the defect density D and Y_AUF may be viewed as parameters limiting the yield of a particular product and process, and therefore their estimation (frequently done by the moving window method) is of some practical value.

It was noticed on various occasions [13, 17, 18] that the logs of data points obtained by this method (as well as actual data points in many cases) lie on a convex curve, and not on a line as would follow from the Gross Defect Area model. The phenomenon was attributed to yield variability between the wafers. In the method presented this variability is mostly “absorbed” by the p_i values, whereas the spatial pattern is described by the q_j. Thus one may expect that the moving window method which determines yield probabilities for the super chips, rather than counting them on wafers as yielding or non-yielding, may provide a set of data points which would fit a line pattern much better.

An additional argument in favor of this statement comes from the method by which yield probabilities are computed for a super chip. Indeed, if we assume that non-yielding locations are scattered “randomly” all over the wafer area, then the yield probability for a chip of area kA_0 (where A_0 is a single chip area) is the probability of a simultaneous yield of k chips, which is

$$Y_{AUF}^{k} \times \exp(-kDA_0). \qquad (3)$$

At the same time the GDA model projects the yield probability for such a super chip as

$$Y_{AUF} \times \exp(-kDA_0). \qquad (4)$$

The last formula is valid under the assumption that Y_AUF reflects a consistently yielding part common to all wafers. Both formulae provide for loglinear yield models, but the first decreases much faster. In all likelihood there is enough validity in the arguments leading to both (3) and (4), and the resulting curve is contained between these two lines. This also supports the argument that the curved nature of a yield-area data plot is not caused solely by wafer-to-wafer variability, and even for a single wafer we may expect a similar result because of the difference between formulae (3) and (4) (see plots on Figs 5 and 6). Clearly, the data plot will demonstrate a more linear pattern when Y_AUF is close to 1 or to 0 (since its power in these cases is also close to 1 or to 0).

The computation of yield probabilities for our model is based only on spatial data condensed by the q_j maps, and not on the assumptions of any analytical model (e.g. assigning some specific analytic spatial defect density), so that it avoids the difficulties mentioned above. The estimates based on our model correspond to the “optimal processing conditions” with p_i = 1; this gives us reliable information about the defect density and the area usage factor, but may not be good for prediction of actual yields when the p_i values are unstable. However, in this latter case, one may wish to stabilize the process before making yield predictions.

IV. EXAMPLES
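Before turning to the program outputs, the two super-chip formulae (3) and (4) of Section III can be compared numerically. The Python sketch below evaluates both for a few super-chip sizes and then recovers Y_AUF and D from a straight-line fit of ln(yield) against area, which is one plausible reading of how a loglinear yield-area model is fitted. The parameter values (Y_AUF = 0.8, D = 0.25, A_0 = 1) and all function names are hypothetical, and the unweighted least-squares fit is an assumption, not necessarily the regression used by the program.

```python
import math


def gda_yield(y_auf, defect_density, area):
    """Gross Defect Area model, eq. (2): Y_AUF * exp(-D * A)."""
    return y_auf * math.exp(-defect_density * area)


def superchip_yield_independent(y_auf, defect_density, a0, k):
    """Eq. (3): all k component chips of area a0 must yield independently."""
    return gda_yield(y_auf, defect_density, a0) ** k


def superchip_yield_gda(y_auf, defect_density, a0, k):
    """Eq. (4): GDA model applied directly to a chip of area k * a0."""
    return gda_yield(y_auf, defect_density, k * a0)


def fit_loglinear(areas, yields):
    """Least-squares line through (area, ln(yield)); returns (Y_AUF, D)."""
    logs = [math.log(y) for y in yields]
    n = len(areas)
    mean_a = sum(areas) / n
    mean_l = sum(logs) / n
    sxx = sum((a - mean_a) ** 2 for a in areas)
    sxy = sum((a - mean_a) * (l - mean_l) for a, l in zip(areas, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_a
    # Intercept gives ln(Y_AUF); the defect density is the negative slope.
    return math.exp(intercept), -slope


# Hypothetical parameters, chosen only for illustration.
Y_AUF, D, A0 = 0.8, 0.25, 1.0
sizes = [1, 2, 3, 4, 6]

for k in sizes:
    lower = superchip_yield_independent(Y_AUF, D, A0, k)  # eq. (3), decreases faster
    upper = superchip_yield_gda(Y_AUF, D, A0, k)          # eq. (4)
    print(k, round(lower, 4), round(upper, 4))

# Fitting noise-free eq. (4) data returns the parameters that generated it.
fitted_auf, fitted_d = fit_loglinear(
    [k * A0 for k in sizes],
    [superchip_yield_gda(Y_AUF, D, A0, k) for k in sizes],
)
print(round(fitted_auf, 6), round(fitted_d, 6))
```

For k = 1 the two formulae coincide, and for larger k the values from (3) fall below those from (4) by the factor Y_AUF^(k-1), which is why a plot of real data lying between them appears curved.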

Figures 2-8 illustrate some outputs of the Probabilistic Model program. Figure 2 presents an example somewhat opposite to the one in Fig. 1: there are eight wafers with the same yield of seven (out of 16) dice per wafer, but with varying location yields. The program finds that the wafer yield probabilities vary from 0.7 to 1.0, and that the yield expected under “ideal” conditions can be about 14% higher (this yield is computed as the sum of the location yield probabilities). This simplified example shows that the process (wafer) yield probabilities are not as directly related to the wafer yields as one might think. Similarly, the example from Fig. 1 shows that the same is true for the location yield probabilities vs location yields. For the convenience of comparing lots of different sizes, the stacking yield maps and location probability maps are given in yields per 10 wafers (rounded to the nearest integer). Thus they are referred to as, respectively, observed and expected yields per 10 wafers.

[Fig. 2. Analysis of wafers with the same observed yield. Panels: yield maps for the individual wafers; observed and expected die yields per 10 wafers; table “Yield data by wafers”. Legible values: observed wafer yield 7 for every wafer; wafer yield probabilities between 0.70 and 1.00 (mean 0.90, variance 0.020, S.D. 0.14); expected wafer yields between 5.56 and 7.97; mean spatial yield probability for yielding locations 0.67.]

To illustrate a process analysis application of the program, two samples from two lots of an experimental device produced within a few weeks of each other are compared below. In this particular case, wafer failure maps were automatically fed into the mainframe computer and analysed by the program. Up to 18 classifications of die failure modes were available, and some of them represented devices which were functional but did not meet the specifications. It was particularly interesting to study the spatial patterns of such devices because this may indicate whether these failures were caused by process variation (e.g. the process producing too many devices with some parameters outside the specifications), or by local processing errors (defects of the second group), etc. Since in many other model applications only binary (yield/no yield) data are available, it was interesting to determine whether the program can provide some answers on the basis of such reduced information, and to verify the conclusions by a more detailed analysis of the failure modes (as long as they were available in this analysis). Study of simple yield/no yield maps is also advantageous because it eliminates the need to interact with the program (which arises when new failure modes are to be included in the yield pattern), and saves a lot of analysis time. Such an analysis is likely to be sufficient in most cases, and it will also be clear upon reviewing the results whether a more exhaustive study is warranted at all.

Figures 3-5 correspond to the analysis of acceptable dice (that is, only bins corresponding to acceptable dice were coded by “1” as yielding). Both samples demonstrate substantial yield variation from wafer to wafer, as well as noticeable low-yield areas, especially for the first sample (see observed yield maps and observed yields). It is clear that the second sample has better yielding wafers, but it is hard to attribute the higher yield loss of the first sample to any of the causes (e.g. errors in processing, excessive process variation, gross/random spatial defects, etc.). Indeed, the classification of dice as yielding or non-yielding does not allow us to distinguish between a functional die with a parameter just outside the specifications and a nonfunctional die. Tables of the “yield data” indicate that the process was capable of delivering the same yield for both samples, and that the first sample contains too many marginal wafers. Thus the difference in average yields may perhaps be explained by the difference in the mean wafer yield probabilities (which accounts for about 635 x (0.83 - 0.53) ≈ 190 dice, or almost 70% of the average observed yield); the expected yield maps give mean location yield probabilities which differ by less than 0.19, corresponding to about 120 dice or 42% of the average observed yield. This speculation is supported by the “regression parameters” table, since the estimates of the defect densities for the “predicted data” are virtually the same (0.254 and 0.280), indicating that a random yield loss pattern may be responsible for the loss of one die out of four in both samples, and that the difference in sample yields is rather process related and shows in a much lower Y_AUF value for the first sample. Conclusions derived on the basis of “observed data” might have been different, especially since the yield-area relationship for the first sample is nonlinear, so that the estimates of Y_AUF and defect density (derived from a linear model) have no validity (see Fig. 5). Notice also the curved pattern of moving window data points from a more or less typical wafer, No. 4 from the first sample.

[Fig. 3. Analysis of acceptable dice for the first sample. Panels: observed and expected die yield maps (per 10 wafers), yield data by wafers, and summary of the results on yield-area relationship.]

[Fig. 4. Analysis of acceptable dice for the second sample. Panels: observed and expected die yield maps (per 10 wafers) and the tables below.]

Yield data by wafers (Fig. 4)

Wafer No.   Wafer yield prob.   Observed wafer yield   Expected wafer yield
1           1.00                516                    425.12
2           0.92                385                    391.37
3           0.99                433                    421.04
4           0.56                243                    239.44
5           0.82                337                    348.34
6           0.49                214                    209.63
7           1.00                415                    425.12
8           1.00                417                    425.12
9           0.90                386                    383.45
10          0.93                382                    397.11
11          0.55                234                    234.12
Mean        0.83                360.18                 425.12
Var.        0.040               8103.19                7215.61
S.D.        0.20                90.02                  84.94

Mean spatial yield probability for yielding locations is 0.70. Number of yielding wafers is 11. Number of “nontrivial” die locations is 607; this number shows how many die locations have a yield below 100% but above 0%. Number of “never yielding” locations is 42. The changes in this number are more indicative than the number itself; it may in some cases include PCM and a few partial die; an increase of this number may indicate an increase of mask killer defects. Required number of iterations is 39.

Summary of the results on yield-area relationship (Fig. 4)

IC size                          1         2         3         4         6
No. of die per wafer (Est.)      635.000   299.186   189.974   136.427   84.101
Observed abs. yield              360.182   123.409   57.318    33.091    13.409
Predicted abs. yield             425.117   140.836   63.161    34.637    12.443
Observed yield (%)               56.722    41.248    30.172    24.255    15.944
Predicted yield (%)              66.948    47.073    33.250    25.389    14.796
Nat. log of obs. % yield         4.038     3.720     3.407     3.189     2.769
Nat. log of pred. % yield        4.204     3.852     3.504     3.234     2.694

Regression parameters (Fig. 4)

                 Regr. coeff.   Slope    Intercept   Def. density   AUF (%)
Observed data    -0.992         -0.219   4.141       0.219          62.89
Predicted data   -0.998         -0.280   4.404       0.280          81.76

[Fig. 5. Spatial data for the acceptable dice analysis. Yield/Area plots (log (yield) vs area) are derived by the moving window method for the data from Fig. 3 (left) and Fig. 4 (right). O: “observed” point; P: point obtained from the Probability Model; W: point obtained from a single wafer (#4).]

Results of an additional analysis of all functional dice are presented in Figs 6-8. (As it turned out, failures of only two parameters were responsible for most of the yield loss.) The spread of the wafer “yields” is now relatively small, and the values of the wafer yield probabilities are virtually identical (with the same mean 0.93, standard deviation 0.07, and ranges of 0.18 and 0.19). This suggests that only about 44 dice are lost due to process spread caused by the defects of the first group (if functional dice were treated as acceptable). There is no systematic “spatial-type” yield loss for the second sample; the number of non-yielding locations is 0. The systematic yield loss takes an obvious toll on the upper central part of the wafers of the first lot (about 14 locations of no or marginal yield); this is just one of the non-yielding areas of the yield maps from Fig. 6; the others (as it turns out) have not been caused by systematic errors (such as gross mask defects, etc.). Observed and expected yield maps are very similar, although the expected location yields show noticeably less variation than the observed yields (since the “process related” loss has been excluded). The estimates of the Gross Defect Area model parameters are similar in both cases, and of the same nature as the results of the moving window analysis of the “observed data” (the latter produces slightly larger defect densities and lower estimates of the area usage factor to absorb some of the systematic and process-related yield loss). The loglinear fit is adequate for both “observed” and “predicted” data plots, which is always the case when the wafers are high-yielding, so that the observed and predicted data points do not differ that much.

[Fig. 6. Analysis of functional dice for the first sample. Panels: observed and expected die yield maps (per 10 wafers), yield data by wafers, and summary of the results on yield-area relationship.]

[Fig. 7. Analysis of functional dice for the second sample. Panels: observed and expected die yield maps (per 10 wafers), yield data by wafers, and summary of the results on yield-area relationship.]

[Fig. 8. Spatial data for the functional dice analysis. Yield/Area plots (log (yield) vs area) are derived by the moving window method for the data from Fig. 6 (left) and Fig. 7 (right). O: observed point; P: point obtained from the Probability Model.]

Summarizing the results of both analyses we may conclude that:

1. The process is not capable of delivering consistently high yield due to excessive variability of some of the device parameters (which can be seen in the relatively random spatial distribution of nonacceptable but functional dice); let us point out that numerical analysis of the failing parameters would not have reached the same conclusion without additional spatial analysis.


2. The loss caused by random spatial defects is quite insignificant (compare the defect densities for both analyses).

3. The estimates of the wafer yield probabilities are virtually not affected by systematic yield loss (see Figs 6 and 7).

4. The most interesting fact is that even prior to the analysis of all functional dice we were able to establish that mainly defects of the first group were responsible for the yield loss, and that only these defects (and not some spatial phenomenon, as could have been suggested by wafer stacking maps) explained such a dramatic difference in yield between the two samples. A similar conclusion might have been reached by a careful study of die failure patterns on individual wafers, but even if carried out it would not have provided quantitative results (and would also have been prohibitively time consuming).

5. The data plots in Figs 5 and 8 merit additional discussion. Notice that the moving window yield data points for the data predicted by the Probabilistic Model and for the observed data do not (and should not) agree. This happens because the predicted points correspond to the rectified yield data obtained from the map of spatial yield probabilities. Thus wafer-to-wafer variability is excluded, and this mainly explains the linearized yield-area pattern of the predicted data when the observed data are not linear (see Fig. 5); it also predicts yields of a more stable process which is still within the reach of production capability. This last remark is very much in line with the general process control goal, which is to stabilize the process ("bring it under control") and to find out its realistic capability. In order to achieve this goal one has to exclude wafer-to-wafer variability, and on some occasions this may lead one to disregard some "low likelihood" events (e.g. surprisingly high yields of large chip multiples on some wafers, or some high-yielding wafers altogether).

The Gross Defect Area model estimates obtained for the data points predicted in such a way give one an idea of what the "gross" process capability is (in the form of the area usage factor), and what the loss due to random spatial factors is (be it random defects, failure of some parameters to meet specifications demonstrated in a random spatial manner, etc.). Let us reiterate that the Gross Defect Area model is just one of many models which can be used in conjunction with the spatial yield data "rectified" by our model; it happens to be a simple one, was shown to compare quite favorably with the other models, and has a straightforward probabilistic interpretation. It also fits these data remarkably well in virtually all the cases analyzed by the Probabilistic Model; however, this should not be used as an argument in favor of excluding the competing models from consideration, even though none of them has thus far proved to be any better.
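As an illustration of how an area usage factor and a defect density could be read off a yield vs area plot, the following minimal sketch fits a straight line to log yield by ordinary least squares. It assumes the log-linear form ln Y(A) = ln(usage) - density × A, which the references to a "loglinear fit" in this section suggest but which is defined elsewhere in the paper. The function name `fit_loglinear_yield` and the numbers in the example are hypothetical and are not taken from the paper's data.

```python
import math

def fit_loglinear_yield(areas, yields):
    """Least-squares fit of ln(Y) = ln(usage) - density * A.

    areas  : window areas (e.g. in chip multiples), assumed form of the x axis
    yields : yield fractions in (0, 1] observed or predicted for each area
    Returns (usage, density).
    """
    n = len(areas)
    logs = [math.log(y) for y in yields]
    mean_a = sum(areas) / n
    mean_l = sum(logs) / n
    sxx = sum((a - mean_a) ** 2 for a in areas)
    sxy = sum((a - mean_a) * (l - mean_l) for a, l in zip(areas, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_a
    # Intercept gives the usage factor, negated slope the defect density.
    return math.exp(intercept), -slope

# Synthetic (made-up) points lying exactly on the assumed model.
areas = list(range(1, 10))
yields = [0.9 * math.exp(-0.25 * a) for a in areas]
usage, density = fit_loglinear_yield(areas, yields)
```

On strongly curved data, such as the observed points of the first sample, a straight-line fit of this kind would not be meaningful, which is the point made in the text about the validity of the estimates.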

V. CONCLUSIONS

The Probabilistic Model provides a fast and economical way of summarizing yield failure map information, enabling one to analyze spatial yield patterns and to detect changes in wafer processing conditions. In conjunction with the moving window method, the Probabilistic Model delivers reliable estimates of the parameters of the Gross Defect Area model, eliminating such distracting factors as wafer-to-wafer yield variability and violation of model assumptions on individual wafers. The model can provide sufficiently accurate results on the basis of relatively small and not necessarily homogeneous wafer samples. The Probabilistic Model analysis can be easily automated and attached to the wafer testing programs. The model can also be applied in the analysis of bit maps for ROM and DRAM chips.

Acknowledgements-The author wishes to thank many of his colleagues at Signetics, Dr W. W. Rey from Philips Research Laboratories, and the reviewers for valuable comments and suggestions.

REFERENCES

1. J. L. Devore, Probability and Statistics for Engineering and the Sciences, Brooks/Cole Publisher, Monterey (1982).
2. M. R. Gulett, Semiconductor Int. 87 (1981).
3. A. Gupta, W. A. Porter and J. W. Lathrop, IEEE J. Solid-St. Circ. SC-9, 96 (1974).
4. W. E. Ham, RCA Rev. 39, 231 (1978).
5. R. S. Hemmert, Solid-St. Electron. 24, 511 (1981).
6. S. M. Hu, Solid-St. Electron. 22, 205 (1979).
7. A. C. Ipri, RCA Rev. 41, 537 (1980).
8. A. C. Ipri, Solid-St. Technol. 85 (1979).
9. G. E. Moore, Electronics 43, 126 (1970).
10. B. T. Murphy, Proc. IEEE 52, 1537 (1964).
11. B. T. Murphy, Proc. IEEE 59, 1128 (1971).
12. H. Murrman and D. Kramzer, Siemens Forsch.-u. Entwickl. Ber. 9, 38 (1980).
13. O. Paz and T. R. Lawson, IEEE J. Solid-St. Circ. SC-12, 540 (1977).
14. J. E. Price, Proc. IEEE 58, 1290 (1970).
15. J. Sredni, Int. Electronic Device Meeting Tech. Digest, pp. 537-539 (1975).
16. R. Stanley, Signetics internal correspondence (1981).
17. R. Stanley and E. Lee, Presented at Meetings of Electrochemical Soc., San Francisco (1983); and American Statistical Ass., Toronto (1983).
18. C. H. Stapper, IEEE Trans. Electron Dev. ED-20, 655 (1973).
19. C. H. Stapper, IEEE J. Solid-St. Circ. SC-10, 537 (1975).
20. C. H. Stapper, IBM J. Res. Develop. 20, 228 (1976).
21. C. H. Stapper and R. J. Rosner, Solid-St. Electron. 25, 487 (1982).
22. R. M. Warner, IEEE J. Solid-St. Circ. SC-9, 86 (1974).
23. T. Yanagawa, IEEE Trans. Electron Dev. ED-19, 190 (1972).

VI. APPENDIX

Likelihood function (1) can be maximized directly by solving the system of equations

$$\frac{\partial \log L}{\partial p_i} = 0, \quad i = 1, \ldots, M, \qquad \frac{\partial \log L}{\partial q_j} = 0, \quad j = 1, \ldots, N,$$

under the boundary conditions $0 \le p_i, q_j \le 1$. (These conditions somewhat complicate the numerical aspect of the problem.) Despite the huge number of parameters in (5) (up to a thousand or even more for some small devices), the likelihood equations allow for a simple solution algorithm. First we "weed" out all "trivial" locations and wafers for which the yield is always 0 or 100% (since it is easy to see that to maximize $L$ the corresponding parameter values should be respectively 0 or 1; e.g. in the example from Fig. 1 we assign $q_j = 0$ to the locations 1, 3, 4, 12, and 13, so that these $q_j$ values are excluded from the further computations). After this has been done we use some algebra on the likelihood equations, and rewrite them as

$$\sum_{j=1}^{N} \frac{1 - e_{ij}}{1 - p_i q_j} = N, \quad i = 1, \ldots, M, \qquad \sum_{i=1}^{M} \frac{1 - e_{ij}}{1 - p_i q_j} = M, \quad j = 1, \ldots, N, \qquad 0 \le p_i, q_j \le 1 \tag{5}$$

(summation in (5) is taken over all "nontrivial" wafers and locations). Now we observe that the left-hand sides of eqns (5) are monotone either in $p_i$ (the first group), or in $q_j$ (the second group); in addition, with the $p_i$'s given, each $q_j$ can be easily found from just one of the equations of the second group, and vice versa, the $p_i$'s can be found given the $q_j$'s from the equations of the first group. (For the Fig. 1 example we can see that, say, one of the second group equations is

$$\frac{1}{1 - p_1 q_7} + \frac{1}{1 - p_4 q_7} = 4,$$

so that once $p_1$ and $p_4$ are known, $q_7$ can be easily found from this equation.) These remarks enable us to propose a simple algorithm for solving (5). We set, say, the initial values of the $p_i$'s, and then determine the $q_j$'s. (The program uses the standard Newton's method to do it.) This in turn allows us to determine the next set of $p_i$'s, and so on. The iterative steps are repeated until the desired accuracy is achieved. The computation time also depends on the variability between the wafers, on the initial set of the $p_i$ values, and obviously on the numbers of dice and wafers. The program uses the relative wafer yields as the initial values of the $p_i$'s, and an accuracy of four decimals. The computation time for lots of up to 50 wafers with 500-700 dice per wafer does not exceed 2-4 min and commonly requires less than 50 iterations.

One can notice that the likelihood function (1) depends on the values $P_{ij} = p_i q_j$, rather than on $p_i$ and $q_j$ separately. This shows that the solution of (5) is invariant in the products $p_i \times q_j$, so that when the maximum of $P_{ij}$ is less than 1, the normalization of the $p_i$'s and $q_j$'s is not unique. For instance, in the program the maximum of the $p_i$'s is always 1 (which is determined by setting the initial values for the $p_i$'s first), but this can be reversed if one feels that the maximum of the $q_j$'s, rather than that of the $p_i$'s, should equal 1. To do it we can simply set the initial values of the $q_j$'s instead of the $p_i$'s. This ambiguity does not create any practical problem since, first, the results of the program are used primarily for comparisons, and, second, it is highly unlikely that in applications the maximum of the $P_{ij}$'s can be less than 1. Theoretically this could happen in a large batch of low yielding wafers, but in the problems analyzed thus far the maximum values of the $p_i$'s and $q_j$'s were always equal to 1 (so that the maximum of $P_{ij}$ was automatically the same).

Two alternative estimators may be used to obtain values of the parameters in the Probabilistic Model. The first one is the Pitman Estimator given by (5'):

(5’)

The second is a generalization of the Least Squares estimator:

$$q_j = \frac{\sum_i e_{ij}}{\sum_i p_i}, \qquad p_i = \frac{\sum_j e_{ij}}{\sum_j q_j}. \tag{5''}$$
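The alternating procedure described above for system (5) can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's program: `e[i][j]` is assumed to be 1 for a good die at location j of wafer i and 0 otherwise, the "relative wafer yields" used as starting values are assumed to mean wafer yields divided by the largest wafer yield, and bisection is used in place of Newton's method because the left-hand sides of (5) are monotone and bisection keeps each value inside [0, 1]. The names `_solve_one` and `fit_probabilistic_model` are hypothetical.

```python
def _solve_one(weights, target, iters=60):
    """Find x in [0, 1] with sum(1 / (1 - w * x) for w in weights) = target.

    weights holds the fixed factors (p's or q's) of the failing dice only.
    Bisection is an assumed substitute for the Newton step in the paper.
    """
    def f(x):
        total = 0.0
        for w in weights:
            d = 1.0 - w * x
            if d <= 0.0:          # product reaches 1 on a failing die
                return float("inf")
            total += 1.0 / d
        return total - target

    if f(1.0) <= 0.0:             # root at or beyond the boundary: clamp to 1
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def fit_probabilistic_model(e, tol=1e-4, max_iter=500):
    """Alternate between the two groups of equations (5) until p and q settle.

    e is an M x N list of 0/1 rows; at least one die must be good.
    tol=1e-4 mirrors the four-decimal accuracy mentioned in the text.
    """
    M, N = len(e), len(e[0])
    yields = [sum(row) / N for row in e]
    top = max(yields)
    p = [y / top for y in yields]  # assumed reading of "relative wafer yields"
    q = [0.0] * N
    for _ in range(max_iter):
        q_new = [_solve_one([p[i] for i in range(M) if e[i][j] == 0], M)
                 for j in range(N)]
        p_new = [_solve_one([q_new[j] for j in range(N) if e[i][j] == 0], N)
                 for i in range(M)]
        change = max(max(abs(a - b) for a, b in zip(p_new, p)),
                     max(abs(a - b) for a, b in zip(q_new, q)))
        p, q = p_new, q_new
        if change < tol:
            break
    return p, q


# Toy map: three wafers, three locations, one failing die per wafer.
p_hat, q_hat = fit_probabilistic_model([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
```

For this toy map every product `p_hat[i] * q_hat[j]` comes out close to 2/3, the overall fraction of good dice. Locations with no failing dice receive the value 1 and locations with no good dice the value 0, so the "weeding" step is not strictly required by this sketch, although removing trivial wafers and locations first reduces the work.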

Each of the three systems (5), (5'), and (5'') uses the same iterative procedure described earlier (Maximum Likelihood), but they differ in the way in which the parameters are estimated on the iteration steps. Solution of system (5'') is the easiest and provides unbiased estimates, but leaves open the question of normalization; (5') is hard to program, but the estimates are automatically normalized and boundary conditions (0