SOME CONSIDERATIONS IN THE FORMULATION OF IC YIELD STATISTICS
S. M. Hu
IBM, System Products Division, East Fishkill Facility, Hopewell Junction, NY 12533, U.S.A.
(Received 18 May 1978; in revised form 5 August 1978)
Abstract-Some published modifications of the Poisson distribution for describing IC yield are critiqued. It is shown that it is incorrect to obtain an average yield for a non-uniform defect population by integrating, either in the geometrical space or in the density space, the Poisson distribution with some assumed density distribution functions. The correct way, and happily also the simplest way, is to average the yields of regionally partitioned subpopulations in a discrete manner. The simple Poisson distribution would become rigorously correct when the size of an imaginary IC increases to one quarter of a wafer, regardless of the non-uniformity in defect density. It is also shown that both cases of clustering of defects, one due to interaction among defects themselves, and the other due to wafer regional preference, result in increased yield for a given defect density in a wafer. On the other hand when there are interactions between defects and IC active area elements, or when defects themselves have physical dimensions, there would be a decreased yield for a given defect density, and a non-zero intercept in the plot of the logarithm of yield vs the active device area.
1. INTRODUCTION
The projection of yield has always been a factor of consideration in the advancement of integrated circuit (IC) technology. In 1964, Murphy first recognized [1] that the Poisson distribution is not a good description of IC yield vs the total active area (A) of an IC, and tends to give an unduly pessimistic yield projection for future IC with increasingly larger A. The Poisson distribution expresses the IC yield, Y, as:
$$Y = e^{-\lambda A}, \tag{1}$$
where $\lambda$ is the mean defect density, over an area much larger than A. (The reason for the underscored condition, that the area be much larger than A, will become apparent later.) It is known [1-11] that the log Y-vs-A plots of actual data are usually concave upward, rather than linear as predicted by eqn (1). This, together with the observation that different areas in a silicon wafer often exhibit glaringly different defect densities, has led many investigators to propose various forms of modifications (or rather, extensions) of the Poisson distribution. The most celebrated, and by far the most espoused [1, 3, 6, 8, 10, 11], is one which assumes that the defect density has a long-range non-uniformity expressible by a normalized distribution function (or, in the terminology of probability theory, a probability density function) $f(\lambda)$, and that the resulting yield is axiomatically given by:
$$Y = \int_0^\infty e^{-\lambda A} f(\lambda)\, d\lambda, \tag{2}$$
or by integration in the geometrical space:
$$Y = \frac{1}{S}\int_S e^{-\lambda(\mathbf{r}) A}\, dS(\mathbf{r}), \tag{3}$$
where $\mathbf{r}$ is the position vector and S is the area of the wafer. Equation (2) is implied to be equivalent to eqn (3) through the transformation of $\lambda(\mathbf{r})$ to $f(\lambda)$.

The purpose of this paper is, firstly, to show that the integral forms of averaging yield, eqns (2) and (3), are incorrect and may lead to an overly optimistic yield projection of larger IC, at least on a theoretical basis. Secondly, a correct formulation is suggested that the overall yield should be obtained by summing area-weighted yields in appropriately partitioned regions in a discrete manner. It will be shown that the yield in each such region is determined by the mean defect density over the region, regardless of how non-uniform the distribution $\lambda(\mathbf{r})$ in it may be. Some salient features of this yield formulation will be observed. Also, some specific forms of yield expressions from the literature will be commented upon. Thirdly, we will make some simplified analyses to include considerations such as when defects tend to form clusters because of their intrinsic properties rather than regional preference; when defects are not merely geometrical points, but are physical entities with dimensions; and when there is interaction between defects and the active elements of the IC.

2. NON-UNIFORM DEFECT DISTRIBUTION AND YIELD STATISTICS
To avoid axiomatic traps such as leading to eqn (2) and eqn (3), let us start from the most basic probabilistic considerations. A typical modern IC may consist of $\sim 10^5$ geometrically disconnected active area elements $a_i$, dispersed in the chip area $A_c$. The IC is good if and only if every one of these $a_i$ is defect free. For the time being, assume that all these $a_i$ can be equivalently replaced by a simply connected area A. (We shall see later that this is not always permissible, such as in the case in which defects form clusters, or in the case in which defects have physical dimensions, or various other conditions.) $A = \sum_i a_i \subset A_c$. Suppose there are n independent point defects, distributed randomly over an area S ($A \subset S$), which may be a part of or the whole of a silicon wafer. The probability that a given defect is not in area A is $(1 - A/S)$. The probability that every one of these n defects is not in A is then:
$$P = (1 - A/S)^n. \tag{4}$$
If this probability holds for all other IC in the same wafer, or in different wafers of a process batch of interest, the yield is then simply Y = P. Noting that $n = S\lambda$, one has:
$$Y = (1 - A/S)^{(S/A)\lambda A}, \tag{5}$$
$$\;\; = e^{-\lambda A} \quad \text{for very large } S\ (A/S \to 0). \tag{6}$$
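As a numerical illustration of the passage from eqn (5) to eqn (6), the following minimal Python sketch evaluates both expressions for a fixed active area and mean density while the reference area S grows. The function names and the numerical values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math

def yield_finite(area, density, region):
    """Eqn (5): yield for n = region * density point defects spread over `region`."""
    n = region * density  # expected number of defects in the region (treated as a real number)
    return (1.0 - area / region) ** n

def yield_poisson(area, density):
    """Eqn (6): limit of eqn (5) as area / region -> 0."""
    return math.exp(-density * area)

# Hypothetical values: active area in cm^2, mean defect density in defects/cm^2.
A, lam = 0.25, 2.0
for S in (0.25, 0.5, 1.0, 10.0, 100.0, 1.0e4):
    print(f"S = {S:>8}: eqn (5) = {yield_finite(A, lam, S):.6f}, "
          f"eqn (6) = {yield_poisson(A, lam):.6f}")
```

In this sketch the finite-S value lies below the Poisson value for every S larger than A, approaches it as S grows, and drops to zero when S equals A.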
Equation (6) is, of course, the Poisson distribution eqn (1) of zero defect. The product $\lambda A$ is the statistical expectation (the expected number of defects in area A).

Now consider the case in which the defect density $\lambda$ is non-uniform over the wafer as well as over a chip area $A_c$, which in some futuristic IC may be a sizable fraction of a wafer (or at least a sizable fraction in radial dimension; however, our analysis here is not subject to the assumption of an axial symmetry in defect distribution). The task is first to prove that, regardless of how sharply $\lambda$ may vary over $A_c$, the yield of this (ith) IC is determined by the mean defect density $\bar{\lambda}_i$ over $A_c$ or a somewhat larger area $S_i$ ($S_i \supset A_c$), simply as:
$$Y_i = e^{-\bar{\lambda}_i A}, \tag{7}$$
where
$$\bar{\lambda}_i = \frac{1}{S_i}\int_{S_i} \lambda(\mathbf{r})\, dS_i(\mathbf{r}), \tag{8}$$
and not by eqn (3) integrated over $S_i$. Note that a typical current IC contains some $10^5$ geometrically disconnected active area elements $a_j$, more or less uniformly dispersed in $A_c$. We can therefore consider the probability event (defective or nondefective) of the IC as a joint probability of the simultaneous and independent probability events of all these $a_j$; i.e. a good IC is one in which all the $a_j$'s are simultaneously defect free. Consider that a local defect density $\lambda_j$ can be defined in $\Delta S_j$ [abbreviated for $(\Delta S_i)_j$; there should be no confusion], where $\Delta S_j > a_j$. On the assumption of uniform distribution of $a_j$ in $S_i$, it is possible to partition the area $S_i$ (or $A_c$) in such a way (see Fig. 1) that, for all $a_j$'s, there is a common factor $\theta$ ($\theta < 1$) representing the areal fraction, such that:
$$a_j = \theta\, \Delta S_j, \tag{9}$$
and hence:
$$A = \theta S_i. \tag{10}$$

Fig. 1. Partitioning an IC into area elements $\Delta S_j$ (typically $\sim 10^5$ of these); each of such area elements contains an active area $a_j$.

The probability that the area element $a_j$ is defect free independently of the other elements is:
$$p_j = e^{-\lambda_j a_j}. \tag{11}$$
The probability that the IC is good is then:
$$Y_i = \prod_j e^{-\lambda_j a_j} = e^{-\sum_j \lambda_j a_j} = e^{-\theta \sum_j \lambda_j \Delta S_j} = \exp\Big\{-\theta S_i \Big[\frac{1}{S_i}\sum_j \lambda_j\, \Delta S_j\Big]\Big\}. \tag{12}$$
Since $\Delta S_j$ is of the order of $10^{-5} S_i$, the bracketed term in eqn (12) is effectively identical to the mean defect density $\bar{\lambda}_i$ given by eqn (8). Together with the substitution of $\theta S_i$ by A from eqn (10), eqn (12) becomes eqn (7), which was to be proved. It follows that eqn (3) is incorrect.

Since the probability event of each IC is an independent and isolated event, then according to probability theory, one has:
$$Y = \sum_i (S_i/S)\, Y_i = \sum_i (S_i/S)\, e^{-\bar{\lambda}_i A}, \tag{13}$$
where $S = \sum_i S_i$. Each partitioned region $S_i$ appearing in eqn (13) need not be restricted to contain one IC. It can be an equivalent area consisting of an integral number of IC in disconnected regions so long as the mean defect densities in the respective areas $A_c$ are for practical purposes nearly equal to one another, regardless of their respective distribution $\lambda(\mathbf{r})$. From practical experience, defects are most often confined to certain regions, such as the $\langle 100 \rangle$ zones at the rim, as well as the center of the silicon wafer, as illustrated in Fig. 2, so that at most 3 or 4 properly partitioned regions will suffice. When yield data are available, separating yield data according to regions is merely a bookkeeping procedure involving no extra experimental work. The yield of an IC in any defined region of a wafer is obtained from the statistics of that region of all wafers in a batch as well as in all batches within a defined time period.

Fig. 2. An example of the partitioning of a {100} wafer into regions of average defect density $\bar{\lambda}_i$, for use with eqn (13). An octant is shown. The defect distribution in this case is assumed to have mirror symmetry about the $\langle 100 \rangle$ and $\langle 110 \rangle$ axes.

It should be emphasized that eqn (13) cannot pass to an integral form by shrinking $S_i$ to $dS_i$ (or $\delta S_i$) and by replacing $\bar{\lambda}_i$ with a local defect density. This is because for a probability event to be meaningful, the defect density must be defined in an area $S_i$ larger than and containing A, so that there is a probability of the defects
being outside A and yet still in $S_i$. When $S_i$ shrinks to A, all defects that are defined in $S_i$ must also be in A, so that the probability of defects being outside A is zero. This is most readily seen from the discrete expression eqn (5), which goes to zero as S equals A. As S, or $S_i$, becomes smaller than A, eqn (5) is meaningless (i.e. raising a negative quantity to a non-integer power). The physical meaning is simply that when a defect density has been specified in a small area within A, it is meaningless to speak of the probability of no defect in A.

The equivalent fallacy of eqn (2) is on the other hand better camouflaged. The transformation of $\lambda(\mathbf{r})$ to the probability density function $f(\lambda)$, not discussed in the literature by its proponents, is understood to be given by (the leading part of the expression is illegible in the source):
$$\cdots \int_S \left[1 - S_\lambda(\rho(\mathbf{r}))\right] dS(\mathbf{r}), \tag{14}$$
where $\rho(\mathbf{r})$ is the same distribution $\lambda(\mathbf{r})$ now regarded as a variable used to evaluate the step function $S_\lambda$ specified at $\lambda$ (there need not be any confusion between $S_\lambda(\rho)$ and the symbol S used to denote a specified area). In practice the integral in eqn (14) needs to be evaluated for a sufficient number of discrete values of $\lambda$ so that a density distribution $F(\lambda)$ can be obtained, in tabulated form or in graphical form. The transformation amounts to collecting area elements $\Delta S$ across all ICs in the wafer.

An important feature of the yield expression (13), as compared to the Poisson distribution, eqn (1), is that in the ln Y-vs-A plot, the yield given by eqn (13) has a changing slope, decreasing with A. This can be shown as follows. For very small A, one can approximate each exponential function in eqn (13) by the first two terms of its standard series expansion. Equation (13) then becomes
$$Y \approx 1 - A\sum_i (S_i/S)\,\bar{\lambda}_i = 1 - A\bar{\lambda} \quad \text{as } A \to 0. \tag{15}$$
The mean defect density $\lambda$ in eqn (1) has exactly the same meaning as $\bar{\lambda}$ in eqn (15), since they are both defined as the mean density over the entire wafer. Therefore, both eqn (1) and eqn (15) have the same slope in the ln Y-vs-A plot, given by $-\bar{\lambda}$. At very large value of A, it can be observed that all terms in eqn (13) will vanish relative to the remaining term involving the smallest $\bar{\lambda}_i$, $(S_i/S)\, e^{-\bar{\lambda}_i A}$. The slope of the ln Y-vs-A plot will approach $-\bar{\lambda}_{\min}$, the negative of the smallest regional mean density, asymptotically, as shown in Fig. 3.

Fig. 3. General features of the log Y-vs-A plot as expected from yield expression eqn (13).

One may observe that the simple Poisson distribution, eqn (1), would become rigorously correct as the size of an imaginary IC chip increases to one quarter of a wafer, regardless of non-uniformity in defect density. (A fourfold symmetry of defect distribution is assumed.) This simply follows from eqn (13) when the summation reduces to one term, and hence to eqn (1).

3. ERROR INCURRED FROM USING YIELD EXPRESSIONS (2) AND (3)

Theoretical basis aside, one may ask how much of an error would be incurred from using the yield expressions which we considered incorrect, eqns (2) or (3). We attempt to give an estimation of the error in this section. In order to give a meaningful estimation of the error, let us not limit our analysis to some specific distribution functions $f(\lambda)$ that have been used in the literature, lest the analysis happen to be biased against a particular distribution function that is not the best choice. Now, when the specific form of $f(\lambda)$ has not been given, one will have to settle for some parameters of general statistical significance, namely, the mean density $\bar{\lambda}$, the variance $\sigma^2$, the skewness and the kurtosis. (The last two parameters are physically meaningful only in unimodal distributions; for a multimodal $f(\lambda)$, these are simply the third and the fourth moments about the mean.) When the quantity $\sigma$ is small, only the parameters $\bar{\lambda}$ and $\sigma^2$ have significant influence on the result from any particular yield expression. Under this condition, the error incurred from the use of yield expression eqns (2) or (3) can be estimated (see Appendix 1) to be:
$$\Delta Y = Y^* - Y = (A^2/2)\sum_i (S_i/S)\,\sigma_i^2, \tag{16}$$
where $Y^*$ is the yield given by eqns (3) or (2), Y is the yield from eqn (13), and $\sigma_i^2$ is the variance of density distribution of the defect population in region i. The error is thus seen to increase proportionately with the second power of A and with the average of regional variance.
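The relation between eqns (13), (3) and (16) can be checked on a toy data set. The sketch below is only an illustration under stated assumptions: each region is represented by a short list of local defect densities on equal-area cells (hypothetical numbers), eqn (3) is approximated by a plain average over all cells, and the function names are invented here.

```python
import math

def mean_and_variance(cells):
    """Mean and population variance of the local densities in one region."""
    mean = sum(cells) / len(cells)
    variance = sum((x - mean) ** 2 for x in cells) / len(cells)
    return mean, variance

def yield_partitioned(regions, area):
    """Eqn (13): area-weighted sum of exp(-mean_i * A) over the regions."""
    total_cells = sum(len(cells) for cells in regions)
    return sum(
        len(cells) / total_cells * math.exp(-mean_and_variance(cells)[0] * area)
        for cells in regions
    )

def yield_integral(regions, area):
    """Discretized eqn (3): average of exp(-local density * A) over all cells."""
    all_cells = [x for cells in regions for x in cells]
    return sum(math.exp(-x * area) for x in all_cells) / len(all_cells)

def error_estimate(regions, area):
    """Eqn (16): (A^2 / 2) times the area-weighted average of regional variances."""
    total_cells = sum(len(cells) for cells in regions)
    weighted_variance = sum(
        len(cells) / total_cells * mean_and_variance(cells)[1] for cells in regions
    )
    return 0.5 * area ** 2 * weighted_variance

# Hypothetical local defect densities (defects/cm^2) on equal-area cells.
regions = [
    [3.0, 5.0, 4.0, 6.0],  # e.g. a high-density rim region
    [0.5, 1.0, 1.5, 1.0],  # e.g. a low-density interior region
]
A = 0.01  # hypothetical active area (cm^2), small enough for the series expansion

y13 = yield_partitioned(regions, A)
y3 = yield_integral(regions, A)
print("eqn (13):", y13)
print("eqn (3), discretized:", y3)
print("difference:", y3 - y13, " eqn (16) estimate:", error_estimate(regions, A))
```

With these numbers the discretized eqn (3) gives the larger (more optimistic) yield, and the difference is close to the second-order estimate of eqn (16). Merging the two regions into a single one increases the estimate.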
From the inequality (see also Appendix 1)
$$\sigma^2 \ge \sum_i (S_i/S)\,\sigma_i^2, \tag{17}$$
in which $\sigma^2$ is the variance of the defect density over the wafer as a whole,
it can be readily inferred that the error $\Delta Y$ will increase with the size of the regions of partition, or with the decrease of N, the number of such regions, and reaches a maximum when N reduces to one. Thus, an increase in A will actually increase the error more rapidly than $A^2$ because of the indirect increase in the average of variances through the increase of the chip size $A_c$ as in some way dictated by the increase of A. Frequently, however, the defect population has a multimodal distribution, possibly resulting from a mix of a number of subpopulations. Then, it can be shown (see Appendix 2) that in each region, say i, $\sigma_i^2$ may not diminish with the size of the partitioned region as in the unimodal case, but will have a lower bound of
$$\theta_\alpha \theta_\beta (\bar{\lambda}_\alpha - \bar{\lambda}_\beta)^2 \tag{18}$$
for the case of a bimodal defect population. In eqn (18), $\bar{\lambda}_\alpha$ and $\bar{\lambda}_\beta$ are, respectively, the mean densities of subpopulations $\alpha$ and $\beta$ in region i, and $\theta_\alpha$ and $\theta_\beta$ are their respective fractions of population mix. For a multimodal distribution, expression (18) can be readily generalized to give
$$\frac{1}{2}\sum_m \sum_n \theta_m \theta_n (\bar{\lambda}_m - \bar{\lambda}_n)^2, \tag{19}$$
where m and n denote the running indices for the subpopulations $\alpha$, $\beta$, $\gamma$, etc. By definition of multimodal distribution, $\bar{\lambda}_m \ne \bar{\lambda}_n$ if $m \ne n$.

4. COMMENTS ON SOME SPECIFIC YIELD EXPRESSIONS

Besides having a faulty theoretical basis, as discussed in Section 2, eqn (2) is also difficult to evaluate in practice because of the difficulty of obtaining the probability density function $f(\lambda)$ from actual experimental data. If one still insists, despite this, on averaging by integration in the $\lambda$-space, the correct transformation from eqn (13) (and not from eqn 3) should be obtained with an $f(\lambda)$ consisting of a few terms of $S_i/S$-weighted delta functions, as illustrated in Fig. 4. Transformation of actual defect density data in the geometrical space into $f(\lambda)$ according to eqn (14) would be something like the dashed curve in Fig. 4, which is known as a multimodal distribution. Physically, this implies the existence of defects of different nature, different origins, or different regional preference. The multiple delta function distribution comes basically from the mean defect densities over regions where the probabilistic events of hundreds of thousands of active elements must be treated as simultaneous events. Each discrete defect density, $\bar{\lambda}_i$, in the multiple delta function distribution is not generally the mean of a particular mode in the multimodal distribution; the two different distributions in general collect defect densities across different regions.

Fig. 4. Schematic representation of a multimodal probability density function $f(\lambda)$ (dashed curve, probability density from eqn 14, plotted against defect density). The multiple delta functions, suitable for eqn (13), do not in general correspond to the modes in the continuous probability density function.

While the determination of $f(\lambda)$ from the experimental data is very difficult, there have been a number of investigators who suggested special functions to represent $f(\lambda)$. All such $f(\lambda)$ functions suggested in the literature, so far, have been chosen for reason of simplicity in the integration of eqn (2), rather than for their resemblance to the actual density distribution; perhaps no single distribution function is universally pertinent for representation of actual defect populations of diverse cases. The exponential probability density function:
$$f(\lambda) = \frac{1}{a}\exp(-\lambda/a) \tag{20}$$
has the interesting feature that $\sigma = \bar{\lambda} = a$. This distribution function has been used by Seeds [3] in conjunction with eqn (2) to arrive at:
$$Y = 1/(1 + \bar{\lambda} A). \tag{21}$$
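The step from eqns (2) and (20) to eqn (21) can be verified numerically. The following sketch, with hypothetical function names and parameter values, evaluates eqn (2) with the exponential density of eqn (20) by a simple midpoint rule (truncating the upper limit at a large multiple of the mean, an assumption of this sketch) and compares the result with the closed form.

```python
import math

def exponential_pdf(lam, mean):
    """Eqn (20): exponential density with mean (and standard deviation) `mean`."""
    return math.exp(-lam / mean) / mean

def mixed_yield(area, mean, upper_factor=50.0, steps=100_000):
    """Eqn (2) with the density of eqn (20), integrated by the midpoint rule.

    The infinite upper limit is truncated at upper_factor * mean.
    """
    h = upper_factor * mean / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        total += math.exp(-lam * area) * exponential_pdf(lam, mean)
    return total * h

def seeds_yield(area, mean):
    """Eqn (21): closed form 1 / (1 + mean * A)."""
    return 1.0 / (1.0 + mean * area)

A, mean_density = 1.0, 2.0  # hypothetical values (cm^2, defects/cm^2)
print("numerical eqn (2):", mixed_yield(A, mean_density))
print("eqn (21):         ", seeds_yield(A, mean_density))
print("eqn (1):          ", math.exp(-mean_density * A))
```

For the same mean density, eqn (21) always lies above the simple Poisson value of eqn (1), which is the optimistic tendency criticized in the text.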
Equation (21) has a form recognizable as a Bose-Einstein distribution (whereas the Poisson distribution, eqn (1), is recognizable as the Maxwell-Boltzmann distribution). From this recognition, Price [5] arrived at the same expression as eqn (21) in a different way, by basic probabilistic arguments which consider all the defects concerned to be indistinguishable. While the consideration of the defects as indistinguishable is certainly reasonable, the line of probabilistic arguments in Price's derivation has a hidden mistake. In probability theory, a set of identical (or similar) defects may be artificially considered either distinguishable or indistinguishable without affecting the outcome of the distribution in the arrangement of defect occupancy, so long as the probability of each occupancy arrangement is properly weighted by a multiplicity factor [12] when the occupants are considered as indistinguishable. In the Bose-Einstein distribution, however, an equal probability (neglecting multiplicity) is forced on all distinct arrangements, an axiomatic assumption contrived to fit the physical observation of certain particles, e.g. photons, atoms containing an even number of particles, etc. (Similarly, Fermi-Dirac statistics has a different axiomatic probability rule of distribution, and is suitable for only certain other particles.) It should be obvious that the defects of our present concern are not governed by any of such specialized quantum mechanical axioms.

Assuming a gamma function distribution for $f(\lambda)$, Stapper [10, 12] derived the yield expression:
$$Y = \left[1 + A(\sigma^2/\bar{\lambda})\right]^{-\bar{\lambda}^2/\sigma^2} = (1 + gA)^{-h}, \tag{22}$$
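The two limiting cases of eqn (22) can be checked with a few lines of Python. This is a minimal sketch, assuming the identification $g = \sigma^2/\bar{\lambda}$ and $h = \bar{\lambda}^2/\sigma^2$ read off from the two forms of eqn (22); the function name and the numbers are hypothetical.

```python
import math

def stapper_yield(area, mean, variance):
    """Eqn (22): Y = (1 + g*A)**(-h) with g = variance/mean and h = mean**2/variance."""
    g = variance / mean
    h = mean * mean / variance
    # exp/log1p form keeps the result accurate when g*A is very small.
    return math.exp(-h * math.log1p(g * area))

A, mean_density = 1.0, 2.0  # hypothetical values (cm^2, defects/cm^2)
print("variance -> 0      :", stapper_yield(A, mean_density, 1e-8),
      "vs eqn (1):", math.exp(-mean_density * A))
print("variance = mean**2 :", stapper_yield(A, mean_density, mean_density ** 2),
      "vs eqn (21):", 1.0 / (1.0 + mean_density * A))
```

For a fixed mean density, a larger assumed variance gives a larger projected yield, so the fitted variance directly controls how optimistic the extrapolation is.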
In eqn (22), Stapper defined $\bar{\lambda}$ as the mean of the distribution $f(\lambda)$, and $\sigma^2$ as its variance. The validity of this expression, of course, is subject to our comments earlier on integration with respect to $\lambda$. Equation (22) has two parameters, g and h, for fitting experimental data, and can hence be regarded as an extension of either eqn (1) or eqn (21). It reduces to eqn (1) for $\sigma^2/\bar{\lambda} \to 0$, and to eqn (21) for $\sigma^2/\bar{\lambda}^2 \to 1$ (recall that in an exponential distribution function, such as used to derive eqn (21), $\sigma = \bar{\lambda}$). While both eqns (21) and (22) do exhibit upward concavity in the ln Y-vs-A plot, they remain without theoretical basis, and without any relationship to the actual defect distribution. If they can be accepted as an empirical fitting to the yield data, why not the more innocent form of parabolic or cubic regression? A sympathetic answer is that all those documented modifications represent various extents of smearing of Poisson's distribution and thus still retain its essential characteristics; for small A, all such modifications reduce to Poisson's distribution.

5. YIELD EXPRESSIONS CONSIDERING DEFECT CLUSTERING, DEFECT INTERACTIONS AND DEFECT DIMENSION
We first consider the case in which defects tend to cluster, not as a result of regional preference, but because of the intrinsic properties of the defects. It is convenient to associate with each cluster of $\nu$ point defects an area $\alpha$. For simplicity, assume the active areas $a_i$ in an IC to be of the same size, $A = \sum_i a_i = m a_i$ (m, the number of $a_i$ in A, is of the order of $10^5$). Because the defects now have dimension it is no longer possible to substitute all such $a_i$ by a simply connected A. If the active areas $a_i$ are separated by distances greater than the size of the cluster, the problem can be treated by defining an effective active area of each $a_i$, approximately given by $(\sqrt{a_i} + g\sqrt{\alpha})^2$ (see Fig. 5). Here g is a geometrical factor, and has a value of $2/\sqrt{\pi}$ for circular clusters. For the IC to be good, the center of each defect cluster must not fall inside any one of these m effective areas. Following a similar argument leading to eqn (6), it can be shown that:
$$Y = \exp\left\{-(\lambda/\nu)\left[A + m\left(\alpha + 2g\sqrt{\alpha A/m}\right)\right]\right\}. \tag{23}$$
Clearly, one sees that the yield is increased by clustering for a given defect density $\lambda$, for cases where the size of the clusters is sufficiently compact.

Fig. 5. Diagram illustrating the definition of an effective active area element (enclosed by dashed line) and the actual active area element (enclosed by solid line). The center of every area defect (or point defect with an equivalent range of interaction $\sqrt{\alpha}$) must be outside the dashed enclosure for the element to be defect-free.

It is easy to infer that the case of random defects having physical dimensions is given by a similar expression:
$$Y = \exp\left\{-\lambda\left[A + m\left(\alpha + 2g\sqrt{\alpha A/m}\right)\right]\right\}, \tag{24}$$
where $\alpha$ is now the size of a single defect. In contrast to the case of clustering, the physical size of the defect decreases the yield for a given $\lambda$, as can be seen from eqn (24).

Two things can be observed common to the yield expressions eqns (23) and (24). The first is that as the total active area A reduces to zero, the yield is now not unity. The ln Y-vs-A plot has an intercept of $-\lambda m \alpha$ at A = 0. Experimental data often show such an intercept. It should be noted, however, that the model that includes patches of completely faulted areas [11] can also explain this feature. The second to be observed is that the log Y-vs-A plots of both expressions are concave upward, again consistent with experimental data. This feature is, however, additional to the concavity caused by distributed defect subpopulations, or long-range inhomogeneity, as expressed by eqn (13).

Some defects tend to interact with device structures, such as certain kinds of emitter edges [13]. In such cases, it is convenient to introduce a parameter of "emitter perimeter" [13]. On the other hand, such emitter edges may exert a pulling force on the defects (on dislocations due to shear stress fields, and on point defects due to compressive or tensile stress fields). Depending on such factors as the thickness of the silicon nitride films, the edge-induced stress field may have an effective range on dislocation in silicon of between 10 and 20 $\mu$m [14]. This range is quite sizable compared to $a_i$ in current devices. The effect of this edge stress can be taken care of in the yield expression by considering an effective device area for each of the separated area elements as $(\sqrt{a_i} + 2x)^2$, where x is the range of interaction. This leads to:
$$Y = \exp\left\{-\lambda\left[A + m\left(4x^2 + 2x\sqrt{A/m}\right)\right]\right\}, \tag{25}$$
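The qualitative statements about eqns (23) and (24) can be reproduced numerically. The sketch below is an illustration only: the function names are invented, all parameter values are hypothetical, and the default geometrical factor assumes circular clusters or defects with $g = 2/\sqrt{\pi}$. Equation (25) is deliberately left out of the sketch.

```python
import math

G_CIRCULAR = 2.0 / math.sqrt(math.pi)  # assumed geometrical factor for circular defects

def extra_area(total_area, m, alpha, g=G_CIRCULAR):
    """Added effective area m * (alpha + 2 g sqrt(alpha * A / m)) in eqns (23), (24)."""
    return m * (alpha + 2.0 * g * math.sqrt(alpha * total_area / m))

def yield_clustered(total_area, density, nu, m, alpha, g=G_CIRCULAR):
    """Eqn (23): clusters of nu point defects, each cluster occupying an area alpha."""
    return math.exp(-(density / nu) * (total_area + extra_area(total_area, m, alpha, g)))

def yield_sized(total_area, density, m, alpha, g=G_CIRCULAR):
    """Eqn (24): independent defects, each of physical size alpha."""
    return math.exp(-density * (total_area + extra_area(total_area, m, alpha, g)))

# Hypothetical values: defects/cm^2, number of elements, defect area in cm^2, cluster size.
lam, m, alpha, nu = 2.0, 1.0e5, 1.0e-10, 5.0
for A in (0.0, 0.05, 0.1, 0.2):
    print(f"A = {A:4.2f}: point defects {math.exp(-lam * A):.5f}, "
          f"sized defects {yield_sized(A, lam, m, alpha):.5f}, "
          f"clustered {yield_clustered(A, lam, nu, m, alpha):.5f}")
```

At A = 0 the logarithm of the eqn (24) yield equals $-\lambda m \alpha$ rather than zero, which is the non-zero intercept discussed in the text. For the same $\lambda$, the clustered case gives the higher yield.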
Equation (25) is similar to eqn (24). In this manner, the effect of the "emitter perimeter" comes in through the term
$2mx\sqrt{A/m}$ in the above expression (add a geometrical factor if $a_i$ is not square), and should be empirically identical with the expression used in Ref. [13], except for an additional term in eqn (25) representing an intercept of $-4x^2\lambda m$ at A = 0.

Dislocation segments, or needle defects and other line defects parallel to the wafer surface, can be treated similarly by recognizing that there are only a limited number of permissible orientations of the line defects, usually in the $\langle 110 \rangle$ directions. Thus, by dividing such line defects into a few groups (two in {100} silicon substrates; three in {111} substrates), one can readily account for these defects by considering effective device area elements of $\sqrt{a_i}(\sqrt{a_i} + l)$, where l is the length of the defect.

When it is necessary to consider all possible kinds of defects, points, line, area and clusters, with and without interactions, assume that the probability that the IC is not infested by the population of the ith kind of defects is $Y_i$. The probability that the IC is simultaneously not infested by every kind of all these defects is then given by:
$$Y = \prod_i Y_i. \tag{26}$$

6. CONCLUSION

In this paper, we have shown that it is not proper to take account of the phenomenon of nonhomogeneous defect populations by integration with a defect distribution function either in the geometrical space or in the density space. Instead, the overall yield should be obtained by partitioning subpopulations of defects in suitable regions. It is true that all published forms of modification of the Poisson distribution by integration with a density distribution function do exhibit a concavity in the log Y-vs-A plot, a feature made familiar by experimental data. However, this feature of these modified distributions is merely a consequence of the general mathematical properties of averaging, and is neither the result of a correct physical model, nor even of a correct mathematical procedure. They may be reasonably acceptable in the range of available data to which such models are fitted; but extrapolated yield from such models may not be reliable. The error incurred from using these yield expressions will be proportional to the variance of defect density in the region of partition, and will generally be larger for multimodal defect populations. On the other hand, the accuracy of the simple Poisson distribution increases with the size of an IC chip. If the size of a purely imaginary IC increases to one quarter of a wafer (we would not expect this to happen, though), the simple Poisson distribution would become rigorously correct, regardless of the non-uniformity in defect density. The yield would then be as predicted by the pessimists, unless the defect density is decreased by learning. We have also shown that the clustering of defects, either because of interaction among themselves, or because of the hospitality of geographical characteristics of wafer regions, will tend to result in a higher yield than predicted by the Poisson distribution for a given average defect density. Finally, we have shown that when defects have sizable physical dimensions, or cluster in patches of sizable areas, the yield will be smaller than unity as the total active device area A reduces to zero, an aspect which is also consistent with many experimental observations.

REFERENCES
1. B. T. Murphy, Proc. IEEE 52, 1537 (1964).
2. G. R. Madland and J. Jolly, Electron. Ind. 40-44 (April 1966).
3. R. B. Seeds, IEEE Int. Conv. Rec., Part 6, 60-61 (1967).
4. G. E. Moore, Electronics 43, 126 (Feb. 16, 1970).
5. J. E. Price, Proc. IEEE (Lett.) 58, 1290 (1970).
6. T. Yanagawa, IEEE Trans. Electron Devices ED-19, 190 (1972).
7. A. Gupta and J. W. Lathrop, IEEE J. Solid-State Circuits SC-7, 389 (1972).
8. C. H. Stapper, IEEE Trans. Electron Devices ED-20, 655 (1973).
9. R. M. Warner, Jr., IEEE J. Solid-State Circuits SC-9, 86 (1974).
10. C. H. Stapper, IBM J. Res. Dev. 20, 228 (1976).
11. O. Paz and T. R. Lawson, Jr., IEEE J. Solid-State Circuits SC-12, 540 (1977).
12. See, for example, W. Feller, An Introduction to Probability Theory and Its Applications, 3rd edn, Vol. 1, pp. 20, 40, 41. Wiley, New York (1968).
13. S. M. Hu, S. P. Klepner, R. O. Schwenker and D. K. Seto, J. Appl. Phys. 47, 4098 (1976).
14. S. M. Hu, Appl. Phys. Lett. 32, 5 (1978).
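APPENDIX 1

Estimation of errors incurred from using yield expressions (2) or (3)

If the quantity $\lambda A$ is small ($\lambda A \ll 1$), one can estimate the error incurred by the use of eqn (3) (or equivalently, eqn 2) by the series expansion of the exponential function to the first three terms. From eqn (13),
$$Y = \sum_i (S_i/S)\, e^{-\bar{\lambda}_i A} \approx \sum_i (S_i/S)\left(1 - \bar{\lambda}_i A + \tfrac{1}{2}A^2\bar{\lambda}_i^2\right) = 1 - A\bar{\lambda} + (A^2/2)\sum_i (S_i/S)\,\bar{\lambda}_i^2. \tag{A1}$$
From eqn (3),
$$Y^* = \frac{1}{S}\int_S e^{-A\lambda(\mathbf{r})}\, dS(\mathbf{r}) \approx \frac{1}{S}\int_S \left[1 - A\lambda(\mathbf{r}) + \tfrac{1}{2}A^2\lambda^2(\mathbf{r})\right] dS(\mathbf{r}) = 1 - A\bar{\lambda} + \tfrac{1}{2}A^2\,\frac{1}{S}\int_S \lambda^2(\mathbf{r})\, dS(\mathbf{r}). \tag{A2}$$
Integrating the last term in the above equation piece-wise over all areas $S_i$ in S,
$$Y^* \approx 1 - A\bar{\lambda} + \tfrac{1}{2}A^2\sum_i (S_i/S)\,\frac{1}{S_i}\int_{S_i} \lambda^2(\mathbf{r})\, dS_i(\mathbf{r}) = 1 - A\bar{\lambda} + \tfrac{1}{2}A^2\sum_i (S_i/S)\,\mu_i^2, \tag{A3}$$
where $\mu_i^2$ is the second moment about $\lambda = 0$ in the region i. Hence the error $\Delta Y$ is given by
$$\Delta Y \approx Y^* - Y \approx \tfrac{1}{2}A^2\sum_i (S_i/S)\left(\mu_i^2 - \bar{\lambda}_i^2\right) = \tfrac{1}{2}A^2\sum_i (S_i/S)\,\sigma_i^2, \tag{A4}$$
where $\sigma_i^2$ is the variance of the defect population in region i.

We next show that the error $\Delta Y$ increases with the increase of the size of the regions of partition of defect population (whose lower limit is the size of an IC chip), or equivalently with the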
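decrease of the number of such regions, N. We need only consider the extreme case N = 1; the intermediate cases can be readily inferred by recurrent procedure. When N = 1, eqn (A4) becomes
$$Y^* - Y = \tfrac{1}{2}A^2\sigma^2, \tag{A5}$$
where $\sigma^2$ is the variance of the defect density of the defect population of the wafer as a whole. The error given by eqn (A5) is larger than that given by eqn (A4) by an amount $(1/2)A^2\left[\sigma^2 - \sum_i (S_i/S)\,\sigma_i^2\right]$, which is positive as can be proved as follows: By definition,
$$\sigma^2 = \mu^2 - \bar{\lambda}^2; \tag{A6}$$
$$\sum_i (S_i/S)\,\sigma_i^2 = \sum_i (S_i/S)\left(\mu_i^2 - \bar{\lambda}_i^2\right) = \mu^2 - \sum_i (S_i/S)\,\bar{\lambda}_i^2. \tag{A7}$$
Subtracting (A7) from (A6), one has
$$\sigma^2 - \sum_i (S_i/S)\,\sigma_i^2 = \sum_i (S_i/S)\,\bar{\lambda}_i^2 - \bar{\lambda}^2 = \sum_i (S_i/S)\,\bar{\lambda}_i^2 - \Big[\sum_i (S_i/S)\,\bar{\lambda}_i\Big]^2 \ge 0. \tag{A8}$$

APPENDIX 2

Effect of multimodal defect population on errors

The analysis in the above section is appropriate only for a unimodal defect population. Very often $f(\lambda)$ exhibits a multimodal distribution, possibly indicating the simultaneous existence of defects of different types or different origins. We will now show that for a multimodal defect population, $\sigma^2$ will not vanish as the size of an IC becomes infinitesimal.

First consider a bimodal distribution. It is reasonable to assume that this defect population is a composite of two unimodal subpopulations $\alpha$ and $\beta$, with a population mix in fractions $\theta_\alpha$ and $\theta_\beta$ in region i. Assume these subpopulations to have their respective distributions $f_\alpha(\lambda)$ and $f_\beta(\lambda)$, means $\bar{\lambda}_\alpha$ and $\bar{\lambda}_\beta$, and variances $\sigma_\alpha^2$ and $\sigma_\beta^2$:
$$\theta_\alpha + \theta_\beta = 1, \tag{A9}$$
$$\sigma_\alpha^2 = \int f_\alpha(\lambda)\,(\lambda - \bar{\lambda}_\alpha)^2\, d\lambda = \mu_\alpha^2 - \bar{\lambda}_\alpha^2, \tag{A10}$$
and similarly,
$$\sigma_\beta^2 = \mu_\beta^2 - \bar{\lambda}_\beta^2, \tag{A11}$$
where $\mu_m^2$ is the second moment about $\lambda = 0$ for the subpopulation m. For the defect population as a whole in any defined region i (to avoid clutter of symbols, we shall drop the subscript i from now on)
$$\sigma^2 = \theta_\alpha\sigma_\alpha^2 + \theta_\beta\sigma_\beta^2 + \theta_\alpha\theta_\beta\bar{\lambda}_\alpha^2 + \theta_\alpha\theta_\beta\bar{\lambda}_\beta^2 - 2\theta_\alpha\theta_\beta\bar{\lambda}_\alpha\bar{\lambda}_\beta = \theta_\alpha\sigma_\alpha^2 + \theta_\beta\sigma_\beta^2 + \theta_\alpha\theta_\beta\left(\bar{\lambda}_\alpha - \bar{\lambda}_\beta\right)^2. \tag{A12}$$
In the above expression we observe that, as the chip size shrinks to zero, both $\sigma_\alpha^2$ and $\sigma_\beta^2$ may vanish, but $(\bar{\lambda}_\alpha - \bar{\lambda}_\beta)^2$ will not, as $\bar{\lambda}_\alpha$ and $\bar{\lambda}_\beta$ are by definition distinct in a bimodal population. One would then have
$$\sigma^2 \to \theta_\alpha\theta_\beta\left(\bar{\lambda}_\alpha - \bar{\lambda}_\beta\right)^2. \tag{A13}$$
The results of eqn (A12) can be readily generalized to a multimodal defect population, for which one has
$$\sigma^2 = \sum_m \theta_m\sigma_m^2 + \tfrac{1}{2}\sum_m\sum_n \theta_m\theta_n\left(\bar{\lambda}_m - \bar{\lambda}_n\right)^2, \tag{A14}$$
where the running indices m and n are to be summed over all subpopulations $\alpha$, $\beta$, $\gamma$, etc. By definition of multimodal distribution, $\bar{\lambda}_m \ne \bar{\lambda}_n$ if $m \ne n$. The last term in eqn (A14) will hence be always finite and positive. We thus conclude that for a multimodal defect population, the variance of defect density in a given region will not vanish as the size of the region shrinks to zero.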
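The decomposition in eqns (A12) and (A14) can be confirmed numerically. The sketch below compares the variance of a mixture computed directly from its first two moments with the right-hand side of eqn (A14); the function names and the example fractions, means and variances are hypothetical.

```python
def mixture_variance_direct(fractions, means, variances):
    """Variance of the mixture from its overall first and second moments."""
    mean = sum(t * m for t, m in zip(fractions, means))
    second_moment = sum(t * (v + m * m) for t, m, v in zip(fractions, means, variances))
    return second_moment - mean * mean

def mixture_variance_a14(fractions, means, variances):
    """Right-hand side of eqn (A14): within-mode term plus between-mode term."""
    within = sum(t * v for t, v in zip(fractions, variances))
    between = 0.5 * sum(
        tm * tn * (mm - mn) ** 2
        for tm, mm in zip(fractions, means)
        for tn, mn in zip(fractions, means)
    )
    return within + between

# Hypothetical trimodal population: fractions sum to one.
theta = (0.5, 0.3, 0.2)
lam_bar = (1.0, 4.0, 9.0)
sigma2 = (0.2, 0.5, 1.0)
print(mixture_variance_direct(theta, lam_bar, sigma2))
print(mixture_variance_a14(theta, lam_bar, sigma2))

# Bimodal case with vanishing mode variances: only the term of eqn (A13) survives.
print(mixture_variance_a14((0.6, 0.4), (1.0, 4.0), (0.0, 0.0)))
```

The last call illustrates the conclusion of this appendix: even when each mode has zero variance, the mixture variance stays at $\theta_\alpha\theta_\beta(\bar{\lambda}_\alpha - \bar{\lambda}_\beta)^2$.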