Thin Solid Films, 2X (1992) 24–29
The effect of subtractive defects and grain size on VLSI interconnect early failure

Satish S. Menon and Kelvin F. Poole
Center for Semiconductor Device Reliability Research, Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634 (USA)
Abstract
This paper studies how subtractive defects affect the reliability of VLSI interconnects. The defects studied resemble those caused by processing flaws such as photomask pinholes or stress voiding. Analysis of the metal microstructure and of the variations in current density and temperature shows that small-grained VLSI interconnects containing subtractive defects will have significant early failures. More importantly, when the grain size is large compared with the linewidth at the defect site, good tolerance to defect-related early failures is expected. Experimental observations on 3 µm wide Al-based test stripes, containing semi-circular defects that remove up to 80% of the linewidth, support these arguments. The life tests were conducted at a nominal current density of 1.5 × 10^6 A cm^-2, to avoid the fusion regime, and at ambient temperatures of 80 °C, 125 °C and 200 °C, to study the influence of temperature. The influence of grain size was studied using metallizations with mean grain sizes of 0.5 µm, 1 µm and 1.5 µm. The results confirm the analysis: as long as the current density at the defect site is not high enough to fuse the metal by Joule heating, the metal microstructure near a defect site is more important than the current density and temperature gradients.
1. Introduction
Subtractive defects are found in VLSI interconnects in the form of notches or voids caused by processing flaws such as photomask pinholes, improper etch conditions, particulates or stress voiding. The industry trend is to decrease interconnect linewidths, but the defect sizes associated with these processing flaws do not necessarily decrease in proportion. As a result, defects consume a large fraction of the linewidth. Quality assurance screens and burn-ins, which are an integral part of manufacturing, can be optimized if the effect of subtractive defects, together with other factors such as the influence of grain size, is known.
The first reported work to characterize the influence of subtractive defects on VLSI interconnect lifetime was by Lloyd et al. [1]. They studied photolithographical defects that removed 90% of 12.5 µm lines and reported that the defects greatly decreased the median time to failure. The primary effects of introducing a defect in an interconnect were considered to be the current density gradient, the gradient in the number of grain boundaries, and the thermal gradient in the vicinity of the defect. It was recognized that at current densities in the low 10^6 A cm^-2 range, the thermal gradients are of little significance compared with the other two gradients. Although the effects of the first two gradients should ideally cancel out, the imbalance in their effects
was proposed as the cause of the early failures. This approach has also been used in a recent paper [2] to explain the decrease in the median time to failure of interconnects containing intentionally grown stress voids. In the past few years, Clemson researchers have studied the influence of lithographically introduced subtractive defects on 3 µm metal interconnects [3–5]. The model proposed by Kemp [3, 4] considers the current density increase at the defect site to be the primary cause of early failures in the interconnects. None of the above literature [1–5] considers the influence of grain size. It is accepted (see, for example, refs. 6 and 7) that an increase in the metal grain size can significantly improve the electromigration lifetime of VLSI conductors. Recent Clemson results obtained by systematically varying the grain size have shown that a larger grain size significantly improves the tolerance to defect-related early failures, and an empirical model describing this effect has also been presented [8]. In the present paper, the authors explain the effect of subtractive defects and the influence of grain size on defect-related early failures from fundamental physical principles, and substantiate the arguments with life-test results.
© 1992 Elsevier Sequoia. All rights reserved.

2. Theoretical analysis
When an electric current passes through a polycrystalline thin-film conductor, the atomic flux J comes
primarily from electromigration in the lattice and at the grain boundaries. For the lattice, the Huntington and Grone model gives the atomic flux as

$$J_l = \frac{N_l D_l \rho\, eZ_l^{*}\, j}{kT} \qquad (1)$$

where N is the atomic density, D the diffusivity, j the current density, ρ the resistivity, eZ* the effective charge, k the Boltzmann constant and T the absolute temperature [9]. The subscript l indicates that the parameters refer to the lattice. Fundamentally, the values of D and eZ* change when a grain structure is considered. For an interconnect of given linewidth, we must also correct for the fact that only a fraction of the linewidth, viz. the grain boundaries, participates in the diffusion. The atomic flux is then given by the Ho and Howard model as

$$J_b = N_b \left(\frac{\delta}{g}\right) \frac{D_b \rho\, eZ_b^{*}\, j}{kT} \qquad (2)$$
The subscript b indicates that these quantities belong to the grain boundary. The quantity δ is the effective boundary width for mass transport (approximately 10 Å) and g is the average grain size [9]. This correction factor (δ/g) is accurate only if the grain size is much smaller than the linewidth. For example, consider an average grain size of 1.5 µm in a 3 µm line. The line is then only two grains wide and contains a single longitudinal boundary. The correction factor should therefore be δ/2g rather than δ/g. This problem is solved by using the factor nδ/w, where n is the effective number of longitudinal grain boundaries and w is the linewidth. The grain boundaries are assumed to pass completely through the film thickness. Transverse grain boundaries are vital in forming the triple points that cause flux divergence in normal conductors. Here, however, we are interested only in the influence of the defect, so we consider only longitudinal grain boundaries in our analysis. When the grain size is much smaller than the linewidth, nδ/w reduces to the factor in eqn. (2). Rewriting eqn. (2) using nδ/w gives

$$J_b = N_b \left(\frac{n\delta}{w}\right) \frac{D_b \rho\, eZ_b^{*}\, j}{kT} \qquad (3)$$
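The correction-factor argument can be checked numerically. The sketch below is a hypothetical illustration, not the authors' procedure. It assumes uniform columnar grains of width g, so that a line of width w is w/g grains wide and contains n = w/g − 1 longitudinal boundaries. The helper names are invented for this example. For g = 1.5 µm in a 3 µm line the model gives n = 1, so nδ/w = δ/2g. When g ≪ w, nδ/w tends to δ/g.

```python
"""Compare the boundary-width correction delta/g of eqn. (2) with n*delta/w of eqn. (3).

Hypothetical model: uniform columnar grains of width g, so a line of width w is
w/g grains wide and has n = w/g - 1 longitudinal boundaries (never negative).
"""

DELTA_UM = 1e-3  # effective boundary width, about 10 angstrom = 0.001 um


def n_longitudinal(w_um: float, g_um: float) -> float:
    """Effective number of longitudinal grain boundaries across the line."""
    return max(w_um / g_um - 1.0, 0.0)


def factor_eqn2(g_um: float) -> float:
    """Ho and Howard correction factor delta/g."""
    return DELTA_UM / g_um


def factor_eqn3(w_um: float, g_um: float) -> float:
    """Linewidth-aware correction factor n*delta/w."""
    return n_longitudinal(w_um, g_um) * DELTA_UM / w_um


if __name__ == "__main__":
    w = 3.0  # linewidth in um, as in the test stripes
    for g in (0.1, 0.5, 1.0, 1.5, 3.0):
        print(f"g = {g:3.1f} um: n = {n_longitudinal(w, g):5.1f}, "
              f"delta/g = {factor_eqn2(g):.2e}, n*delta/w = {factor_eqn3(w, g):.2e}")
```

For g = 3 µm the model returns n = 0, which corresponds to the bamboo limit where eqn. (3) vanishes and lattice diffusion, eqn. (1), must be used instead.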
At moderate temperatures, close to or below half the melting temperature of the metal, J_l/J_b is about 10^-4, as estimated in ref. 9. Therefore, where there is at least one longitudinal grain boundary available for atomic flux, J_b dominates. As the interconnect approaches a bamboo structure, however, J_l starts gaining importance. The authors agree with Lloyd et al. [1] that the atomic flux divergence dJ/dx determines the failure time. Substituting [10]

$$D_b = D_0 \exp\left(-\frac{E_a}{kT}\right) \qquad (4)$$

into eqn. (3) and then differentiating, we obtain
$$\frac{dJ_b}{dx} = J_b\left(\frac{1}{n}\frac{dn}{dx} + \frac{2}{j}\frac{dj}{dx} + \left[\frac{E_a}{kT^{2}} - \frac{1}{T}\right]\frac{dT}{dx}\right) \qquad (5)$$

This differs from the equation obtained in ref. 1 in that the second term within the parentheses is twice as large. As a result, the contribution of the gradient in the number of grain boundaries does not cancel that of the current density gradient. In very small grain metallizations, the grain size is at least an order of magnitude smaller than the linewidth remaining at the defect site. There, (1/n) dn/dx equals −(1/j) dj/dx, and hence the current density gradient is still a factor. In general, this equality need not hold when the grain size is small. When the grain size is comparable to the linewidth at the defect site, the first term may decrease, thus increasing the influence of the current density gradient. In very large grain metallizations, n goes to zero and eqn. (5) is not valid. In that situation, the flux divergence has to be derived from eqn. (1). It then differs from eqn. (5) in that the first term within the parentheses is zero and the second term is half as large. Therefore, the influence of the current density gradient may vary with the relative grain size, but it always contributes significantly to the flux divergence.
Now consider the influence of the factor J in front of the parentheses. Earlier works [1, 2] do not consider the changes in J, but these changes are critical to the overall influence of grain size. As the grain size increases, n/w, which is approximately proportional to 1/g, decreases, reducing J and thereby dJ/dx. In large grain metallizations, the metal remaining at the defect site has a bamboo structure with no longitudinal grain boundaries. The value of E_a then varies from a low value in the non-defected portion to a high value at the point of least linewidth. For example, in Al, E_a varies from 0.5 eV for grain boundary diffusion to 1.4 eV for lattice diffusion [6].
This rise in E_a also contributes to the reduction of J, and hence of dJ/dx, with increasing grain size. The non-defected portions of the line also contain flux divergences, at microstructural features such as triple points that cause electromigration. If the remaining flux divergence at the defect is smaller than these, then the latter will cause failure and the defect plays no role. One last point before we summarize. The sharp rise in E_a at the defect site, as lattice diffusion takes over, occurs only if the median grain size is much larger than the linewidth at the defect. This will therefore be a dominant mechanism for improving defect tolerance in submicrometer interconnects.
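The step from eqns. (3) and (4) to eqn. (5) can be written out explicitly. The derivation below is a sketch under assumptions not stated explicitly above. The current I and film thickness t are assumed constant along the stripe, so that j = I/(wt). N_b, δ, D_0, ρ and eZ_b* are assumed not to vary with x or T. The last line puts a number on the activation-energy argument, using the 0.5 eV and 1.4 eV values quoted for Al at an ambient of 200 °C (473 K).

```latex
% Constant I and t (assumed): j = I/(w t)  =>  1/w = j t / I
J_b = \frac{N_b\,\delta\,D_0\,\rho\,eZ_b^{*}\,t}{k\,I}\;
      \frac{n\,j^{2}}{T}\,\exp\!\left(-\frac{E_a}{kT}\right)

% Logarithmic derivative along x, with T = T(x)
\frac{1}{J_b}\frac{dJ_b}{dx}
  = \frac{1}{n}\frac{dn}{dx} + \frac{2}{j}\frac{dj}{dx}
  + \left[\frac{E_a}{kT^{2}} - \frac{1}{T}\right]\frac{dT}{dx}

% Very small grains: n \propto w \propto 1/j, so (1/n)\,dn/dx = -(1/j)\,dj/dx
\frac{1}{J_b}\frac{dJ_b}{dx}
  = \frac{1}{j}\frac{dj}{dx}
  + \left[\frac{E_a}{kT^{2}} - \frac{1}{T}\right]\frac{dT}{dx}

% Lattice vs grain-boundary Arrhenius factor at T = 473 K (kT \approx 0.0408 eV)
\frac{\exp(-1.4/kT)}{\exp(-0.5/kT)} = \exp\!\left(-\frac{0.9}{0.0408}\right)
  \approx e^{-22.1} \approx 2.6\times10^{-10}
```

The j² dependence is what doubles the current density term relative to ref. 1. The final ratio compares only the exponential factors. The pre-exponential D_0 values differ between lattice and boundary diffusion, so the ratio indicates the trend rather than the actual flux ratio.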
In summary, large grain metallizations will show good tolerance to defect-related early failures, whereas in small grain metallizations, defects will cause significant early failures. The grain size is the critical parameter that can be controlled to overcome the effects of the current density and thermal gradients that would otherwise cause early failures. This control results from the ability of a large grain size to reduce J and thereby the divergence of J.
For illustration, consider a semi-circular defect of radius d, with its center on one edge of an interconnect of width w, as shown in Fig. 1. Let the defect be located at the center of a stripe of length L. The flux divergence introduced by the defect between x = 0 and x = 2d is calculated. The current density gradient follows purely from geometrical considerations. Assuming a uniform grain structure with only longitudinal and transverse grain boundaries, the gradient in the number of longitudinal grain boundaries is found. The temperature at both ends of the stripe is assumed to be the ambient temperature. The temperature at the center of the stripe, neglecting the presence of the defect, is estimated with the thermal model suggested by Lloyd et al. [11]. Next, the current density variations due to the defect are incorporated into the same model to estimate the temperature variation at the defect site. The magnitude of the thermal gradient term is found to be much smaller than that of the other two terms, so it does not play a major role in the simulations. Figure 2 shows computer-generated plots of the flux divergence along a defect of radius d = 1.5 µm. The line is 3 µm wide by 1000 µm long Al–1%Si (E_a = 0.5 eV), and the grain sizes considered are 1 µm, 1.9 µm and 3 µm. The ambient temperature is set to 200 °C, and the current density in the non-defected portion is taken as 1.5 × 10^6 A cm^-2.
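The geometric part of this calculation can be sketched in a few lines. The code below is a simplified, hypothetical reconstruction rather than the authors' simulation. It keeps only the grain-boundary and current density terms: the temperature and E_a are held fixed, so the thermal term is dropped, consistent with its small magnitude noted above. It treats n(x) = w(x)/g − 1 as a continuous quantity and evaluates J_b from eqn. (3) up to a constant factor. The semi-circular defect gives w(x) = w − sqrt(d² − (x − d)²) for 0 ≤ x ≤ 2d, and j(x) = j0·w/w(x) at constant current. The divergence is taken by central differences. The sketch is not expected to reproduce Fig. 2 quantitatively. The helper peak_current_density (a name invented here) shows that 50% and 80% defects raise j at the narrowest point to 2 and 5 times the nominal value. Those values are 3 × 10^6 and 7.5 × 10^6 A cm^-2, still below the 10^7 A cm^-2 fusion range.

```python
import math

W_UM = 3.0    # linewidth w (um)
D_UM = 1.5    # defect radius d (um); removes 50% of the line at x = d
J0 = 1.5e6    # current density in the non-defected line (A cm^-2)


def width(x_um: float) -> float:
    """Remaining linewidth w(x) for a semi-circular notch centred on one edge at x = d."""
    dx = x_um - D_UM
    return W_UM - math.sqrt(max(D_UM**2 - dx**2, 0.0))


def current_density(x_um: float) -> float:
    """j(x) = j0 * w / w(x): constant current through a narrowing cross-section."""
    return J0 * W_UM / width(x_um)


def n_boundaries(x_um: float, g_um: float) -> float:
    """Continuous stand-in for the number of longitudinal grain boundaries (assumed)."""
    return max(width(x_um) / g_um - 1.0, 0.0)


def relative_flux(x_um: float, g_um: float) -> float:
    """Grain-boundary flux from eqn. (3) up to a constant: (n / w) * j."""
    return n_boundaries(x_um, g_um) / width(x_um) * current_density(x_um)


def flux_divergence(x_um: float, g_um: float, h: float = 1e-4) -> float:
    """Central-difference estimate of dJ_b/dx (arbitrary units)."""
    return (relative_flux(x_um + h, g_um) - relative_flux(x_um - h, g_um)) / (2 * h)


def peak_current_density(fraction_removed: float) -> float:
    """j at the narrowest point when a defect removes the given fraction of the width."""
    return J0 / (1.0 - fraction_removed)


if __name__ == "__main__":
    for g in (1.0, 1.9, 3.0):
        samples = [flux_divergence(0.25 * k, g) for k in range(1, 12)]
        print(f"g = {g} um:", " ".join(f"{s:+.2e}" for s in samples))
    for f in (0.5, 0.8):
        print(f"{f:.0%} defect: j_max = {peak_current_density(f):.2e} A cm^-2")
```

For g = 3 µm the continuous n(x) is zero everywhere, so the grain-boundary divergence vanishes. This mirrors the statement that lattice diffusion takes over in that case.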
As the grain size is increased from 1 µm to 1.9 µm, the influence of the current density gradient may increase, depending on the contribution of the grain boundary term. However, the effect of the lower J predominates and makes the total flux divergence smaller in the 1.9 µm case. If a 1.9 µm grain is assumed to be located at the center of the defect site, the activation energy becomes large there, causing the flux divergence to go to zero; this is shown by the dotted lines. In the 3 µm case, lattice diffusion takes over and, on a relative basis, the flux divergence is zero. The variations in the flux divergence indicate that a hillock is expected to form to the left of the point of least linewidth and an open circuit to its right. The failure is therefore less likely to occur exactly at the center of the defect.

Fig. 1. A metal stripe of width w and length L containing, at its midpoint, a defect of radius d placed with its center on one edge of the line; the defect extends from x = 0 to x = 2d.

Fig. 2. Computer-generated plots of the flux divergences due to a defect, d = 1.5 µm, in a 3 µm line, for grain sizes of 1 µm, 1.9 µm and 3 µm (flux divergence in arbitrary units versus distance along the defect, 0 to 3 µm).
3. Experimental results
Test stripes about 1000 µm long and approximately 3 µm wide were life-tested along with non-defected lines. The defected stripes contained semi-circular lithographical defects that removed either 50% or 80% of the linewidth. The stripes were prepared from three metal types, labelled Metal I, II and III, which differed in mean grain size, measured as 0.5, 1 and 1.5 µm respectively. For pictures of the test structure, processing information and further physical data on the test stripes, the interested reader is referred to [3, 5, 8]. Up to 200 test stripes could be life-tested at one time in the open-circuit failure monitoring system. To study the influence of temperature, the ambient temperature of the oven containing the test boards was held constant at 80 °C, 125 °C or 200 °C. The current through each stripe was maintained at 50 mA, which yields a nominal current density of 1.5 × 10^6 A cm^-2 in the 3 µm wide by 1 µm thick test stripes used. This current was chosen to avoid the 10^7 A cm^-2 range, where Joule heating is significant enough to melt the interconnect. Details of the life-test system have been described elsewhere [3, 5]. In each life test, 20–60 stripes of each defect category were used. The median times to failure (t50) and the
standard deviations of the logarithm of the failure times (σ) associated with the log-normal fits of the various failure distributions are listed in Tables 1 and 2, along with their 90% confidence intervals.
For Metal I, the cumulative failure distributions of the three sample types are distinct at both 80 °C and 200 °C [3]. Earlier failures are observed for the stripes with larger subtractive defects at both temperatures. Figure 3 shows the result at 80 °C. Failure analysis by visual inspection of the samples from each of these life tests was also performed. It showed that 95% of the failures in the 80% defected stripes, and over 30% of the failures in the 50% defected stripes, occurred at or very close to the defect site. The σs associated with the defected stripes are significantly greater than those of the non-defected ones, and their values at 80 °C are significantly greater than at 200 °C.
In the Metal II life tests, the cumulative failure distributions for the three sample types were not distinct. In fact they overlapped significantly, and this is true at all three temperatures. Figure 4 shows the result at 125 °C. This indicates that the defects played no role in determining the distributions. For Metal II, the σs are also considerably greater at lower temperatures.

TABLE 1. The t50s of the failure distributions of the various life tests, along with their 90% confidence intervals

Sample                     80 °C (10^4 h)   125 °C (10^3 h)   200 °C (h)
Metal I, 80% defected      … (3, 30)        –                 1.6 (1.1, 2.3)
Metal I, 50% defected      10 (4, 23)       –                 12^a
Metal I, non-defected      10 (5, 18)       –                 19.3 (16, 23)
Metal II, 80% defected     1 (0.6, 1.5)     3 (2.2, 4.0)      293 (236, 362)
Metal II, 50% defected     … (0.7, 1.3)     2 (1.5, 2.6)      260 (209, 322)
Metal II, non-defected     … (0.6, 1.5)     2.5 (1.8, 3.3)    229 (174, 301)
Metal III, 80% defected    –                –                 1.42 (1.2, 1.7)
Metal III, non-defected    –                –                 1.58 (1.4, 1.8)

^a This is a highly bimodal distribution, explained in [3, 5]. Confidence limits are therefore not calculated for this result.

[Plot: cumulative failures (%) versus time (hours) for the No Defect, 50% Defect and 80% Defect stripes.]
Fig. 3. The cumulative failure distributions for the Metal I (metal = Al–1%Si, grain size = 0.5 µm) test stripes, shown separately for the non-defected, 50% defected and 80% defected stripes, tested at 80 °C.

TABLE 2. The σs of the failure distributions of the various life tests, along with their 90% confidence intervals

Sample                     80 °C              125 °C            200 °C
Metal I, 80% defected      4.4 (2.7, 6.6)     –                 1.5 (1.25, 1.75)
Metal I, 50% defected      3 (1.8, 4.5)       –                 2.9, 0.7^a
Metal I, non-defected      2 (1.2, 3)         –                 …
Metal II, 80% defected     1.3 (0.8, 1.9)     0.8 (0.6, 1.2)    0.5 (0.4, 0.7)
Metal II, 50% defected     0.8 (0.6, 0.9)     0.7 (0.5, 1.1)    0.56 (0.44, 0.77)
Metal II, non-defected     1.4 (0.8, 2.1)     0.8 (0.6, 1.2)    0.7 (0.6, 0.9)
Metal III, 80% defected    –                  –                 0.43 (0.34, 0.59)
Metal III, non-defected    –                  –                 0.36 (0.29, 0.49)

^a These are the σs for the defect component and the electromigration component of the highly bimodal failure distribution [3].

[Plot: cumulative failures (%) versus time (hours) for the No Defect, 50% Defect and 80% Defect stripes.]
Fig. 4. The cumulative failure distributions for the Metal II (metal = Al–1%Si–1%Cu, grain size = 1 µm) test stripes, shown separately for the non-defected, 50% defected and 80% defected stripes, tested at 125 °C.
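The t50 and σ entries in Tables 1 and 2 summarize log-normal fits. The paper does not state its fitting or confidence-interval method, so the code below is only a rough, hypothetical sketch of such a fit. For complete, uncensored data, it estimates t50 as the exponential of the mean of ln t and σ as the sample standard deviation of ln t. A normal-approximation interval is given for t50. The failure times in the usage lines are made up for illustration and are not taken from these tests. The last lines show why a large σ matters. With the same t50, a wider distribution puts a much larger fraction of failures at short times, which is how a defect can cause early failures without a large shift in the median.

```python
import math
import statistics
from statistics import NormalDist


def fit_lognormal(times_h):
    """Return (t50, sigma) for complete (uncensored) failure times in hours."""
    logs = [math.log(t) for t in times_h]
    return math.exp(statistics.fmean(logs)), statistics.stdev(logs)


def t50_interval(times_h, level=0.90):
    """Approximate two-sided confidence interval for t50 (normal approximation on mean ln t)."""
    logs = [math.log(t) for t in times_h]
    mu = statistics.fmean(logs)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * statistics.stdev(logs) / math.sqrt(len(logs))
    return math.exp(mu - half_width), math.exp(mu + half_width)


def fraction_failed(t_h, t50, sigma):
    """Cumulative failure fraction F(t) = Phi((ln t - ln t50) / sigma)."""
    return NormalDist(math.log(t50), sigma).cdf(math.log(t_h))


if __name__ == "__main__":
    # Hypothetical failure times (hours), not data from this paper.
    times = [120.0, 180.0, 210.0, 260.0, 300.0, 350.0, 420.0]
    t50, sigma = fit_lognormal(times)
    low, high = t50_interval(times)
    print(f"t50 = {t50:.0f} h (90% CI {low:.0f}-{high:.0f} h), sigma = {sigma:.2f}")
    # Same median, different spread: fraction of stripes failed after 10 h.
    for s in (0.5, 2.0, 4.4):
        print(f"sigma = {s}: F(10 h) = {fraction_failed(10.0, 260.0, s):.3g}")
```

With bimodal data such as the Metal I, 50% defected result at 200 °C, a single log-normal fit like this one is not appropriate. The footnotes to Tables 1 and 2 reflect that.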
The 200 °C test for the Metal III samples also showed that the defects played little role in causing early failures in the defected stripes, as shown in Fig. 5. Of all the defected samples that failed, only one failure could be visually identified as having occurred at the defect site, and 84% of the failures occurred at random locations along the stripe.

[Plot: cumulative failures (%) versus time (hours) for the No Defect and 80% Defect stripes.]
Fig. 5. The cumulative failure distributions for the Metal III (metal = Al–1%Si, grain size = 1.5 µm) test stripes, shown separately for the non-defected and 80% defected stripes, tested at 200 °C.

4. Discussion
As predicted by the theory in Section 2, in the Metal I samples, which had a grain size of 0.5 µm, the defected samples failed much earlier than the non-defected ones. The terms within the parentheses in eqn. (5) are greater for the 80% and 50% defected samples. J itself increases with j, so the 80% defected samples have a higher J divergence than the 50% defected samples and therefore fail earlier. In Metal II and III, the reduced n lowered J, and hence the flux divergence, so far that the stripes failed by ordinary electromigration and were unaffected by the presence of the defects. Note, however, that the t50s were much larger for Metal II. This is attributed to the higher activation energy for atomic flux (0.7 eV) [12] due to the presence of Cu. Simulations indicated that the thermal term should be insignificant at all the temperatures used here. As expected, temperature did not determine whether a defect would cause an early failure.
Some other effects can also be observed in the data presented here. If the defect plays a role in determining the failures, it increases the σ of the log-normal failure distribution. This has been modelled empirically and presented elsewhere [8]. It can also be seen that σ varies significantly with temperature. The common assumption in extrapolating accelerated life-test data to use conditions is that σ is constant [13]. The data presented here suggest that the Chan model [14] should be used to account for the temperature dependence of σ.

5. Conclusions
The flux divergence equation of Lloyd et al., which explained the effect of a defect in an interconnect, has been modified. The modified equation shows that the gradient in the number of grain boundaries cannot totally annihilate the current density divergence, as the original equation suggested. The thermal gradients due to the defect can be neglected when the current density is not too high. The primary effects of the defect are therefore the grain boundary gradient and the current density gradient. In very small grain metallizations, the grain boundary gradient partially cancels the current density gradient term. In practical small grain metallizations, the grain boundary gradient term is small, which increases the influence of the current density gradient term. As the grain size increases, however, the atomic flux itself decreases because fewer grain boundaries are available. This reduction matters more in deciding the resultant flux divergence than the individual gradients do. In very large grain metallizations, the performance will be much better than with small grains, owing to the higher activation energy for lattice diffusion. We therefore conclude that, as long as the current density at the defect site is not high enough to fuse the metal, the grain microstructure is the most important parameter in deciding whether an interconnect with a defect will suffer an early failure.
Acknowledgments
The authors would like to thank Dr. J. E. Harriss of the Microstructures Laboratory at Clemson University for his help with sample preparation. The SRC member companies that supplied the test structures and wafers are gratefully acknowledged. We also appreciate financial support from IBM, under contract number AC-004, and from the Semiconductor Research Corporation, under contract numbers 90-MP-082 and 91-MP-082.
References
1 J. R. Lloyd, P. M. Smith and G. S. Prokop, Thin Solid Films, 3 (1982) 385.
2 S. A. Lytle and A. S. Oates, J. Appl. Phys., 71 (1992) 174.
3 K. G. Kemp, The prediction of early failures in VLSI interconnects due to random subtractive defects, PhD Thesis, Clemson University, 1989.
4 K. G. Kemp, K. F. Poole and D. F. Frost, IEEE Trans. Reliab., 3 (1990) 26.
5 S. S. Menon, K. G. Kemp and K. F. Poole, Proc. IEEE Southeastcon '91, 1991, pp. 383–387.
6 S. Shingubara and Y. Nakasaki, Appl. Phys. Lett., 58 (1991) 42.
7 J. Cho and C. V. Thompson, Appl. Phys. Lett., 54 (1989) 2577.
8 S. S. Menon, A. K. Gorti and K. F. Poole, Proc. IEEE Int. Reliab. Phys. Symp., 1992, pp. 373–378.
9 T. Kwok and P. S. Ho, in D. Gupta and P. S. Ho (Eds.), Diffusion Phenomena in Thin Films and Microelectronic Materials, Noyes Publications, 1988, pp. 385–387.
10 D. Gupta, in D. Gupta and P. S. Ho (Eds.), Diffusion Phenomena in Thin Films and Microelectronic Materials, Noyes Publications, 1988, p. 22.
11 J. R. Lloyd, M. Shatzkes and D. C. Challener, Proc. IEEE Int. Reliab. Phys. Symp., 1988, pp. 216–225.
12 J. R. Lloyd and R. H. Koch, Appl. Phys. Lett., 52 (1988) 195.
13 J. W. McPherson, Proc. SRC Workshop on Reliability, Dallas, Texas, 1991, p. 82.
14 C. K. Chan, IEEE Trans. Reliab., 40 (1991) 157.