Expert Systems with Applications 38 (2011) 6275–6280
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
Wavelet-coupled backpropagation neural network as a chamber leak detector of plasma processing equipment

Byungwhan Kim*, Sanghee Kwon

Department of Sejong University, Sejong University, 98, Kunja-Dong, Kwangjin-Ku, Seoul 143-747, Republic of Korea
Article info

Keywords: Plasma processing equipment; Chamber leak; Neural network; Discrete wavelet transformation; Continuous wavelet transformation; Detection; CUSUM control chart
Abstract
To improve equipment throughput and device yield, chamber leaks need to be strictly monitored. A new leak detection technique is presented that combines a backpropagation neural network (BPNN) with the discrete wavelet transformation (DWT) and the continuous wavelet transformation (CWT). BPNN models were constructed with raw, DWT-filtered, and CWT-filtered data; these are referred to as the raw, DWT, and CWT models, respectively. The constructed models were validated with a total of 47 data sets covering normal and leaky chamber conditions. The experimental data were collected in situ using optical emission spectroscopy (OES). Both the raw and DWT models detected all abnormal data sets, whereas the CWT model showed the poorest detection. The wider detection margin of the DWT model was attributed to its enhanced sensitivity to the leaky condition. A modified cumulative sum (CUSUM) control chart was applied to the statistical mean of the raw OES spectra as well as to the model-based raw, DWT, and CWT data. The CUSUM control chart based on the statistical mean was unable to detect chamber leaks, whereas all model-based CUSUM control charts identified them. Of the proposed models, the DWT model was identified as the most appropriate for chamber leak detection.
© 2010 Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail address: [email protected] (B. Kim).
0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2010.11.088

1. Introduction
In fabricating electronic circuits, plasma processes play a crucial role in depositing and etching thin films. An abnormal plasma state degrades film quality as well as device yield, so the plasma state should be stringently monitored during the process. Various in-situ sensors have been used to monitor equipment plasma. Optical emission spectroscopy (OES), which collects radical information, was used to monitor a plasma etch system (Yue, Qin, Markle, Nauert, & Gatto, 2000). A radio frequency (RF) impedance sensor was introduced to monitor the resistance and reactance of the chamber plasma (Bushman, Edgar, & Trachtenberg, 1997). More recently, an ion energy analysis system was used to monitor a plasma-enhanced chemical vapor deposition system (Kim & Kim, 2009). Of these in-situ sensors, OES is the most popular for a variety of manufacturing purposes, such as etch endpoint detection (White et al., 2000). OES has also been coupled with neural networks, both to predict film characteristics (Kim & Kwon, 2008) and for the on-line detection and diagnosis of plasma faults (Hong & May, 2004). Apart from OES, a neural network was coupled with X-ray photoelectron spectroscopy for plasma diagnosis (Kim, Kim, & Choi, 2009).

Another potential application of OES is to detect an unexpected leak from a plasma processing chamber. A chamber leak induces a variation in chamber pressure, thereby changing the collision rate between particles. The resulting variations in ion and radical concentrations deteriorate film quality and, consequently, device yield. From this perspective, the chamber should be stringently monitored so that a leak is detected quickly once it occurs. Currently, chamber leaks are diagnosed with a He leak detector, which identifies leaky equipment components by measuring the leak rate. Because the diagnosis is conducted after the equipment run is stopped, a large loss of equipment throughput is inevitable. Moreover, severe degradation in film quality is expected because the leak is checked only after it has progressed considerably. This calls for a real-time leak detection scheme. When OES is applied for real-time monitoring, often only a few radicals of significance are traced. However, improved detection is likely to be achieved by monitoring the whole OES spectrum, since the total variation in radicals tends to be more sensitive to leak occurrence. This may be accomplished by constructing a neural network model of the leak data, mainly because a neural network is capable of learning and processing a large number of radicals. One related attempt was made by Kim and Kim (2009), where a neural network model was constructed using in-situ ion energy patterns. To improve detection performance, the neural network may be coupled with data filtering techniques such as the discrete wavelet transformation (DWT) (Mallat, 1989). In the context of plasma monitoring, the DWT was used to analyze atomic force microscopy data (Kim & Kim, 2007) and radio frequency impedance matching data (Kim & Kim, 2005). Besides the DWT, the continuous wavelet transformation (CWT) is also of interest. To date, no wavelet-coupled neural network model for leak detection has been reported. For alarm generation, the model predictions can be coupled with a CUSUM control chart (Montgomery, 1985), which is frequently used to detect shifts in sensor data. The detection performance of this scheme has not been examined before.

In this study, a leak detection scheme is developed by combining a neural network, OES, and the DWT (or CWT). OES data were collected in situ during a plasma-enhanced deposition process. Of the neural network paradigms, the backpropagation neural network (BPNN) (Rumelhart & McClelland, 1986) was used. The detection performances of the BPNN models are analyzed and compared. Further analysis is conducted by performing a model-based sensitivity analysis and by coupling the models with the CUSUM control chart.
2. Experimental data
Experimental OES data were collected during the deposition of silicon nitride using a plasma-enhanced chemical vapor deposition system located at a manufacturing site. The process conditions are not provided because they are proprietary to the company. A total of 48 OES spectra were collected, consisting of 43 normal and 5 abnormal processes. The leak rates measured by the He detector were 8.7 × 10⁻⁹ and 7.8 × 10⁻⁸ Pa·m³/s for the normal and abnormal (leaky) processes, respectively. The cause of the leak was identified as a worn-out O-ring. The error at each wavelength was defined as the difference between the normal and abnormal OES intensities at that wavelength. As shown in Fig. 1, the errors in the range of 330–427 nm are large enough to be clearly distinguished from the others.

Fig. 1. Error variation of normal and abnormal (leaky) data. (Error, a.u., versus wavelength, 178–880 nm.)

3. Wavelet and backpropagation neural network
3.1. Wavelets
Wavelets are frequently used to remove noise, compress signals or images, or detect abnormal signals. For a signal f(t) and a mother wavelet ψ, the continuous wavelet transformation (CWT) is defined as

$$ \mathrm{CWT}(a,b) = \int_{-\infty}^{+\infty} f(t)\,\frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)dt \qquad (1) $$

where a and b denote the dilation and translation parameters, respectively. As expressed in (2), the discrete wavelet transformation (DWT) decomposes f(t) into approximation and detail parts by using low-pass and high-pass filters:

$$ f(t) = \sum_{k} \langle f, \phi_{j_0,k} \rangle\, \phi_{j_0,k}(t) + \sum_{j>j_0} \sum_{k} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(t) \qquad (2) $$

Here, $\langle f, \phi_{j_0,k} \rangle$ and $\langle f, \psi_{j,k} \rangle$ are called the approximation (or scale) and detail (or wavelet) coefficients, respectively. In the DWT, a function φ, called the scaling function, is chosen to satisfy

$$ \phi(t) = 2^{1/2} \sum_{s} h_s\, \phi(2t - s) \qquad (3) $$

where the sequence $\{h_s\}$ is known as a low-pass filter.

3.2. Backpropagation neural network
The BPNN was used to construct a leak model of the OES data. As shown in Fig. 2, the BPNN consisted of input, hidden, and output layers; a single hidden layer was adopted. In Fig. 2, I and λ represent the OES intensity and wavelength, respectively. The indices m and k are used to generate the training and testing data sets. For a given wavelength, m is the number of intensities measured and presented to the network (i.e., the intensities at λ − Δλ, λ − 2Δλ, ..., λ − mΔλ, where Δλ is the sampling interval). The index k corresponds to the network's intensity prediction (i.e., the intensity at λ + kΔλ). The neurons in the hidden and output layers used bipolar sigmoid and linear transfer functions, respectively. The network training tolerance, the magnitude of the initial weight distribution, and the gradients of the two neuron transfer functions were set to 0.1, ±1, and 1, respectively. As the training algorithm, the popular generalized delta rule (Rumelhart & McClelland, 1986) was used, which is expressed as

$$ W_{ijk}(m+1) = W_{ijk}(m) + \eta\, \Delta W_{ijk}(m) \qquad (4) $$

where $W_{ijk}$ is the connection strength between the jth neuron in layer (k − 1) and the ith neuron in layer k, and $\Delta W_{ijk}$ is the calculated change in the weight, defined as

$$ \Delta W_{ijk} = -\frac{\partial E}{\partial W_{ijk}} \qquad (5) $$

where E is the error to be minimized through the weight-updating rule above. The parameters m and η denote the iteration number and learning rate, respectively. The learning rate was set to 0.01. E is reduced by adjusting the weights using rule (4).

Fig. 2. Structure of BPNN leak detector.
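The generalized delta rule of Eqs. (4) and (5) can be illustrated with a small network of the size used in this study (two inputs, four bipolar-sigmoid hidden neurons, one linear output). The Python code below is a minimal sketch under stated assumptions, not the authors' implementation: the bias weights, the per-sample (online) updates, the squared-error cost E = ½(t − y)², and the reading of the 0.1 training tolerance as an RMSE stopping threshold are assumptions, and the names `TinyBPNN`, `train`, and `rmse` are hypothetical. The data are assumed to be scaled to roughly [−1, 1].

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input vector, target intensity)


def bipolar_sigmoid(x: float, gain: float = 1.0) -> float:
    """Bipolar sigmoid (1 - e^(-g*x)) / (1 + e^(-g*x)), which equals tanh(g*x / 2)."""
    return math.tanh(gain * x / 2.0)


class TinyBPNN:
    """Hypothetical 2-4-1 BPNN: bipolar-sigmoid hidden neurons, one linear output neuron.

    Bias weights (the last weight of each row) are an assumption of this sketch.
    """

    def __init__(self, n_in: int = 2, n_hidden: int = 4, init: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        # Initial weights drawn uniformly from [-init, +init]; the text uses +/-1.
        self.w_hidden = [[rng.uniform(-init, init) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-init, init) for _ in range(n_hidden + 1)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        xb = list(x) + [1.0]  # append the bias input
        h = [bipolar_sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden]
        y = sum(w * v for w, v in zip(self.w_out, h + [1.0]))  # linear output neuron
        return h, y

    def train_step(self, x: Sequence[float], target: float, eta: float = 0.01) -> float:
        """One generalized-delta-rule update W <- W + eta * (-dE/dW), E = 0.5 * (t - y)^2."""
        h, y = self.forward(x)
        err = target - y
        xb, hb = list(x) + [1.0], h + [1.0]
        # Hidden error terms use the pre-update output weights;
        # the derivative of tanh(z / 2) is 0.5 * (1 - tanh(z / 2) ** 2).
        deltas = [err * self.w_out[j] * 0.5 * (1.0 - h[j] ** 2) for j in range(len(h))]
        for j in range(len(self.w_out)):
            self.w_out[j] += eta * err * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] += eta * deltas[j] * xb[i]
        return 0.5 * err * err


def rmse(net: TinyBPNN, data: Sequence[Sample]) -> float:
    """Root mean square error of the network over a data set."""
    return math.sqrt(sum((t - net.forward(x)[1]) ** 2 for x, t in data) / len(data))


def train(net: TinyBPNN, data: Sequence[Sample], eta: float = 0.01,
          tol: float = 0.1, max_epochs: int = 500) -> int:
    """Online training; stops when the training RMSE drops below tol (assumed meaning
    of the 'training tolerance'). Returns the number of epochs run."""
    for epoch in range(1, max_epochs + 1):
        for x, t in data:
            net.train_step(x, t, eta)
        if rmse(net, data) < tol:
            return epoch
    return max_epochs


if __name__ == "__main__":
    # Synthetic, already-scaled pattern; windows follow (I[i-1], I[i]) -> I[i+1].
    signal = [math.sin(i / 20.0) for i in range(200)]
    data = [((signal[i - 1], signal[i]), signal[i + 1]) for i in range(1, len(signal) - 1)]
    net = TinyBPNN(seed=1)
    before = rmse(net, data)
    epochs = train(net, data)
    print(f"RMSE {before:.3f} -> {rmse(net, data):.3f} after {epochs} epochs")
```

Because the output neuron is linear, its update reduces to η·(t − y)·h_j, and only the hidden layer needs the sigmoid derivative. Raw OES intensities in the tens of thousands would likely need rescaling before training with a learning rate of 0.01.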
4. Results
4.1. Preparation of model data
The raw OES data were filtered by using the DWT and the CWT. Several factors are involved in each wavelet transformation: the scale level and the type of Daubechies function for the DWT, and the dilation and translation for the CWT. Each factor was fixed at a single setting because its optimization is beyond the scope of this study. A single normal OES pattern, selected from the 43 normal processes, was used to build the detection model. In Fig. 2, both m and k were set to 1, which means that the network consisted of two inputs and one output. The chosen (m, k) enables the neural network to learn the whole OES pattern as accurately as possible. Note, however, that the monitoring performance of the neural network model can vary with (m, k), as demonstrated in our recent work (Kim & Kim, 2009). The number of neurons in the hidden layer was fixed at 4. Each leak pattern was composed of 3648 elements. For the chosen (m, k), the training and testing data consisted of 1823 vectors for both the raw OES and the wavelet-filtered data.
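The preparation steps of Section 4.1 can be sketched in Python as follows. This is a hedged illustration only: the paper does not give the Daubechies order, decomposition level, dilation, or translation it used, so the sketch assumes a level-1 Haar (Daubechies-1) approximation for the DWT and a single-scale Mexican-hat (Ricker) wavelet for a discretized Eq. (1). It also assumes that, with (m, k) = (1, 1), the two inputs are the intensities at λ − Δλ and λ and the target is the intensity at λ + Δλ, and that the resulting windows are split alternately into training and testing sets. Under these assumptions a 3648-element pattern yields 3646 windows, i.e., 1823 training and 1823 testing vectors, which agrees with the count quoted above. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def haar_approximation(signal: Sequence[float]) -> List[float]:
    """Level-1 Haar (Daubechies-1) DWT smoothing: keep the approximation
    coefficients, set the detail coefficients to zero, and reconstruct.
    Output length equals input length (an odd trailing sample is passed through)."""
    h = 1.0 / math.sqrt(2.0)  # Haar low-pass taps h_0 = h_1 = 1/sqrt(2)
    out: List[float] = []
    for i in range(0, len(signal) - 1, 2):
        approx = h * (signal[i] + signal[i + 1])  # approximation coefficient
        out.extend([h * approx, h * approx])      # inverse transform with zero detail
    if len(signal) % 2:
        out.append(float(signal[-1]))
    return out


def ricker(t: float) -> float:
    """Mexican-hat (Ricker) mother wavelet, unnormalized amplitude."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt_single_scale(signal: Sequence[float], a: float) -> List[float]:
    """Discretized Eq. (1) at one dilation a, unit sample spacing, for every
    translation b. The wavelet is truncated where |t - b| > 5a (negligible there)."""
    n, half = len(signal), int(math.ceil(5 * a))
    out = []
    for b in range(n):
        lo, hi = max(0, b - half), min(n, b + half + 1)
        acc = sum(signal[t] * ricker((t - b) / a) for t in range(lo, hi))
        out.append(acc / math.sqrt(a))
    return out


def make_windows(pattern: Sequence[float]) -> List[Tuple[Tuple[float, float], float]]:
    """Assumed (m, k) = (1, 1) layout: inputs I(i-1), I(i); target I(i+1)."""
    return [((pattern[i - 1], pattern[i]), pattern[i + 1]) for i in range(1, len(pattern) - 1)]


def alternate_split(vectors: Sequence) -> Tuple[list, list]:
    """Assumed split: even-indexed windows for training, odd-indexed for testing."""
    return list(vectors[0::2]), list(vectors[1::2])


if __name__ == "__main__":
    pattern = [1000.0 + 500.0 * math.sin(i / 50.0) for i in range(3648)]  # synthetic stand-in
    for name, data in [("raw", pattern), ("DWT", haar_approximation(pattern)),
                       ("CWT", cwt_single_scale(pattern, a=4.0))]:
        train_set, test_set = alternate_split(make_windows(data))
        print(name, len(train_set), len(test_set))  # 1823 1823 for each
```

Both filters return a sequence as long as the input, which is one way the raw and wavelet-filtered patterns could yield the same number of vectors.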
4.2. Performance of leak detector
The performance of the BPNN leak detection model was measured by the root mean square error (RMSE). For convenience, the models trained using the raw OES, DWT-filtered OES, and CWT-filtered OES data are called the “raw model”, “DWT model”, and “CWT model”, respectively. For modeling, one raw OES data set obtained during normal operation was selected. Choosing other normal OES data is expected to produce similar results, since they were observed to have similar training and testing errors. The testing RMSE of the raw model is 283, and the normalized error with respect to the largest prediction is about 0.44%. For the DWT and CWT models, the testing RMSEs are 428 and 2.13 × 10⁻¹⁵, and the normalized prediction errors are 0.47% and 2.26%, respectively. The prediction performances of the raw and DWT models are almost comparable, whereas the CWT model yields the largest prediction error of the three. The prediction performances of the constructed models are shown in Figs. 3–5 for the raw, DWT, and CWT models, respectively.

Fig. 3. Prediction performance of BPNN model trained with raw data.
Fig. 4. Prediction performance of BPNN model trained with DWT data.
Fig. 5. Prediction performance of BPNN model trained with CWT data.
(Figs. 3–5: actual and predicted intensities, arb. units, versus test data.)
Fig. 6. RMSE variation of raw data model.
Fig. 7. RMSE variation of DWT data model.
Fig. 8. RMSE variation of CWT data model.
(Figs. 6–8: RMSE, arb. units, versus number of test files.)

The constructed models were validated with the other 47 data sets. The results are shown in Figs. 6–8 for the raw, DWT, and CWT models, respectively. The first 42 and last 5 files in each figure correspond to the normal and leaky data, respectively. Both the raw and DWT models produce relatively smaller RMSEs for the leaky data than for the normal data. This is attributed to the increase in chamber pressure caused by the leak. In contrast, some of the RMSEs of the CWT model for the leaky data in Fig. 8 are comparable to, or even higher than, the normal ones, indicating that this type of model is not adequate for chamber leak detection.

The differences between the models are evaluated in detail by introducing a detection criterion in terms of an RMSE range. For this, a normal range was calculated from the normal RMSEs as a function of the number of normal test files considered. The upper limit of the abnormal RMSEs, equal to the largest RMSE among all abnormal files, was then determined. The detection margin was computed by subtracting this upper limit from the smallest RMSE of the normal range. To facilitate comparison, a normalized detection margin was calculated by dividing the detection margin by the smallest RMSE in the normal range. The results are shown in Table 1. The DWT model yields the highest detection margin, which means that this model can detect a chamber leak with the highest sensitivity. Table 1 also reveals a unique diagnostic feature of the DWT model: its detection margin improves as the number of normal data decreases. This implies that, given a small set of normal data, the DWT yields the most powerful detection model. To support the improved detection of the DWT model, the sensitivity of the testing errors relative to each model's prediction error is evaluated, defined as

$$ \frac{RMSE_{t,p} - RMSE_{m,p}}{RMSE_{m,p}} \times 100\ (\%) \qquad (6) $$

where RMSE_t,p and RMSE_m,p denote the testing and model prediction errors, respectively. Note that the testing errors are those shown in Figs. 6–8. The results are shown in Fig. 9. In the range of normal data, the CWT model exhibits the largest variation. For comparison, (6) was also applied to each of the five errors for the leaky data. Here, each percent sensitivity was normalized by dividing it by the smallest percent sensitivity observed in the normal range, which is 7.60, 9.30, and 9.39 for the raw, DWT, and CWT models, respectively. The normalized percent sensitivities are detailed in Table 2. The total sum in Table 2 is simply the summation of all percent
sensitivities for a given model. The largest sum, obtained for the DWT model, indicates that this model is the most sensitive to the leaky condition, supporting the largest detection margin noted earlier. The lowest detection margin of the CWT model, already noticed in Table 1, can be explained by its lowest sensitivity, as evidenced by the smallest total sum in Table 2.

Table 1
Analysis of detection margin of leak models.
Number of normal files | Normal range: Raw model | DWT model | CWT model (×10⁻¹⁵) | Normalized detection margin (%): Raw model | DWT model | CWT model
10 | 280–293 | 434–484 | 2.18–2.79 | 8.93 | 14.06 | 9.13
20 | 263–293 | 403–484 | 2.02–2.79 | 3.00 | 7.20 | 2.22
30 | 263–293 | 387–484 | 2.02–2.79 | 3.00 | 3.88 | 2.22
Upper limit of leaky data | 255 | 373 | 1.98 | | |

Fig. 9. Model sensitivity to normal raw, DWT, and CWT models. (Percent variation (%) versus number of test files for the raw, DWT, and CWT data models.)

Table 2
Normalized percent sensitivity for abnormal data.
File number | Raw model | DWT model | CWT model
43 | 6.64 | 6.91 | 4.72
44 | 4.73 | 4.79 | 5.14
45 | 4.01 | 4.28 | 3.99
46 | 2.82 | 3.64 | −0.14
47 | 2.19 | 3.41 | −2.27
Total | 20.42 | 23.04 | 11.45

4.3. CUSUM-coupled leak detector
For real-time purposes, a modified cumulative sum (CUSUM) control chart [?] is used to detect chamber leaks. In the modified CUSUM control chart, an upward or a downward shift in sensor data is accumulated by Eq. (7) or Eq. (8), respectively:

$$ S_h(i) = \max\left[0,\; x - \mu + S_h(i-1)\right] \qquad (7) $$

$$ S_l(i) = \max\left[0,\; \mu - x + S_l(i-1)\right] \qquad (8) $$

Here, the subscripts “h” and “l” indicate upward and downward shifts, respectively, and μ and x represent the statistical mean and the current value, respectively. The belief, which denotes the degree of severity of a shift, is calculated using the belief function defined as

$$ \mathrm{Belief} = \frac{1}{1 + e^{-\left(S_{h/l}/\mathit{set}\right)^{2}/0.5}} \qquad (9) $$

where the set value is equal to the statistical mean of all means for the normal OES data. The CUSUM chart was then applied to the calculated statistical mean of each OES data set, and the results are shown in Fig. 10. In Fig. 10, the means for the leaky data appear to increase. In contrast, there is little distinction between the beliefs for the normal and leaky data. In other words, the belief variation for the five leaky data sets provides no insight into the leak occurrence, making it hard to generate an alarm. Meanwhile, the modified CUSUM control chart was applied to the model predictions, and the corresponding variation in the beliefs is shown in Fig. 11. As illustrated in Fig. 11, the beliefs for the leaky data increase linearly, demonstrating that model-based monitoring is more effective in identifying chamber leaks than the conventional statistical monitoring shown in Fig. 10.

Fig. 10. Variations of statistical mean and CUSUM belief. (Left axis: average intensity of raw OES, arb. units; right axis: CUSUM belief, arb. units; versus number of test files.)
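The normalized detection margin of Section 4.2 and the CUSUM recursions of Eqs. (7) and (8) can be sketched in Python as below. The margin function reproduces two Table 1 entries from the tabulated ranges (8.93% for the raw model and 14.06% for the DWT model, both for 10 normal files). The CUSUM part is a minimal sketch: the per-file statistics are synthetic, taking μ as the mean of the normal files only is an assumption, the threshold rule in the hypothetical `first_alarm` helper is not the paper's alarm criterion (the paper uses the belief function of Eq. (9), which is not implemented here), and all names are illustrative.

```python
from typing import List, Sequence, Tuple


def normalized_detection_margin(normal_min: float, leaky_upper: float) -> float:
    """Section 4.2: (smallest normal RMSE - largest leaky RMSE) / smallest normal RMSE, in %."""
    return 100.0 * (normal_min - leaky_upper) / normal_min


def cusum(values: Sequence[float], mu: float) -> Tuple[List[float], List[float]]:
    """Eqs. (7)-(8): accumulate upward (S_h) and downward (S_l) shifts from mu, both starting at 0."""
    s_h = s_l = 0.0
    upward: List[float] = []
    downward: List[float] = []
    for x in values:
        s_h = max(0.0, x - mu + s_h)
        s_l = max(0.0, mu - x + s_l)
        upward.append(s_h)
        downward.append(s_l)
    return upward, downward


def first_alarm(stat: Sequence[float], threshold: float) -> int:
    """Hypothetical decision rule (not from the paper): index of the first value above threshold, else -1."""
    for i, s in enumerate(stat):
        if s > threshold:
            return i
    return -1


if __name__ == "__main__":
    print(round(normalized_detection_margin(280, 255), 2))  # 8.93 (raw model, 10 normal files)
    print(round(normalized_detection_margin(434, 373), 2))  # 14.06 (DWT model, 10 normal files)
    # Synthetic per-file statistics: six normal files followed by three shifted ones.
    stats = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 11.0, 11.2, 11.1]
    mu = sum(stats[:6]) / 6  # reference mean from the normal files only (assumption)
    s_h, s_l = cusum(stats, mu)
    print(first_alarm(s_h, threshold=1.5))  # 7 (the second shifted file)
```

Because Eqs. (7) and (8) have no slack term, small fluctuations around μ still accumulate briefly before being reset to zero; a sustained shift, as in the leaky files, makes S_h grow roughly linearly.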
Fig. 11. Comparison of model-based CUSUM belief. (CUSUM belief, arb. units, versus number of test files for Raw OES, DWT-OES, and CWT-OES.)

5. Conclusion
A new detection model for chamber leaks was presented. Experimental leak data were collected in situ using optical emission spectroscopy. In addition to the raw model, DWT and CWT models were constructed and compared with each other. All the models demonstrated a good capability to detect leaky OES spectra. Above all, the DWT model yielded the largest detection margin, resulting mainly from its enhanced sensitivity to the leaky condition. A modified CUSUM control chart was also coupled with the neural network models. The conventional technique of monitoring statistical variables was incapable of detecting chamber leaks, whereas the neural network model-coupled CUSUM control chart successfully detected them. Optimizing the factors involved in the DWT is expected to further enhance both the model prediction performance and the detection sensitivity. The proposed technique can be applied to any other in-situ data collected during semiconductor manufacturing processes.

Acknowledgement
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (20090087476).

References
Bushman, S., Edgar, T. F., & Trachtenberg, I. (1997). Radio frequency diagnostics for plasma etch systems. Journal of the Electrochemical Society, 144, 721–732.
Hong, J., & May, G. S. (2004). Neural network-based real-time malfunction diagnosis of reactive ion etching using in situ metrology data. IEEE Transactions on Semiconductor Manufacturing, 17, 408–421.
Kim, B., & Kim, S. (2005). Diagnosis of plasma processing equipment using neural network recognition of wavelet-filtered impedance matching. Microelectronic Engineering, 82, 44–52.
Kim, B., & Kim, W. (2007). Wavelet monitoring of spatial surface roughness for plasma diagnosis. Microelectronic Engineering, 84, 2810–2816.
Kim, B., & Kim, S. (2009). Monitoring of plasma processing chamber by using ion energy analyzer and time-series neural network. Surface Engineering. doi:10.1179/174329409X455449.
Kim, B., Kim, J., & Choi, S. (2009). Use of neural network to model X-ray photoelectron spectroscopy data for diagnosis of plasma etch equipment. Expert Systems with Applications, 36, 11347–11351.
Kim, B., & Kwon, M. (2008). Optimization of principal component analysis-applied in-situ spectroscopy data using neural networks and genetic algorithms. Applied Spectroscopy, 62(1), 73–77.
Mallat, S. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11, 674–693.
Montgomery, D. C. (1985). Introduction to statistical quality control. Singapore: John Wiley & Sons.
Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing. Cambridge: MIT Press.
White, D., Goodlin, B., Gower, A., Boning, D., Chen, H., Sawin, H., et al. (2000). Low open-area endpoint detection using a PCA-based T² statistic and Q statistic on optical emission spectroscopy measurements. IEEE Transactions on Semiconductor Manufacturing, 13, 193–207.
Yue, H. H., Qin, S. J., Markle, R. J., Nauert, C., & Gatto, M. (2000). Fault detection of plasma etcher using optical emission spectra. IEEE Transactions on Semiconductor Manufacturing, 13, 374–384.