A deep learning-based approach to material removal rate prediction in polishing


G Model

CIRP-1571; No. of Pages 4 CIRP Annals - Manufacturing Technology xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

CIRP Annals - Manufacturing Technology
journal homepage: http://ees.elsevier.com/cirp/default.asp

A deep learning-based approach to material removal rate prediction in polishing

Peng Wang, Robert X. Gao (1)*, Ruqiang Yan

Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH, USA

ARTICLE INFO

Keywords: Polishing; Process control; Deep learning

ABSTRACT

Prediction of material removal rate (MRR) during chemical mechanical polishing is critical for product quality control. Complexity involved in polishing makes it challenging to accurately predict MRR based on physical models. A data-driven technique based on Deep Belief Network (DBN) is investigated to reveal the relationship between MRR and polishing operation parameters such as pressure and rotational speeds of the wafer and pad. The effect of network structure and learning rate on the accuracy of predicted MRR is studied using particle swarm optimization algorithm. With an optimized network structure, the performance of DBN is experimentally verified, under varying operation conditions.

© 2017 Published by Elsevier Ltd on behalf of CIRP.

* Corresponding author. E-mail address: [email protected] (R.X. Gao (1)).

1. Introduction

Chemical mechanical polishing (CMP), based on a balanced interaction of chemical reaction and mechanical abrasion, enables the designed functionality of a workpiece (e.g. a silicon wafer) by minimizing surface defects. Typically, a CMP tool consists of a rotating polishing pad, a workpiece with its rotating carrier, a slurry dispenser, and a rotating dresser (see Fig. 1). During the CMP process, a brittle film is first formed on the surface of the wafer due to the chemical reaction, which is subsequently removed by the abrasives suspended in the slurry and the friction between the pad and the wafer [1]. Unwanted conductive or dielectric materials on the surface of the wafer are removed by repeating this cycle. The productivity of CMP is characterized by the material removal rate (MRR), which at the same time also affects the surface quality of the polished product [2].

Fig. 1. Polishing tool and interaction between wafer and pad.

Given the nature of the chemical and mechanical interactions involved, MRR is collectively affected by various process parameters, such as material properties (e.g. hardness) of the wafer and pad, the slurry, and operating conditions (e.g. pressure and rotational speeds). Prior research has aimed to quantitatively express MRR as a function of dominating process parameters, based on physical and experimental evidence. As an example, the Preston equation assumes MRR to be proportional to the applied pressure and the relative rotational speed between the polishing pad and wafer [3]. Luo and Dornfeld [4] and Jeng [5] analyzed the interaction between the pad, abrasives, and wafer from a microcontact perspective. In their model, pressure was characterized by the number of abrasive particles, the geometry, and pad deformation. Dambon and Demmer [6] investigated the influence of wafer material on MRR experimentally, and noted that it is the deformation capacity of the wafer material, rather than its hardness, that actually determines the removal level, when explaining why hardened wafers had shown higher MRR. Jeong [7] approached the problem from the point of balance between chemical reaction and mechanical abrasion, and classified polishing in terms of easy or difficult to abrade and/or react. Experiments conducted by Klocke and Zunke [2] indicated that the hardness of abrasives in the slurry plays a vital role in MRR. When polishing silicon carbide, diamond as a hard abrasive yields a high MRR but leads to high roughness of the polished wafer due to the occurrence of micro cracking. Takaya et al. studied the effect of slurry on MRR from a chemical perspective, and concluded that MRR increases with an increase of the slurry chemical reactivity [8].

In most of the prior studies, quantitative MRR prediction models were mainly based on the theory of tribology, accounting for mechanical abrasion but neglecting chemical reaction. Furthermore, even for abrasion modelling, only a subset of mechanical process parameters was considered, due to the complexity of the physics associated with the polishing process. Given these considerations, this paper takes a data-driven approach to modelling the complex relationship between MRR and the various process parameters, using Deep Learning (DL) as the numerical tool.
Specifically, a Deep Belief Network (DBN) is investigated to model the complex relationship between MRR and the underlying process parameters, given its capability to characterize non-linear relationships and to output results that are probabilistic in nature. A DBN is constructed as a stack of unsupervised Restricted Boltzmann Machines (RBMs) and a supervised perceptron. RBMs have been reported in the literature to perform deep analysis of data patterns and discover the underlying features [9]. As a result, MRR is expressed as a function of both raw process parameters and the revealed features through the supervised perceptron.

To design a systematic DBN-based approach towards robust and generalizable MRR prediction in CMP, key process parameters for MRR prediction are first summarized, followed by optimization of the network structure and learning rate through a particle swarm optimization (PSO) algorithm. A comparative study between the proposed method and previously reported models is performed using CMP data from real-world production.

http://dx.doi.org/10.1016/j.cirp.2017.04.013
0007-8506/© 2017 Published by Elsevier Ltd on behalf of CIRP.

Please cite this article in press as: Wang P, et al. A deep learning-based approach to material removal rate prediction in polishing. CIRP Annals - Manufacturing Technology (2017), http://dx.doi.org/10.1016/j.cirp.2017.04.013

2. Material removal characteristics

Material removal rate is subject to multiple process parameters. Due to the complexity involved, comprehensive consideration of all parameters in one model is challenging, as illustrated by two representative prior studies based on physical or empirical models. They approached the problem of MRR modelling from a macroscopic and a microscopic point of view, respectively.

2.1. Pressure and relative rotational speed

As the first attempt at MRR modelling, Preston proposed that MRR is proportional to the pressure applied on the wafer and the relative rotational speed between the two rotating bodies [1]. The influence of the rest of the process parameters is characterized by a constant Kp. Experimental studies found that this model is limited in accurately explaining the results. For example, when polishing silicon nitride with ceria, an increase in the pressure is seen to increase the MRR significantly, whereas an increase in the relative rotational speed did not show a similar trend [2]. A general form of the Preston equation is obtained by introducing exponents a and b for the pressure P and speed V:

\[ \mathrm{MRR} = K_p P^{a} V^{b} \tag{1} \]
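Because (1) becomes linear after taking logarithms, log MRR = log Kp + a log P + b log V, the constant and the two exponents can in principle be estimated by ordinary least squares. The sketch below illustrates this idea only; it is not the fitting procedure used in this paper, and the function names (fit_preston, preston_mrr) and the synthetic data are assumptions introduced for illustration.

```python
import numpy as np

def preston_mrr(P, V, Kp, a, b):
    """Generalized Preston equation (1): MRR = Kp * P**a * V**b."""
    return Kp * np.asarray(P, dtype=float) ** a * np.asarray(V, dtype=float) ** b

def fit_preston(P, V, mrr):
    """Least-squares fit of log(MRR) = log(Kp) + a*log(P) + b*log(V).

    All inputs must be strictly positive. Returns (Kp, a, b).
    """
    P, V, mrr = (np.asarray(z, dtype=float) for z in (P, V, mrr))
    # Design matrix: intercept column, log-pressure, log-speed.
    A = np.column_stack([np.ones_like(P), np.log(P), np.log(V)])
    coef, *_ = np.linalg.lstsq(A, np.log(mrr), rcond=None)
    return float(np.exp(coef[0])), float(coef[1]), float(coef[2])

# Synthetic, noise-free data (illustrative values, not measurements).
rng = np.random.default_rng(0)
P = rng.uniform(10.0, 50.0, size=40)   # pressure, arbitrary units
V = rng.uniform(0.5, 2.0, size=40)     # relative speed, arbitrary units
mrr = preston_mrr(P, V, Kp=3.0, a=0.8, b=0.5)

Kp_hat, a_hat, b_hat = fit_preston(P, V, mrr)
print(Kp_hat, a_hat, b_hat)
```

The log-linear form requires P and V to vary across the samples; when both are held nearly constant, the design matrix becomes ill-conditioned and the fitted exponents carry little meaning.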

The two coefficients a and b are typically within the range [0, 1], but vary with the polishing conditions [10]. Different studies have reported different values of a and b.

2.2. Abrasive number, geometry, pad roughness and deformation

Since the interactions between pad and wafer directly affect MRR, Luo and Dornfeld have proposed a material removal model by which MRR is considered to be subject to the pressure applied to abrasives, penetration depth, volume removed by a single abrasive, and the number of active abrasives [4]. The first two factors relate to the wafer density. The model is expressed as:

\[ \mathrm{MRR} = \rho_w N \, \mathrm{Vol}_{\mathrm{removed}} \tag{2} \]

with ρw denoting the wafer density, N the number of active abrasives, and Vol_removed the volume of material removed by a single abrasive. This model is able to explain the experimental observation that a softer and rougher pad yields a larger MRR, because more active abrasives contribute to the abrasion. The effect of slurry chemical properties is not included in the model.

2.3. Pad and dresser usage condition

In the previous models, parameters such as pad characteristics (e.g. roughness and deformation) or the effective number of active abrasives have been assumed to be constant. In reality, these parameters are affected by the pad's usage and are therefore functions of time. Byrne et al. [11] pointed out that the planarity assumption of the wafers becomes inadequate as they go through the polishing process, and that the MRR slows down as the pad becomes worn. In addition, because of the worn pad, the stress applied to the wafer edge was found to be up to 30% higher than at the centre, leading to a convex surface. However, no model has been found in the published literature yet that quantitatively describes the effect of pad and dresser usage on the MRR.

2.4. Synthesizing effect of process parameters

Due to the complex interactions involved, it is challenging to include all process parameters in one comprehensive MRR model. Limited representation of process parameters leads to the constraint that, typically, each model can only explain its associated experimental results while producing contradictory conclusions for other experiments. As an example, model (2) and the associated experiment [4] indicated that a softer wafer had a higher MRR, while an experiment conducted in [6] showed that a wafer with higher hardness had a higher MRR. To comprehensively consider these process parameters and link them with the MRR, data collected from process measurement can be exploited using data science, given that they are reflective of the underlying physics. For this purpose, Deep Learning is investigated.

3. Deep belief network for MRR modelling

As an emerging machine learning structure imitating human thinking, Deep Learning (DL) has attracted increasing attention [9], given its successful applications in facial recognition, video tracking, etc. In this study, a variant of DL, termed Deep Belief Network (DBN), is investigated for MRR prediction, based on the relationship revealed between process parameters and MRR.

3.1. General structure of DBN

DBN is a feed-forward neural network with multiple hidden layers, capable of revealing deep-rooted patterns embedded in a data set or modelling a complex relationship between two data sets. DBN is composed of two modules: an unsupervised feature extraction module formed by a stack of Restricted Boltzmann Machines (RBMs), and a supervised perceptron for data classification and regression [9]. Fig. 2 illustrates a DBN with two RBMs and one perceptron for MRR regression. The first layer represents process parameters involved in the polishing process. An RBM is a two-layer (visible layer and hidden layer) probabilistic neural network. Both visible and hidden units are binary and stochastic in nature.

Upon unsupervised training of an RBM, its visible units can be accurately reconstructed by its hidden units. This means the hidden units exactly represent the visible units in a different dimensional space, and no information loss occurs during data transformation. One major advantage of representing a dataset in a different dimensional (especially high-dimensional) space is that it improves the ability of the network to fully reveal the hidden patterns underlying the data analyzed. Once an RBM is trained, its hidden units can be taken as visible units for the next RBM, initiating another round of unsupervised training. By using a stack of RBMs, the network input data (i.e. process parameters) can be represented in multiple high-dimensional spaces, which further improves the accuracy in data pattern discovery. Subsequently, outputs from the RBMs and the original inputs are passed to the perceptron for MRR regression. The perceptron is a feedforward three-layer neural network trained by a back-propagation algorithm, and models the MRR as a function of the input process parameters and outputs from the RBMs.

3.2. Training of DBN

Fig. 2. A DBN structure with two RBMs and one perceptron.

Training of DBN consists of two steps: training of the individual RBMs, and fine-tuning of the entire network. The objective of RBM training is to optimize the network parameter θ, which includes the weights of the network links, wij, and the biases of the layers, ai and bj (for the 1st RBM, i: 1, ..., n, j: 1, ..., m and for the 2nd RBM, i: 1, ..., m, j: 1, ..., l), to accurately reconstruct its visible units based on its hidden units. Given the probabilistic nature of RBM, the objective of RBM training is to maximize the conditional probability of visible units given the parameters θ. For visible units x = (x1, x2, x3, ..., xn) of the 1st RBM, the probability of hidden units h1 = (h11, h12, h13, ..., h1m) is expressed as:

\[ P(h_{1j} = 1 \mid \theta, x) = \sigma\left( \sum_{i=1}^{n} w_{ij} x_i + a_i \right) \tag{3} \]

where wij represents the weights connecting the visible unit xi to the hidden unit h1j, and a denotes the biases associated with the visible units. σ is the logistic sigmoid activation function, which gives the probability that a binary unit is switched on. Because of the nature of the sigmoid function, it is required that inputs to the DBN have a broad distribution across the range [0, 1] to fully discriminate the data. If the inputs have a limited band of distribution, the MRR regression accuracy would be low [12]. Based on the hidden units obtained from (3), the conditional probability of a reconstructed visible unit can be obtained as:

\[ P(x_i = 1 \mid \theta, h_1) = \sigma\left( \sum_{j=1}^{m} w_{ij} h_{1j} + b_j \right) \tag{4} \]

where b denotes the biases associated with the hidden units. Consequently, the objective of parameter training is to maximize the conditional probability in (4). Through a contrastive divergence algorithm [12], the weights wij are recursively updated, and the step-by-step change is related to the inner product of the visible and hidden units, as:

\[ \Delta w_{ij} = \eta \left( \langle x_i, h_{1j} \rangle_{\mathrm{original}} - \langle x_i, h_{1j} \rangle_{\mathrm{reconstructed}} \right) \tag{5} \]

where <·>original and <·>reconstructed refer to the inner product over the original data and the reconstructed data, respectively. The symbol η denotes the learning rate. By performing (5) for a predefined number of iterations Tst, the first and the rest of the RBMs can be trained successively, as illustrated in Fig. 3.

Fig. 3. Recursive training of RBMs.

3.3. DBN structure and parameter optimization

The performance of DBN is significantly affected by the network structure and associated parameters. The majority of published literature adopted a trial-and-error approach to finding the optimal network parameters, which is time consuming. In this study, the DBN structure and learning rate are optimized through a Particle Swarm Optimization (PSO) algorithm, which solves the problem through a bundle of particles. These particles are defined with positions (values of network structure and learning rate) and velocities (delta values) [13]. The particles are moved around in a search space to find an optimal position corresponding to the least MRR regression error. For the structure depicted in Fig. 2, each particle is designed to be a 3-dimensional position vector B(m, l, η), which contains three DBN parameters to be optimized: the number of the 1st and 2nd hidden layer units, m and l, and the network learning rate, η. Each particle's position B is recursively updated by adding a velocity V, which provides the direction and magnitude for the update of vector B towards the optimum:

\[ B_i^{k+1} = B_i^{k} + V_i^{k+1} \tag{6} \]

where i denotes the ith particle, k denotes the kth iteration, and V represents the velocity of the ith particle at the (k+1)th iteration. The velocity V is determined by the best position recorded by this particle, B_{i,Best}, and the best position of the particle swarm, B^k_{Best}:

\[ V_i^{k+1} = r_1 V_i^{k} + r_2 \left( B_{i,\mathrm{Best}} - B_i^{k} \right) + r_3 \left( B_{\mathrm{Best}}^{k} - B_i^{k} \right) \tag{7} \]

By iteratively executing (6) and (7), the best position can be found corresponding to the least MRR regression error.

4. Experimental study

4.1. Experimental setup

The data set for evaluating the developed DBN algorithm is provided by the 2016 Prognostics and Health Management (PHM) Data Challenge [14]. The objective is to investigate the effect of usage of the polishing pad and dresser on the MRR. The data set consists of a training dataset (to establish an MRR model) and a test dataset (to test the established model). The training and test datasets were collected from 1975 and 423 wafers, respectively, in a production line. The average MRR in the provided datasets falls into two ranges: [55, 100] and [140, 165] nm/min. Accordingly, the training and test datasets are further separated into two MRR groups: low-speed (Group A) and high-speed (Group B), as shown in Table 1. The data was collected from typical polishing tools with a polishing pad, a dresser and a wafer [14]. Both the wafer and polishing pad rotated in the same direction. After polishing was completed, the polishing pad was conditioned by a dresser to roughen the pad's surface and improve its polishing properties. Process parameters measured during the production are shown in Table 2. In addition, an average MRR for each wafer was provided.

Table 1 Training and test datasets collected from a production line.

            Group A    Group B    Total
Training    1611       364        1975
Test        350        73         423

Table 2 Process parameters given as usage, pressure, and rotational speed.

No.      Parameter
1        Pad usage
2        Dresser usage
3        Dresser table usage
4        Pad table usage
5        Polishing membrane usage
6        Wafer carrier usage
7        Chamber pressure
8        Wafer edge pressure
9        Wafer rotational speed
10       Dresser rotational speed
11       Pad rotational speed
12       Slurry flow rate
13-16    Wafer pressure measured at four different locations
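Before turning to training on this dataset, the RBM training step of Section 3.2 can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the implementation used in this study: it follows the standard one-step contrastive divergence recipe described in [12], uses the conventional bias assignment (a hidden bias in the hidden-unit probability, a visible bias in the reconstruction), treats real-valued inputs in [0, 1] as probabilities, and runs on random stand-in data. The class name RBM, the iteration count, and the data are hypothetical; the layer sizes (6 inputs, m = 2, l = 49) and the learning rate 0.06 are taken from the Group A settings reported for this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Minimal RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.06, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)   # visible-layer biases
        self.b_hid = np.zeros(n_hidden)    # hidden-layer biases
        self.lr = lr

    def hidden_prob(self, x):
        # Probability that each hidden unit is on, given the visible units.
        return sigmoid(x @ self.W + self.b_hid)

    def visible_prob(self, h):
        # Probability that each visible unit is on, given the hidden units.
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, x):
        """One weight update in the spirit of (5); returns reconstruction MSE."""
        h_prob = self.hidden_prob(x)
        h_sample = (self.rng.random(h_prob.shape) < h_prob).astype(float)
        x_rec = self.visible_prob(h_sample)
        h_rec = self.hidden_prob(x_rec)
        n = x.shape[0]
        # <x, h>_original minus <x, h>_reconstructed, averaged over samples.
        self.W += self.lr * (x.T @ h_prob - x_rec.T @ h_rec) / n
        self.b_vis += self.lr * (x - x_rec).mean(axis=0)
        self.b_hid += self.lr * (h_prob - h_rec).mean(axis=0)
        return float(np.mean((x - x_rec) ** 2))

# Random stand-in for six normalized usage parameters (not real data).
rng = np.random.default_rng(1)
X = rng.random((200, 6))

rbm1 = RBM(n_visible=6, n_hidden=2)
errors = [rbm1.cd1_step(X) for _ in range(50)]   # 50 iterations is arbitrary

# Hidden activations of the 1st RBM become the visible units of the 2nd RBM.
H1 = rbm1.hidden_prob(X)
rbm2 = RBM(n_visible=2, n_hidden=49)
rbm2.cd1_step(H1)
H2 = rbm2.hidden_prob(H1)
print(H1.shape, H2.shape)
```

In a full DBN, X, H1 and H2 would then be passed to the supervised perceptron for MRR regression and the whole network fine-tuned by back-propagation; that stage is omitted here.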

4.2. Network training and prediction performance

To ensure that data samples can be fully discriminated, it is necessary to check whether all 16 process parameters are broadly spread across the [0, 1] range to meet the activation function requirement in (3), before using DBN to establish an MRR model for the parameters in Table 2. In Fig. 4, distributions of the 16 normalized parameters of the training dataset for Group A are shown. It is seen that all the normalized rotational speed and pressure parameters concentrate within a narrow range, as compared to the normalized usage parameters, which are spread across [0, 1]. This is due to the fact that pressure and rotational speed settings are tightly controlled in real-world production. Under such conditions, the inclusion of pressure and rotational speed in the DBN model would not increase the MRR prediction accuracy from a computational point of view, although they are critical process parameters that directly affect MRR from a physics point of view. Therefore, for this dataset, only the six usage parameters are included in the MRR model.
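The spread check described above can be illustrated as follows. This is a sketch under assumptions rather than the procedure used in this study: the normalization method is not stated in the text, so min-max scaling is assumed, and the spread measure (width of the central 5th to 95th percentile band) and the 0.2 threshold are hypothetical choices. The two synthetic columns stand in for one usage parameter and one tightly controlled pressure setting.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def central_spread(Xn, lower=5, upper=95):
    """Width of the central percentile band of each normalized column."""
    q_lo, q_hi = np.percentile(Xn, [lower, upper], axis=0)
    return q_hi - q_lo

# Synthetic stand-in data (not from the PHM dataset).
rng = np.random.default_rng(0)
usage = rng.uniform(0.0, 1000.0, size=500)            # broadly spread
pressure = 50.0 + 0.1 * rng.standard_normal(500)      # tightly controlled
pressure[0], pressure[1] = 30.0, 70.0                 # two isolated outliers

X = np.column_stack([usage, pressure])
Xn = min_max_normalize(X)
spread = central_spread(Xn)
keep = spread >= 0.2   # hypothetical threshold for "broadly spread"
print(spread, keep)
```

A percentile band is used instead of the full range because a min-max scaled column always spans exactly [0, 1]; a few outliers can stretch that range while the bulk of the samples stays within a narrow band.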


Fig. 4. Distributions of 16 normalized process parameters.

Table 3 Optimized DBN structure and learning rate, and PSO parameters.

                                         Group A      Group B
Number of RBMs                           2            2
Number of hidden units in 1st RBM, m     2            2
Number of hidden units in 2nd RBM, l     49           28
Learning rate, η                         0.06         0.1
Population size of PSO                   50           50
Acceleration factor of PSO {r1, r2, r3}  {1, 2, 2}    {1, 2, 2}
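To illustrate how a particle swarm with the settings of Table 3 (population size 50, factors {1, 2, 2}) searches over (m, l, η), a minimal sketch is given below. It is not the implementation used in this study. Following common PSO practice [13], the attraction terms are multiplied by uniform random numbers and velocities are clamped; both are assumptions added here, as are the search bounds, the iteration count, and the objective toy_error, which is a hypothetical surrogate with an arbitrary minimum. In practice the objective would train a DBN with the candidate settings and return its MRR regression error.

```python
import numpy as np

def pso(objective, lower, upper, n_particles=50, n_iter=30,
        r=(1.0, 2.0, 2.0), seed=0):
    """Minimize objective over a box using position/velocity updates."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    v_max = 0.2 * (upper - lower)              # velocity clamp (assumption)
    B = rng.uniform(lower, upper, size=(n_particles, lower.size))
    V = np.zeros_like(B)
    p_best = B.copy()                          # best position of each particle
    p_err = np.array([objective(b) for b in B])
    g = int(np.argmin(p_err))
    g_best, g_err = p_best[g].copy(), float(p_err[g])   # best of the swarm
    history = [g_err]
    for _ in range(n_iter):
        u2, u3 = rng.random(B.shape), rng.random(B.shape)
        # Velocity update with random scaling of the two attraction terms.
        V = r[0] * V + r[1] * u2 * (p_best - B) + r[2] * u3 * (g_best - B)
        V = np.clip(V, -v_max, v_max)
        # Position update, kept inside the search box.
        B = np.clip(B + V, lower, upper)
        err = np.array([objective(b) for b in B])
        improved = err < p_err
        p_best[improved] = B[improved]
        p_err[improved] = err[improved]
        g = int(np.argmin(p_err))
        if p_err[g] < g_err:
            g_best, g_err = p_best[g].copy(), float(p_err[g])
        history.append(g_err)
    return g_best, g_err, history

def toy_error(b):
    """Hypothetical surrogate for the DBN regression error (arbitrary bowl)."""
    m, l, eta = b
    m_units, l_units = int(np.rint(m)), int(np.rint(l))   # unit counts are integers
    return (m_units - 3) ** 2 + (l_units - 20) ** 2 / 100.0 + 100.0 * (eta - 0.1) ** 2

lower = np.array([1.0, 1.0, 0.01])    # assumed bounds for (m, l, eta)
upper = np.array([10.0, 60.0, 0.5])
best, best_err, history = pso(toy_error, lower, upper)
print(best, best_err)
```

Rounding inside the objective is one simple way to handle the integer-valued unit counts while keeping the particle positions continuous.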

Furthermore, to improve accuracy, data from Groups A and B were modelled separately. During the network training process, the network structure and learning rate are optimized by PSO, with the result shown in Table 3. With two trained DBNs for Groups A and B, the test data are first input into DBN A. If the predicted MRRs fall within the range [55, 100] nm/min, they are taken as the final prediction results. Predictions outside this range are routed to DBN B. By grouping predictions from the two DBNs, distributions of the DBN-predicted and measured (actual) MRR are obtained. Fig. 5(a) shows the measured and DBN-predicted MRRs, which confirms the performance of the developed DBN algorithm.
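The two-stage prediction described above can be sketched as follows. This is an illustration under assumptions, not the code used in this study: the function route_predictions is a hypothetical name, and dbn_a and dbn_b are simple stand-in functions that take the place of the two trained networks.

```python
import numpy as np

LOW_RANGE = (55.0, 100.0)   # nm/min, MRR range of Group A

def route_predictions(X, model_a, model_b, low_range=LOW_RANGE):
    """Predict with model A first; re-predict out-of-range samples with model B."""
    X = np.asarray(X, dtype=float)
    pred = np.asarray(model_a(X), dtype=float).copy()
    outside = (pred < low_range[0]) | (pred > low_range[1])
    if outside.any():
        pred[outside] = model_b(X[outside])
    return pred, outside

# Hypothetical stand-ins for the trained DBN A and DBN B.
def dbn_a(X):
    return 60.0 + 50.0 * X[:, 0]

def dbn_b(X):
    return 140.0 + 25.0 * X[:, 0]

X_test = np.array([[0.1], [0.5], [0.9]])
pred, routed = route_predictions(X_test, dbn_a, dbn_b)
print(pred, routed)
```

Only the samples whose first-stage prediction leaves the Group A range are passed to the second model, so each wafer receives exactly one final prediction.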

Fig. 5. (a) Comparison of distributions and (b) quantitative comparison between predicted and measured MRR distributions.

A quantitative evaluation of the DBN-based prediction is given in Fig. 5(b), using the root mean square error (RMSE):

\[ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{MRR}_{\mathrm{predicted},i} - \mathrm{MRR}_{\mathrm{measured},i} \right)^{2} } \tag{8} \]

It is seen that high-accuracy MRR prediction by DBN has been achieved, with the RMSE within the range of 2.6–2.8. The performance of the DBN-based model is next compared with the physics-based models in (1) and (2), as shown in Table 4. For the physical model described by (1), two sets of coefficients (Kp, a, and b) are trained for Groups A and B using the PHM-provided training dataset. It is found that the obtained a and b values are all below 0.01, which differs from the conclusions in the literature [1,10]. This indicates that the experimental conditions investigated in this paper, which are characterized by tightly controlled pressure and rotational speed settings, differ from those underlying (1). For this specific data set, including pressure and rotational speed has been shown not to improve the MRR prediction. However, this does not mean that pressure and rotational speed have no effect on MRR. As no physical measurement related to the number of abrasives is provided, the physical model described in (2) could not be evaluated using the dataset provided by PHM. Therefore, its performance is expressed using data directly taken from [4]. It is seen that in both cases, the DBN-trained model shows high prediction accuracy and compares favourably with the models reported previously.

Table 4 Comparison between physical models and DBN model, in RMSE.

Model                 Group A    Group B    Average
DBN                   2.6        2.8        2.7
1st physical model    42.3       16.6       29.5
2nd physical model    7.6 / 7.6 (data from [4])

The DBN model has also been compared with models based on other machine learning algorithms for MRR prediction, such as the back-propagation neural network (BP-NN) and support vector regression (SVR). It was found that DBN outperformed both methods in terms of prediction accuracy (RMSE 2.6 vs. 3.3 for BP-NN and 3.1 for SVR).

5. Conclusion

A data-driven method based on Deep Learning has been investigated to model the relationship between process parameters and MRR in wafer polishing. The work is motivated by the complexity associated with the polishing process and the effectiveness of Deep Learning in extracting patterns hidden in data. The effort is aimed at complementing physical model-based approaches established by prior researchers to improve MRR prediction. An optimization method based on PSO is applied to automatically find the optimal structure and learning rate in DBN. Evaluation was performed using data collected from a large number of wafer polishing production runs, and good results were obtained.

It is envisioned that the developed method can be refined to benefit data analytics in other fields of manufacturing.

Acknowledgement

This research is partially supported by the National Science Foundation under CMMI-1300999 and CCF-1331850.

References

[1] Evans C-J, Paul E, Dornfeld D, Lucca D-A, Byrne G, Tricard M, Klocke F, Dambon O, Mullany B-A (2003) Material Removal Mechanisms in Lapping and Polishing. CIRP Annals – Manufacturing Technology 52(2):611–633.
[2] Klocke F, Zunke R (2009) Removal Mechanisms in Polishing of Silicon based Advanced Ceramics. CIRP Annals – Manufacturing Technology 58(1):491–494.
[3] Preston F-W (1927) The Theory and Design of Plate Glass Polishing Machines. Journal of the Society of Glass Technology 11:214.
[4] Luo J, Dornfeld D (2001) Material Removal Mechanism in Chemical Mechanical Polishing: Theory and Modelling. IEEE Transaction of Semiconductor Manufacturing 14(2):112–133.
[5] Jeng Y-R, Huang P-Y (2005) A Material Removal Rate Model Considering Interfacial Micro-Contact Wear Behavior for Chemical Mechanical Polishing. ASME Journal of Tribology 127:190–197.
[6] Dambon O, Demmer A (2006) Surface Interactions in Steel Polishing for the Precision Tool Making. CIRP Annals 55(1):609–612.
[7] Lee H-S, Jeong H-D (2009) Chemical and Mechanical Balance in Polishing of Electronic Materials for Defect-Free Surfaces. CIRP Annals – Manufacturing Technology 58(1):485–490.
[8] Takaya Y, Kishida H, Hayashi T, Michihata M, Kokubo K (2011) Chemical Mechanical Polishing of Patterned Copper Wafer Surface using Water-Soluble Fullerenol Slurry. CIRP Annals – Manufacturing Technology 60(1):567–570.
[9] Hinton G-E, Salakhutdinov R (2006) Reducing the Dimensionality of Data with Neural Networks. Science 313:504–507.
[10] Tseng W, Wang Y (1999) Re-examination of Pressure and Speed Dependences of Removal Rate During Chemical Mechanical Polishing Processes. Journal of Electrochemical Society 144(2):L15–L17.
[11] Byrne G, Mullany B, Young P (1999) The Effect of Pad Wear on Chemical Mechanical Polishing of Silicon Wafers. CIRP Annals – Manufacturing Technology 48(1):143–146.
[12] Hinton G-E (2012) A Practical Guide to Training Restricted Boltzmann Machines. Lecture Notes in Computer Science 7700:599–619.
[13] Kennedy J, Eberhart R (1995) Particle Swarm Optimization. Proceedings of IEEE International Conference on Neural Networks 1942–1948.
[14] Data Challenge PHM (2016) Society of Prognostic and Health Management. http://www.phmsociety.org/events/conference/phm/16/data-challenge.
