Machine learning of weld joint penetration from weld pool surface using support vector regression



Journal of Manufacturing Processes 41 (2019) 23–28

Contents lists available at ScienceDirect

Journal of Manufacturing Processes journal homepage: www.elsevier.com/locate/manpro

Rong Liang a,b, Rui Yu b, Yu Luo a, YuMing Zhang b,⁎

a State Key Laboratory of Ocean Engineering, Collaborative Innovation Center for Advanced Ship and Deep-Sea Exploration, Shanghai Jiao Tong University, Shanghai, China
b Institute for Sustainable Manufacturing, Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY, United States

ARTICLE INFO

Keywords: Weld joint penetration; Support vector regression; Predictive control; Weld pool surface

ABSTRACT

Skilled human welders can control the weld joint penetration by observing the molten pool. This suggests that a model may be developed to predict the backside bead width, which quantitatively measures the weld joint penetration, from the weld pool surface. However, the weld pool surface is specular and subject to the radiation of the arc, so its measurement is challenging. At the University of Kentucky, the weld pool surface is measured using an innovative 3D vision sensor that overcomes the challenges caused by the specular nature and arc radiation, and the measured surface is characterized by three parameters. Because of the lack of a physics-based model, neural networks would typically be used to approximate the unknown correlation, which is nonlinear in general, between the backside bead width and the characteristic parameters. Unfortunately, neural networks require a large amount of data to train for adequate model accuracy. While the weld pool surface can be measured using the innovative 3D sensor, the ground truth for the backside bead width needs to be measured offline after the experiment, and to this end the work-piece needs to be appropriately cleaned/processed. The large amount of training data needed may therefore not be easily obtained. To improve the critical ability to accurately predict the backside bead width, models need to be established from a relatively small amount of training data. To this end, the authors propose to use the support vector regression (SVR) method and hypothesize that an SVR model trained using the small amount of training data available would perform better than a multi-layer perceptron (MLP) artificial neural network model trained using the same data. Modeling results show that, for the relatively small training data available, the optimized SVR model provides a more accurate prediction of the backside bead width. As such, the authors systematically advanced the ability to accurately predict the weld joint penetration. The use of the innovative 3D sensor to obtain the 3D weld pool surface and the proposed use of the support vector method to address the small data issue played crucial roles.

⁎ Corresponding author. E-mail address: [email protected] (Y. Zhang).

https://doi.org/10.1016/j.jmapro.2019.01.039 Received 24 December 2018; Accepted 22 January 2019 1526-6125/ © 2019 The Society of Manufacturing Engineers. Published by Elsevier Ltd. All rights reserved.

1. Introduction

Weld joint penetration is a basic requirement for welded structures, and its accurate monitoring and control play an important role in the manufacturing industry in ensuring the quality of welded structures. This is challenging because the weld penetration, measured by either the back-side bead width for full penetration or the depth of the weld penetration for partial penetration, is not directly observable by a front-side sensor that can be attached to and move together with the welding torch. Finding signals that can be measured by front-side sensors and are also inherently correlated to the weld joint penetration has attracted many researchers. Nagarajan et al. [1] studied the use of infrared signals and correlated the penetration depth, weld bead width and torch position to the infrared images. Carlson et al. [2] used ultrasonic signals to estimate the sidewall penetration. Wang [3] reviewed the use of three-dimensional weld pool measurements. At the University of Kentucky, a novel 3D vision sensing system [4] has been proposed to measure the specular 3D weld pool surface in gas tungsten arc welding (GTAW), and the measured 3D weld pool surface has been correlated with the weld joint penetration. Because skilled human welders can estimate and successfully control the weld joint penetration based on their observation of the weld pool, mostly just its 3D surface, this method is considered promising.

However, despite possible inherent correlations between the measured front-side signals, such as the 3D weld pool surface, and the weld penetration, finding such correlations is not trivial. Extensive studies by skilled researchers are typically needed to establish the relationships, and success is not always assured even though the information about the weld joint penetration may be fully contained in the front-side signals.

Such a challenge in extracting correct information from raw signals is typically addressed in two ways. One is to use a deep learning algorithm [5,6] that uses an ultra-complex network, such as a convolutional neural network (CNN) [7], to directly correlate the output (the weld penetration in our case) to the raw signals (raw images of the 3D weld pool surface). The skill required of the researchers is much reduced, but unfortunately the amount of data needed to train such ultra-complex networks is typically ultra-large. The other way is to propose features and use a general nonlinear mapping to correlate the output concerned to the proposed features. While the human involvement in feature selection reduces the need for ultra-complexity, a general nonlinear mapping with moderate flexibility is still needed because the actual relationship is unknown. Neural networks with relatively few layers and neuro-fuzzy systems have been used. Mathew et al. [8] used heat input, wall thickness and mean radius to thickness ratio as input features to predict the residual stress in girth welded pipes by fitting a multi-layer artificial neural network. Researchers at the University of Kentucky [4] proposed the width, length, and convexity of the 3D molten pool surface as the features (characteristic parameters of the weld pool surface) to predict the back-side bead width that measures the state of full penetration. Liu et al. [9] improved the predictive accuracy of the backside bead width by using a nonlinear dynamic adaptive neuro-fuzzy inference system (ANFIS) model. Such nonlinear mappings still require relatively large data sets to train.

Among various algorithms, support vector regression (SVR) has the advantages of simultaneously considering different features and of being able to solve nonlinear problems [10,11]. It can obtain a satisfactory model from a relatively small training set, with a good balance between prediction precision and system complexity as well as good generalization ability [12,13]. This is because SVR uses appropriate kernel functions to address the nonlinearity in the input data, which helps reduce the needed model complexity [17]. Yao et al. [14] studied the wire feed speed prediction of double-wire-pulsed MIG welding using support vector regression. Song et al. [10] proposed an operating-parameter-conditioned support vector regression method to achieve a processing-parameter-independent in-situ composition prediction.

This paper aims at improving the prediction of the backside bead width from front-side signals. To this end, the most promising front-side signals, i.e., the characteristic parameters (width, length and convexity) of the weld pool surface, are used as the features. Since the training data has to be generated experimentally and the associated cost is quite high, the key issue is how accurate models can be established from a relatively small amount of training data. Analysis suggests that SVR may be a promising method. However, the actual effectiveness depends on the nature and complexity of the nonlinearity and must be validated through experiments for the particular problem being addressed. Modeling results validated the effectiveness of the SVR method in improving the modeling accuracy for the backside bead width. The ability to predict the weld joint penetration, which is critical and fundamental for welding of critical components and intelligent manufacturing, is thus advanced.

2. Experimental system and data

Fig. 1 shows the experimental system for the gas tungsten arc welding (GTAW) process developed in the Welding Lab at the University of Kentucky [16]. The material for welding is a pipe of stainless steel 304. The outer diameter and wall thickness are 113.5 mm and 2.03 mm, respectively. In this system, a 20 mW illumination laser generator at a wavelength of 684 nm is used to project a 19-by-19 dot matrix laser pattern that is reflected by the specular molten pool surface. Under the plasma impact from the arc, the weld pool surface is depressed and distorted, which results in a distorted reflected dot matrix. Using an image processing and 3D reconstruction scheme [15], this reflected dot matrix, shown in Fig. 1(b), is used to reconstruct the 3D molten pool surface [15], as shown in Fig. 2(a). From this system, the characteristic parameters of the molten pool surface [16] can be obtained, and these measured data [16] are shown in Fig. 3. However, the backside bead width that serves as the label is measured offline, and because of the care needed in preparing the back surface of the work-piece before the back-side bead width can be measured, the data size is relatively limited. Hence, the data available for developing the model, shown in Fig. 3, comprise fewer than 800 samples, which is considered relatively small.

3. SVR model

3.1. SVR algorithm

SVR is a machine learning algorithm that is mostly used in high-dimensional pattern recognition to solve nonlinear problems. It maps the input variables into a high-dimensional feature space by a nonlinear mapping, and then establishes a regression estimation function in this space [17]. A kernel function is used to avoid complex calculations in high-dimensional spaces [18]. SVR has two characteristics: (1) it tolerates modeling errors within a given range; (2) it handles nonlinear relationships by using kernel functions. A linear regression is used below to demonstrate the first characteristic. For a given set of training data $\{(x_1, y_1), \ldots, (x_\ell, y_\ell)\}$, first consider:

$$ f(x) = \langle w, x \rangle + b \tag{1} $$

where $w$ is the weight vector, $b$ is the threshold value, and $\langle \cdot, \cdot \rangle$ denotes the dot product. For traditional regression models, the loss value is calculated from the bias between $f(x)$ and the label $y$. In SVR, as shown in Fig. 4, there are two parallel lines $f(x) + \varepsilon$ and $f(x) - \varepsilon$. The bias between the label and the model $f(x)$ is taken as zero if $y$ lies between these two lines; otherwise it is the distance from $y$ to the nearer of the two lines at the given $x$. This can be mathematically expressed as follows:

$$ |y - f(x)|_{\varepsilon} = \begin{cases} 0, & |(\langle w, x_i \rangle + b) - y_i| \le \varepsilon \\ |(\langle w, x_i \rangle + b) - y_i| - \varepsilon, & |(\langle w, x_i \rangle + b) - y_i| > \varepsilon \end{cases} \tag{2} $$

This particular bias is referred to as the soft margin [19]. Optimal modeling should minimize the overall soft margin defined in (2). However, to obtain a robust model, the model parameters are also required to have small values [17]. One way to ensure this is to minimize the norm, i.e. $\|w\|^2 = \langle w, w \rangle$. Hence we arrive at the following formulation:

$$ \begin{aligned} \text{Minimize} \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*) \\ \text{s.t.} \quad & y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \\ & \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \\ & \xi_i,\ \xi_i^* \ge 0 \end{aligned} \tag{3} $$

where $\xi_i$ and $\xi_i^*$ account for the bias above and below the tube, so that $\varepsilon + \xi_i$ and $\varepsilon + \xi_i^*$ form the soft margin. For each sample, at least one of $\xi_i$ and $\xi_i^*$ must be zero, and it is also possible that both are zero. To solve the optimization problem (3), a Lagrange function can be constructed by introducing a dual set of variables; the details are described in [17].
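As a minimal numerical sketch of Eqs. (1) to (3), assuming a linear model and plain Python lists, the ε-insensitive loss and the primal objective can be evaluated as shown below. For fixed w and b, the smallest feasible slacks satisfy ξ_i + ξ_i* = max(0, |y_i − f(x_i)| − ε), which is what the code uses. The function names and the toy data are illustrative assumptions and do not come from the original work.

```python
def eps_insensitive_loss(y, f, eps):
    """Eq. (2): zero inside the eps-tube, linear in the excess outside it."""
    return max(0.0, abs(y - f) - eps)


def linear_model(w, b, x):
    """Eq. (1): f(x) = <w, x> + b for plain Python sequences."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def primal_objective(w, b, X, y, C, eps):
    """Eq. (3) evaluated with the smallest feasible slack variables."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    total_slack = sum(
        eps_insensitive_loss(yi, linear_model(w, b, xi), eps)
        for xi, yi in zip(X, y)
    )
    return regularizer + C * total_slack


# Toy data (hypothetical, not taken from the experiments).
X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.5, 2.9]
objective = primal_objective(w=[1.0], b=0.0, X=X, y=y, C=10.0, eps=0.2)
print(objective)
```

In this toy case only the second sample lies outside the tube (|2.5 − 2.0| = 0.5 > 0.2), so it alone contributes to the slack term, and the objective is approximately 0.5 + 10 × 0.3 = 3.5.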

The Lagrange function is:

$$ L = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*) - \sum_{i=1}^{\ell} (\eta_i \xi_i + \eta_i^* \xi_i^*) - \sum_{i=1}^{\ell} \alpha_i (\varepsilon + \xi_i - y_i + \langle w, x_i \rangle + b) - \sum_{i=1}^{\ell} \alpha_i^* (\varepsilon + \xi_i^* + y_i - \langle w, x_i \rangle - b) \tag{4} $$

Here $L$ is the Lagrangian and $\eta_i$, $\eta_i^*$, $\alpha_i$, $\alpha_i^*$ ($i = 1, \ldots, \ell$) are Lagrange multipliers, which have to satisfy positivity constraints. According to the saddle point condition, the partial derivatives of $L$ have to vanish for optimality:

$$ \partial_b L = \sum_{i=1}^{\ell} (\alpha_i^* - \alpha_i) = 0 \tag{5} $$

$$ \partial_w L = w - \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*)\, x_i = 0 \tag{6} $$

$$ \partial_{\xi_i^{(*)}} L = C - \alpha_i^{(*)} - \eta_i^{(*)} = 0 \tag{7} $$

where $\alpha_i^{(*)}$ and $\eta_i^{(*)}$ denote $\alpha_i$, $\alpha_i^*$ and $\eta_i$, $\eta_i^*$, respectively. Substituting (5), (6) and (7) into (4) yields the dual optimization problem:

$$ \begin{aligned} \text{Maximize} \quad & -\frac{1}{2} \sum_{i,j=1}^{\ell} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_{i=1}^{\ell} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{\ell} y_i (\alpha_i - \alpha_i^*) \\ \text{s.t.} \quad & \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*) = 0 \ \text{ and } \ \alpha_i,\ \alpha_i^* \in [0, C] \end{aligned} \tag{8} $$

In Eq. (8), the dual variables $\eta_i$, $\eta_i^*$ are eliminated through condition (7). Eq. (6) can be rewritten as follows:

$$ w = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*)\, x_i \tag{9} $$

Thus Eq. (1) can be rewritten as follows:

$$ f(x) = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*) \langle x_i, x \rangle + b \tag{10} $$

The correlation between the training data $x_i$ and the label $y_i$ is commonly nonlinear. The nonlinearity of SVR can be achieved by simply preprocessing the training data $x_i$ with a map $\phi$ as described in [17]. Hence, Eq. (10) is expressed as:

$$ f(x) = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*) \langle \phi(x_i), \phi(x) \rangle + b \tag{11} $$

However, the dimension involved may be very high, which results in more complex calculations in the mapped space. To overcome this difficulty, the kernel function is used [17]:

$$ K(x_i, x) = \langle \phi(x_i), \phi(x) \rangle \tag{12} $$

Likewise, Eq. (10) can be expressed as:

$$ f(x) = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b \tag{13} $$

In this work, the RBF kernel function is used to establish the model:

$$ K(x_i, x) = \exp\!\left(-\|x_i - x\|^2 / 2\sigma^2\right) \tag{14} $$

To simplify the calculation, $\gamma = 1/2\sigma^2$ is defined, and Eq. (14) can then be written as follows:

$$ K(x_i, x) = \exp\!\left(-\gamma \|x_i - x\|^2\right) \tag{15} $$

Fig. 1. 3D vision-based sensing system [16]. (a) System; (b) Reflected image.

Fig. 2. 3D weld pool surface and its characteristic parameters [15]. (a) Example of 3D reconstruction of molten pool. (b) Weld pool surface convexity. (c) Weld pool length and width.

Fig. 3. Measured characteristic parameters of the molten pool surface and backside bead width.

Fig. 4. Schematic diagram of support vector regression.

3.2. Flowchart of backside width prediction using SVR

Fig. 5 shows the flowchart of backside width prediction using the vision-based sensing system and SVR. The measured backside bead width, denoted as Width_b, is used as the label $y_i$ in the prediction model. The length, width and convexity of the molten pool are used as the input data $x_i$. Then 10-fold cross validation [20] is applied to these training data. That is, the training data is divided into 10 disjoint folds of approximately equal size, and each fold is in turn used to test the model induced from the other 9 folds by the SVR algorithm.

Fig. 5. Flowchart of backside bead width prediction using SVR model and vision-based sensing system.

4. Results and discussion

4.1. SVR and MLP results

The characteristic parameters of the molten pool surface obtained from the sensing system are treated as the input variables to train the SVR and MLP models. To this end, the scales of the different input variables are normalized first:

$$ y = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{16} $$

where $x$ is the original data and $y$ is the normalized value. To evaluate the prediction performance of the fitted models, two criteria are used: the root mean squared error (RMSE) and the squared correlation coefficient (SCC).
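The building blocks described in Sections 3.1, 3.2 and 4.1 can be sketched in a few lines of standard-library Python: the min-max scaling of Eq. (16), the RBF kernel of Eq. (15), the kernel prediction of Eq. (13), and a 10-fold split. This is only an illustrative sketch. The function names are hypothetical, the dual coefficients (α_i − α_i*) are assumed to come from an external solver, and the interleaved fold assignment without shuffling is an assumption, since the text only requires disjoint folds of approximately equal size.

```python
import math


def min_max_normalize(values):
    """Eq. (16): rescale one input variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rbf_kernel(xi, x, gamma):
    """Eq. (15): K(x_i, x) = exp(-gamma * ||x_i - x||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * squared_distance)


def svr_predict(x, support_x, dual_coef, b, gamma):
    """Eq. (13): f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coef[i] stands for (alpha_i - alpha_i*), assumed to be supplied
    by a solver of the dual problem in Eq. (8).
    """
    return sum(
        c * rbf_kernel(xi, x, gamma) for xi, c in zip(support_x, dual_coef)
    ) + b


def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs with disjoint test folds.

    Samples are assigned to folds in an interleaved way (an assumption),
    so fold sizes differ by at most one.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, folds[i]))
    return splits


splits = k_fold_indices(760, k=10)  # 760 is the training set size in the text
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each of the three features (length, width, convexity) would be passed through `min_max_normalize` column by column before the kernel is evaluated, so that no single feature dominates the squared distance in the RBF kernel.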

Fig. 6. Comparison between measured and modeling results.

Table 1
Parameters for SVR models and performance indicators for SVR and MLP models.

Model   C       γ      ε       RMSE    SCC
SVR1    2.26    0.21   0.558   0.646   0.524
SVR2    1.261   4.86   0.158   0.421   0.798
MLP1    -       -      -       0.563   0.639
MLP2    -       -      -       0.522   0.688

Fig. 8. Residuals of SVR2-optimal SVR model.

Fig. 9. Comparison between measured and SVR2 results.

The RMSE is defined as

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{17} $$

and the squared correlation coefficient (SCC) is defined by:

$$ \mathrm{SCC} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (\bar{y} - y_i)^2} \tag{18} $$

where $\bar{y}$, $y_i$ and $\hat{y}_i$ are the mean, the actual value and the predicted value, respectively. The closer the RMSE is to 0, the better the model fits and the more successful the prediction. SCC is an indicator that characterizes the fit through the variation of the data. The closer it is to 1, the stronger the explanation of y by the variables and the better the model fits the data [14].

Fig. 7. Comparison between measured and optimal modeling results. (a) SVR2 – optimal SVR model; (b) MLP2 – optimal MLP model.

The hyperparameters C, ε, and γ in the SVR model determine its prediction capability, and the performance is determined by the combination of these three parameters [21]. To begin, these hyperparameters are first determined empirically based on experience, and the resulting model is referred to as SVR1. For comparison with MLP, MLP1 is also introduced, whose hyperparameters, such as hidden layer size and learning rate, are likewise determined empirically based on experience. Fig. 6 shows the predictions of the backside bead width from the developed SVR1 and MLP1 models against the measurements. It can be seen that the predictions from both SVR1 and MLP1 in general reflect the variations in the measurements, though large errors occur occasionally. To be quantitative, their error/performance criteria are calculated per Eqs. (17) and (18). Table 1 lists these criteria. For the SVR models, whose number of parameters is small, the identified parameters are also listed. For SVR1, the RMSE and SCC are 0.524 and 0.646 respectively, while for MLP1 they are 0.563 and 0.639 respectively. According to the definitions of the criteria, lower RMSEs mean better model predictive accuracies [22].
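The two criteria of Eqs. (17) and (18) can be computed directly from paired lists of measured and predicted values. The following is a minimal sketch in standard-library Python. The function names and the sample numbers are illustrative assumptions and are not data from the experiments.

```python
import math


def rmse(y_true, y_pred):
    """Eq. (17): root mean squared error between labels and predictions."""
    n = len(y_true)
    squared_error = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    return math.sqrt(squared_error / n)


def scc(y_true, y_pred):
    """Eq. (18): one minus residual sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    residual = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    total = sum((mean - yt) ** 2 for yt in y_true)
    if total == 0.0:
        raise ValueError("SCC is undefined when all labels are identical")
    return 1.0 - residual / total


# Hypothetical measured and predicted backside bead widths.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(measured, predicted), scc(measured, predicted))
```

For these hypothetical values the residual sum of squares is 0.10 and the total sum of squares is 5.0, so the RMSE is about 0.158 and the SCC is about 0.98.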

For the SCC, the range is [0, 1], and the closer it is to 1, the more accurate the prediction. Therefore, in terms of these two criteria, the performances of these two models are close and the accuracy of both models is not high. This is understandable because, given the complexity of the models, hyperparameters determined empirically from human experience may not be as appropriate as those found by search algorithms.

4.2. Optimization using grid search

Several search algorithms are available to adjust the model parameters, including the gradient descent algorithm, the grid search algorithm, and the genetic algorithm. The grid search algorithm is one of the most popular and is widely used in many fields. In this algorithm, a multidimensional mesh is formed over the given ranges of all the model parameters; the performance is then calculated at all the mesh nodes, each of which corresponds to a set of model parameters; and the optimal combination of the model parameters is thus obtained. This algorithm is suitable when the amount of data and the number of parameters are manageable. Studies show that for some particular combinations of C and γ, the prediction accuracy is highly sensitive to ε [23]. An effective solution is to narrow down the range of ε so that it can be divided more finely within its range. In this work, C, γ, and ε are searched for the optimum within 0.1 < C < 100, 0.01 < γ < 100, and 0.1 < ε < 0.7. These ranges are determined from the literature, and the resultant optimum performance is checked to determine whether the search ranges need to be adjusted. The resultant models optimized using the grid search algorithm with these ranges are referred to as SVR2 and MLP2. Fig. 7 shows the backside bead width predicted by the SVR2 and MLP2 models. The resultant criteria are also given in Table 1. The RMSE and SCC for the SVR2 model are improved to 0.421 and 0.798, respectively. For the MLP2 model, they are also improved, to 0.522 and 0.688 respectively. The improvements in both the SVR and MLP models are understandable because the optimization is finer than human judgment. More importantly, the SVR2 model performs significantly better than the MLP2 model. This is also understandable: the training set of 760 samples is relatively small for developing the MLP model, but it is relatively large for typical SVR models, whose number of parameters is relatively small, as discussed earlier. Fig. 8 shows the residuals of the SVR2 model; for the majority of the samples, the distribution appears normal. This further supports the statistical accuracy of the proposed model.

4.3. Validation of the optimized SVR model

To validate the generalization ability of the obtained SVR2 model, it is used to predict an additional 60 samples that were not included in the data used to train the model. From Fig. 9, it can be seen that the SVR2 model also predicts the backside bead width accurately for these unused data. The RMSE and SCC for these 60 samples are 0.312 and 0.714, which are comparable with those in the training. The obtained SVR2 model is thus considered validated. As such, a better correlation is extracted between the molten pool characteristic parameters and the backside bead width from a relatively small training set. For applications where large sets of data are not available or are expensive to obtain, SVR models provide an effective solution.

5. Conclusion

Skilled human welders are capable of assuring full penetration as measured by the back-side bead width. The weld pool surface, which human welders observe as process feedback, is thus hypothesized to be correlated with the back-side bead width and is characterized using three parameters. Previous work quantitatively tested this hypothesis using the characteristic parameters as the representation of the weld pool surface, with complex networks as the general mapping. Since large training data sets are impractical because of the need for experiments, more efficient nonlinear modeling methods are needed in order to predict the back-side bead width more accurately. To this end, this work proposed a support vector model that reduces the needed data by using appropriate kernel functions. Meanwhile, the use of kernel functions still allows the model to map complex nonlinear relationships. Experimental results, through comparison with neural networks and validation, verified the effectiveness of the proposed support vector model in accurately correlating the characteristic parameters of the weld pool surface to the back-side bead width despite the lack of physical process models. A more efficient method is thus available to derive the weld joint penetration from the weld pool using relatively small data samples.

References

[1] Nagarajan S, Banerjee P, Chen WH, et al. Control of the welding process using infrared sensors. IEEE Trans Robot Autom 1992;8(1):86–93.
[2] Carlson NM, Johnson JA. Ultrasonic sensing of weld pool penetration. Weld J (Miami; USA) 1988;67(11).
[3] Wang XW. Three-dimensional vision applications in GTAW process modeling and control. Int J Adv Manuf Technol 2015;80(9-12):1601–11.
[4] Zhang WJ, Liu YK, Wang X, et al. Characterization of three dimensional weld pool surface in GTAW. Weld J 2012;91(7):195s–203s.
[5] Kim IS, Jeong YJ, Lee CW, Yarlagadda PK. Prediction of welding parameters for pipeline welding using an intelligent system. Int J Adv Manuf Technol 2003;22(9-10):713–9.
[6] Luo Masiyang, Shin Yung C. Estimation of keyhole geometry and prediction of welding defects during laser welding based on a vision system and a radial basis function neural network. Int J Adv Manuf Technol 2015;81(1-4):263–76.
[7] Zhang Yingjie, Hong Geok Soon, Ye Dongsen, Zhu Kunpeng, Fuh Jerry YH. Extraction and evaluation of melt pool, plume and spatter information for powder-bed fusion AM process monitoring. Mater Des 2018;156:458–69.
[8] Mathew J, Moat RJ, Paddea S, et al. Prediction of residual stresses in girth welded pipes using an artificial neural network approach. Int J Press Vessel Pip 2017;150:89–95.
[9] Liu YK, Zhang WJ, Zhang YM. Dynamic neuro-fuzzy estimation of the weld penetration in GTAW process. Instrumentation and Measurement Technology Conference (I2MTC), 2013 IEEE International 2013:1380–5.
[10] Song L, Huang W, Han X, et al. Real-time composition monitoring using support vector regression of laser-induced plasma for laser additive manufacturing. IEEE Trans Ind Electron 2017;64(1):633–42.
[11] Hong WC, Pai PF. Predicting engine reliability by support vector machines. Int J Adv Manuf Technol 2006;28(1-2):154–61.
[12] Shao Y, Lunetta RS. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS J Photogramm Remote Sens 2012;70:78–87.
[13] Pradhan B. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci 2013;51:350–65.
[14] Yao P, Xue JX, Zhou K. Study on the wire feed speed prediction of double-wire-pulsed MIG welding based on support vector machine regression. Int J Adv Manuf Technol 2015;79(9-12):2107–16.
[15] Zhang WJ, Liu YK, Wang X, Zhang YM. Characterization of three-dimensional weld pool surface in GTAW. Weld J 2012;91(7):195s–203s.
[16] Liu YK, Zhang YM. Model-based predictive control of weld penetration in gas tungsten arc welding. IEEE Trans Control Syst Technol 2014;22(3):955–66.
[17] Smola AJ, Schölkopf B. A tutorial on support vector regression. Stat Comput 2004;14(3):199–222.
[18] Cherkassky V, Ma Y. Comparison of model selection for regression. Neural Comput 2003;15(7):1691–714.
[19] Cortes C, Vapnik V. Support vector networks. Mach Learn 1995;20:273–97.
[20] Wong TT. Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recognit 2015;48(9):2839–46.
[21] Chapelle O, Vapnik V, Bousquet O, et al. Choosing multiple parameters for support vector machines. Mach Learn 2002;46(1-3):131–59.
[22] Scharf LL, Demeure C. Statistical signal processing: detection, estimation, and time series analysis. Reading, MA: Addison-Wesley; 1991.
[23] Liu J, Cai H, Tan Y. Heuristic algorithm for tuning hyper parameters in support vector regression. J Syst Simul 2007;7(032).