An improved deep forest model for forecasting the outdoor atmospheric corrosion rate of low-alloy steels


Journal of Materials Science & Technology 49 (2020) 202–210

Contents lists available at ScienceDirect

Journal of Materials Science & Technology journal homepage: www.jmst.org

Research Article

An improved deep forest model for forecasting the outdoor atmospheric corrosion rate of low-alloy steels

Yuanjie Zhi a, Tao Yang a,∗, Dongmei Fu a,b,∗

a School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
b Beijing Engineering Research Center of Industrial Spectrum Imaging, University of Science and Technology Beijing, Beijing 100083, China

Article history: Received 30 June 2019; Received in revised form 13 January 2020; Accepted 25 January 2020; Available online 28 February 2020

Keywords: Random forests; Deep forest model; Low-alloy steels; Outdoor atmospheric corrosion; Prediction and data-mining

Abstract

This paper proposes a new deep-structure model, called Densely Connected Cascade Forest-Weighted K Nearest Neighbors (DCCF-WKNNs), to implement corrosion data modelling and corrosion knowledge mining. Firstly, we collect 409 outdoor atmospheric corrosion samples of low-alloy steels as experimental datasets. Then, we describe the proposed methods, including random forests-weighted K nearest neighbors (RF-WKNNs) and DCCF-WKNNs. Finally, we use the collected datasets to verify the performance of the proposed method. The results show that, compared with commonly used and advanced machine-learning algorithms such as artificial neural networks (ANN), support vector regression (SVR), random forests (RF), and cascade forests (cForest), the proposed method obtains the best prediction results. In addition, the method can predict the corrosion rate as any single environmental variable varies, such as pH, temperature, relative humidity, SO2, rainfall or Cl−. In this way, the threshold of each variable, at which the corrosion rate may change sharply, can be further obtained.

© 2020 Published by Elsevier Ltd on behalf of The editorial office of Journal of Materials Science & Technology.

1. Introduction

Corrosion is the result of chemical, electrochemical or physical interactions between materials and their surrounding environments [1]. It not only threatens structural safety [2], but also leads to huge economic losses. According to surveys and expert estimates, the annual economic losses caused by corrosion reach about 4 trillion dollars worldwide [3]. However, these losses could be reduced by 25%–30% with the help of effective corrosion models and scientific management [4]. Furthermore, such models can also be used to mine potentially unknown knowledge within corrosion data [5,6]. Therefore, the study of corrosion prediction models is of great significance in materials corrosion engineering.

Recently, corrosion researchers have used machine-learning algorithms such as artificial neural networks (ANN) [5–11], support vector machines (SVM) [12–15], power and power-linear functions [16], hidden Markov models (HMM) [17–21], etc. to implement the modelling [6–8,10,6–16,18], forecasting [5,9,12,16,17,19,20] and data mining [5,7,8,21] of corrosion data. However, for most of the

∗ Corresponding authors. E-mail addresses: [email protected] (T. Yang), fdm [email protected] (D. Fu).

above works, the corrosion samples were obtained from indoor corrosion experiments or corrosion-current experiments with just one single material [5,7,8,11,12,14,15] or one single input variable [16,17,21]. Corrosion prediction for various materials under a variety of outdoor environmental conditions still poses challenges to these commonly used models.

Nowadays, a machine-learning algorithm called random forests (RF) [22] has attracted the attention of scholars. RF is a branch of ensemble methods, with the advantages of easy implementation, resistance to overfitting, excellent performance, robustness, interpretability, etc. [23,24]. Additionally, RF can be used to infer the importance of each variable and to extract the key variables [25,26]. Therefore, RF is widely used in image recognition [27], geography [28], economics [29], manufacturing [30], agriculture [31], etc. In the corrosion field, researchers have used RF for corrosion current classification [32,33], corrosion severity classification [34] and crack growth rate analysis [35]. As for outdoor atmospheric corrosion rate modelling, previous results show that RF obtains better results than general methods such as ANN, SVR and linear regression [36]. The above studies illustrate that RF is an appropriate approach to corrosion dataset modelling.

To further improve the performance of RF, researchers have proposed other models such as the weighted combination of base tree models [49,50] and the deep-structure RF model [38]. The weighted

https://doi.org/10.1016/j.jmst.2020.01.044 1005-0302/© 2020 Published by Elsevier Ltd on behalf of The editorial office of Journal of Materials Science & Technology.

Y. Zhi et al. / Journal of Materials Science & Technology 49 (2020) 202–210


Table 1
Element compositions of 17 LAS in collected datasets (wt.%).

Alloy         Mn     S      P      Si    Cr    Cu    Ni    Fe
06CuPCrNiMo   0.40   0.023  0.050  0.17  0     0.17  0     Balance
09CuPCrNiA    0.40   0.023  0.015  0.26  0.10  0.05  0.02  Balance
09CuPTiRe     0.40   0.019  0.080  0.28  0     0.29  0     Balance
09MnNb(s)     1.18   0.024  0.027  0.20  0.10  0.05  0.10  Balance
10CrCuSiV     0.31   0.002  0.010  0.62  0.83  0.25  0.10  Balance
10CrMoAl      0.45   0.002  0.012  0.35  0.98  0.09  0     Balance
14MnMoNbB     1.53   0.010  0.022  0.34  0.10  0.05  0     Balance
15MnMoVN      1.52   0.004  0.026  0.40  0.10  0.05  0     Balance
16Mn          1.40   0.025  0.009  0.36  0.10  0.05  0     Balance
16MnQ         1.37   0.023  0.030  0.30  0.10  0.07  0.05  Balance
D36           1.40   0.018  0.022  0.39  0.05  0.05  0     Balance
JN235(RE)     0.52   0.025  0.030  0.30  0.10  0.07  0     Balance
JN255         0.67   0.006  0.016  0.07  0.02  0.05  0.05  Balance
JN255(RE)     0.39   0.005  0.010  0.62  0.83  0.25  0.10  Balance
JN345         0.39   0.005  0.110  0.05  0.90  0.40  0.65  Balance
JN345(RE)     0.36   0.011  0.089  0.28  0     0.29  0     Balance
JY235(RE)     0.27   0.010  0.089  0.28  0     0.29  0     Balance

combination model assigns different weights to different base tree models to weaken the influence of low-accuracy trees. The deep-structure model enhances the representation ability by changing the model structure. However, these improved methods only address classification problems, not regression or prediction.

In this paper, we propose a new RF model, called Densely Connected Cascade Forest-Weighted K Nearest Neighbors (abbreviated as DCCF-WKNNs). It is a layer-by-layer representation algorithm in which each layer consists of several RF-WKNNs, i.e. a weight-assigning modification of the RF model, to implement corrosion rate prediction (regression) and data mining. To verify the performance of the proposed method, we use 409 low-alloy steel corrosion samples obtained under outdoor atmospheric conditions as experimental datasets [36]. The experimental results show that the proposed method performs better than the ANN, SVR and RF models. Additionally, the paper uses the proposed method to forecast the trend of the corrosion rate when one input variable changes. In this way, a threshold of each variable can be found, which is an important factor affecting material corrosion. For example, the results show that the critical relative humidity of the outdoor sites is 65%.

The organization of the paper is as follows. The collected corrosion samples are described in Section 2. In Section 3, the random forests-weighted K-nearest neighbors (abbreviated as RF-WKNNs) model is discussed, and then the structure of the multiple-layer deep model is illustrated. The prediction and data mining results based on the proposed methods are shown in Section 4. The summary of the data analysis and the conclusions are given in Section 5.

2. Experimental

The paper selects outdoor atmospheric corrosion samples of low-alloy steels (abbreviated as LAS) as experimental datasets. LAS itself has good corrosion resistance due to the addition of alloying elements (Cu, P, Cr, Ni, etc.) [39,41]. The collected data were obtained from 6 different atmospheric test sites in China, where 17 kinds of LAS were tested over 16 years. A total of 409 LAS corrosion samples were collected as experimental datasets. Each sample has the following dimensions: material, environment, exposure time and the corresponding corrosion rate.

(1) The differences among the 17 LAS are represented by the compositions of 8 chemical elements (Mn, S, P, Si, Cr, Cu, Ni and Fe); Table 1 shows the detailed element compositions of the 17 LAS.

Table 2
Environmental factors and their statistical values.

Factors                                       Average   Minimum   Maximum
Average relative humidity, RH (%)             75.09     56.17     87.71
Average temperature, T (°C)                   17.58     11.08     26.05
Rainfall (mm/month)                           159.74    45.64     753.00
SO2 concentration, SO2 (mg/cm3)               0.09      0.02      0.30
pH of rain (pH)                               6.14      5.11      6.97
Chloride concentration, Cl− (mg/cm3)          0.22      0         1.97

(2) The atmospheric test sites cover the typical climate situations in China. In the paper, we select 6 environmental factors: average relative humidity, average temperature, rainfall, SO2 concentration, pH of rain, and chloride concentration. These factors are selected for two main reasons. One is that many relevant works have proved that these 6 factors have a critical impact on corrosion; for example, a critical relative humidity value decides whether the corrosion phenomenon happens or not [40], and SO2 along with chloride accelerates the corrosion rate while the other factors are kept stable [41,42]. The other reason is data completeness: the other candidate dimensions contain missing data and had to be ignored. The environmental factor information is shown in Table 2.

(3) For each steel, we take the 1st, 2nd, 4th, 8th and 16th year as sampling time periods and collect the corrosion rate at each period. Taking Beijing as an example, we visualize the corrosion rates of the 17 LAS vs exposure time in Fig. 1. For modelling the corrosion relationship, we take the material compositions, environmental factors and exposure time as model inputs, and the corrosion rate as model output.

3. Methods

The framework of the proposed method is shown in this section. The method consists of two parts. The first part introduces the random forests-weighted K-nearest neighbors (abbreviated as RF-WKNNs), a modification of the traditional RF model that improves its generalization performance on new samples. The second part gives the framework of the proposed deep-structure method based on RF-WKNNs and cForest [38], combining the advantages of RF-WKNNs with the layer-by-layer representation of cForest. In addition, three evaluation criteria are adopted to evaluate the performance of the proposed model.
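The input/output layout described above (8 composition fractions + 6 environmental factors + exposure time as the 15-dimensional input, corrosion rate as the target) can be sketched in a few lines. This is an illustrative helper, not code from the paper; `build_feature_vector` and the Fe "balance" value (≈100 minus the other fractions) are our assumptions.

```python
import numpy as np

def build_feature_vector(composition, environment, exposure_time):
    """Concatenate element compositions (wt.%), environmental factors and
    exposure time (years) into one 15-dim model input; the corrosion rate
    is the model output (target)."""
    x = list(composition) + list(environment) + [float(exposure_time)]
    return np.asarray(x, dtype=float)

# Steel 'D36' (Table 1) with Fe taken as balance, and the dataset
# averages from Table 2 as an illustrative environment:
comp = [1.40, 0.018, 0.022, 0.39, 0.05, 0.05, 0.0, 98.07]  # Mn,S,P,Si,Cr,Cu,Ni,Fe
env = [75.09, 17.58, 159.74, 0.09, 6.14, 0.22]             # RH,T,rain,SO2,pH,Cl-
x = build_feature_vector(comp, env, 1.0)
print(x.shape)  # (15,)
```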

Fig. 1. Corrosion rates of LAS versus exposure time in Beijing.

3.1. Random forests-weighted K-nearest neighbors (RF-WKNNs)

Random forests (RF) is an ensemble of multiple tree models. For a given sample, each tree outputs a prediction value, and the RF model outputs the aggregation of all tree outputs as the final prediction. In general, the aggregation is a simple average, giving the same weight to each tree. However, some tree models may fail in prediction due to the random selection of samples or variables in the training phase. Therefore, we assign different weights to different tree models after the training phase: large weights for high-accuracy trees and small weights for low-accuracy trees, so that the weighted aggregation of all trees can obtain better performance than the traditional RF model. Hence, we propose an improved method called random forests-weighted K nearest neighbors (RF-WKNNs), which assigns dynamic weights to different tree models. The framework of RF-WKNNs is shown in Fig. 2.

According to Fig. 2, the whole process of RF-WKNNs includes two parts: the training phase (a.k.a. fitting phase) and the testing phase (a.k.a. generalization phase). The former learns the model parameters from training samples; the latter evaluates the performance of the model. As illustrated in Fig. 2, the training phase builds the T tree models of RF. In this paper, we use the classification and regression tree (abbreviated as CART) as the tree model. CART splits a parent node into two child nodes at a time. An example of a trained CART model structure is shown in Fig. 2(a). Assume the input is a two-dimensional variable (X1, X2). If X1 ≤ 0, the predicted rate is Rate1; if X1 > 0 and X2 ≤ 1, Rate2; if X1 > 0 and 1 < X2 < 3, Rate3; if X1 > 0 and X2 ≥ 3, Rate4. The detailed processes of the training and testing phases of RF-WKNNs are given in the next sections.

3.1.1. Training phase

The training phase of RF-WKNNs is the same as that of the RF model: a training dataset is used to generate several trees with different model structures and parameters. To ensure the diversity of the tree models, RF merges two important operations: bagging and random selection of variables.

From Fig. 2, the original training dataset is $D=\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\}$, $x_i=(x_i^1,x_i^2,\ldots,x_i^p)$. It contains N training samples, and each sample consists of p variables. Based on the dataset D, we perform the bagging and random variable selection operations.

Firstly, the bagging (bootstrap aggregation) operation constructs a sub-dataset for each tree model by randomly sampling with replacement from the original training dataset D. For example, there are N samples in D, and the selection probability of each sample is 1/N. To generate Ñ (generally equal to N) samples to train a CART model, we randomly select one sample from D and repeat the selection Ñ times. Some samples are then selected several times, and some are never selected. According to statistical calculation, about 37% of the samples are never selected; these samples are called out-of-bag (abbreviated as OOB) samples [23,32]. The OOB samples are used to calculate the variable importance; the specific algorithm is given in [32,36]. Through bagging, the training samples differ for each tree model, and the variance of the final ensemble model is reduced [37]. The second operation is the random selection of variables: only a random subset of all p variables is considered at each node of the tree. This operation gives weak features a chance to be considered in the tree [38]. Based on these two operations, we can obtain T different CART models from the single dataset D, as shown in Fig. 2(b).

3.1.2. Testing phase

In Fig. 2(c), for a given testing sample denoted by C, the testing phase of RF-WKNNs implements the weighted combination of the prediction values of all CARTs. It includes the following steps:

Step 1. Calculate the importance of each input variable from the trained RF model [32,36], obtaining the importance vector $W=\{w_1,w_2,\ldots,w_p\}$, $0<w_i<1$, $\sum_{i=1}^{p} w_i = 1$, where $w_i$ represents the importance of the i-th variable.

Step 2. Compute the performance of each trained CART model on each training sample, obtaining a performance matrix S:

$$
S=\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1T}\\
s_{21} & s_{22} & \cdots & s_{2T}\\
\vdots & \vdots & \ddots & \vdots\\
s_{N1} & s_{N2} & \cdots & s_{NT}
\end{bmatrix},\qquad
s_{nt}=\begin{cases}
1, & \left|\dfrac{\hat{Y}_{nt}-Y_n}{Y_n}\right|\le \mathrm{threshold}\\
0, & \text{otherwise}
\end{cases}
\tag{1}
$$

where $Y_n$ is the real value of the n-th training sample and $\hat{Y}_{nt}$ is the prediction of the t-th CART model on the n-th training sample. Eq. (1) contains a new parameter, threshold, which controls the sparsity of the performance matrix S. When the threshold is set near 1, most elements of S tend to be 1, meaning almost every CART model meets the required performance, which makes the weighted combination trivial. When the threshold is set near 0, most elements of S tend to be 0, meaning most CART models fail the performance requirement, and the weighted combination becomes too sparse to yield a robust final model.

Step 3. Based on the variable importances W, the WKNNs is constructed. The similarity between samples $A=(a_1,a_2,\ldots,a_p)$ and $B=(b_1,b_2,\ldots,b_p)$ is:

$$\mathrm{similar}(A,B)=\frac{1}{\sum_{i=1}^{p} w_i\,(a_i-b_i)^2}\tag{2}$$

The smaller the weighted mean square error between A and B, the higher the similarity between the samples.

Step 4. For a given testing sample C, calculate the similarity between C and each training sample, then select the K nearest training samples.

Step 5. Based on these training samples and the performance matrix S of Step 2, calculate the weight vector of the CART models, $w^C=\{w_1^C,w_2^C,\ldots,w_T^C\}$. For example, assuming K = 3 in KNNs and the selected K-nearest training samples for the new sample are Nos. 3, 7 and 8, then $w_i^C=(s_{3i}+s_{7i}+s_{8i})/3$, $i=1,2,\ldots,T$.

Step 6. Input C into all CART models and obtain the prediction vector $\hat{C}=(\hat{C}_1,\hat{C}_2,\ldots,\hat{C}_T)$. These prediction values are combined with the weights to give the final prediction $\hat{Y}_C$:

$$\hat{Y}_C=\frac{\sum_{t=1}^{T} w_t^C\,\hat{C}_t}{\sum_{t=1}^{T} w_t^C}\tag{3}$$

Fig. 2. Framework of RF-WKNNs. (a) Example of CART model; (b) Training phase of RF-WKNNs; (c) Testing phase of RF-WKNNs.
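The testing-phase steps can be sketched in a few lines. Because the scrape garbles Eq. (2), we read the similarity as the reciprocal of the importance-weighted squared distance, consistent with the surrounding text; `similarity`, `rf_wknn_predict` and the toy numbers are illustrative, not the authors' code.

```python
import numpy as np

def similarity(a, b, w):
    """Eq. (2) as reconstructed: reciprocal of the importance-weighted
    squared distance between two samples. Larger = more alike."""
    d2 = float(np.sum(w * (np.asarray(a) - np.asarray(b)) ** 2))
    return np.inf if d2 == 0.0 else 1.0 / d2

def rf_wknn_predict(tree_preds, S, sims, k):
    """Steps 4-6: pick the k training samples most similar to the test
    sample, average their rows of the performance matrix S (Eq. (1)) to
    get per-tree weights, and return the weighted average (Eq. (3))."""
    nearest = np.argsort(sims)[-k:]           # k highest-similarity samples
    w = S[nearest].mean(axis=0)               # weight of each CART
    if w.sum() == 0.0:                        # no tree passed: plain average
        return float(np.mean(tree_preds))
    return float(w @ tree_preds / w.sum())

# Toy example: 3 training samples, 2 trees.
S = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
sims = np.array([0.9, 0.5, 0.1])              # test sample closest to No. 1
preds = np.array([2.0, 4.0])                  # each tree's prediction
print(rf_wknn_predict(preds, S, sims, k=2))   # (1.0*2 + 0.5*4)/1.5 ≈ 2.667
```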

3.2. Densely connected cascade forests-WKNNs (DCCF-WKNNs)

Recently, deep learning has been applied and studied in many fields because of its superior performance. Its learning process and structure make it applicable to challenges that could not be solved before. Among various deep-structure models, a model called gcforest was proposed in 2017 [38]. It is based on RF modelling and representation learning, and improves the capability of understanding audio and video information. Unlike most deep learning structures, gcforest can be used on small sample datasets and has just a few hyper-parameters to adjust. Based on the idea of gcforest, we propose a new deep-structure model called DCCF-WKNNs, which can be applied to corrosion datasets. The framework of the proposed method is shown in Fig. 3.

As illustrated in Fig. 3, the DCCF-WKNNs is a layer-by-layer representation method. Each layer is composed of four RF-WKNNs

Fig. 3. Framework of proposed DCCF-WKNNs model.

models, as described in Section 3.1, and each layer transports its output to all later layers as additional input variables. To determine the final layer of DCCF-WKNNs, we split a few samples from the training samples as a validation dataset and input them into the RF-WKNNs models of the current layer. If the result is better than that of the previous layer, the performance of the current layer is better, and the training procedure continues to generate the next layer. Otherwise, the performance of the current layer is worse than that of the previous layer, and the training procedure stops generating further layers.
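The dense layer-by-layer growth with validation-based early stopping can be sketched as follows. Plain scikit-learn random forests stand in for RF-WKNNs here (an assumption for brevity); `grow_cascade` and all parameter values are illustrative, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error

def grow_cascade(X_tr, y_tr, X_val, y_val, n_forests=4, max_layers=10):
    """Each layer holds several forests; the outputs of every earlier
    layer are appended to the raw features (the 'dense' connection).
    Growth stops when the validation MAPE no longer improves."""
    feats_tr, feats_val = X_tr, X_val
    layers, best_err = [], np.inf
    for _ in range(max_layers):
        layer = [RandomForestRegressor(n_estimators=50, random_state=i)
                 .fit(feats_tr, y_tr) for i in range(n_forests)]
        out_tr = np.column_stack([f.predict(feats_tr) for f in layer])
        out_val = np.column_stack([f.predict(feats_val) for f in layer])
        err = mean_absolute_percentage_error(y_val, out_val.mean(axis=1))
        if err >= best_err:                 # current layer no better: stop
            break
        layers.append(layer)
        best_err = err
        feats_tr = np.hstack([feats_tr, out_tr])    # feed forward to
        feats_val = np.hstack([feats_val, out_val]) # all later layers
    return layers, best_err

# Tiny synthetic demonstration (features and target are made up):
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = 3.0 + 2.0 * X[:, 0] + rng.normal(scale=0.1, size=80)
layers, val_mape = grow_cascade(X[:60], y[:60], X[60:], y[60:],
                                n_forests=2, max_layers=3)
print(len(layers), val_mape)
```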


3.3. Evaluation criteria

Three criteria are used to evaluate the training and testing results of the model:

(1) Mean absolute percentage error (MAPE):

$$\mathrm{MAPE}=\frac{1}{N}\sum_{n=1}^{N}\left|\frac{\hat{y}_n-y_n}{y_n}\right|\times 100\%\tag{4}$$

where N is the number of samples, and $\hat{y}_n$ and $y_n$ are the predicted and true values of the n-th sample. The smaller the MAPE, the better the prediction of the model.

(2) Root mean square error (RMSE):

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{n=1}^{N}(\hat{y}_n-y_n)^2}\tag{5}$$

The smaller the RMSE, the better the prediction of the model.

(3) Determination coefficient (R2):

$$R^2=1-\frac{\sum_{n=1}^{N}(\hat{y}_n-y_n)^2}{\sum_{n=1}^{N}(\bar{y}-y_n)^2},\qquad \bar{y}=\frac{1}{N}\sum_{n=1}^{N}y_n\tag{6}$$

where $\bar{y}$ is the mean of the true values of all samples. As R2 approaches 1, the model capability becomes better.

4. Results and discussion

4.1. Determination of layer number

As mentioned in Section 3.2, the proposed DCCF-WKNNs is a multi-layer representation model, and its number of layers must be determined from training and validation datasets. Therefore, we randomly split the whole dataset into training, validation and testing sets in the proportion 4:1:2. We observe the MAPE on the training, validation and test datasets to decide the number of layers. The main consideration is that once the model is deep enough, further layers give no more information to subsequent layers but adversely increase the complexity; a properly deep structure is therefore preferred. Fig. 4 shows the MAPE values for different numbers of layers on the collected corrosion datasets.

Fig. 4. Choice of number of layers using MAPE.

From Fig. 4, when 4 layers are used, the MAPE is satisfactory and reaches its lowest value on the validation dataset, showing that a 4-layer structure has the best generalization performance for these datasets. Therefore, in this paper, we establish a 4-layer DCCF-WKNNs model.

4.2. Comparison results of several methods

We compare the corrosion rate prediction results with several methods: ANN, SVR, traditional RF, RF-WKNNs and cForest [38]. To obtain good performance from each algorithm, we set a series of hyper-parameters before modelling. The specific parameter values are as follows:

(1) For the ANN algorithm, the architecture of the model is 15-100-1 (15 input variables, 100 hidden neurons, 1 output neuron). The transfer functions of the hidden and output layers are the ReLU function. Stochastic gradient descent is used for parameter learning and tuning; the learning rate is set to 0.001, and training stops after 300 iterations.
(2) For the SVR algorithm, the kernel function is the radial basis function (RBF) with width 1, and the tolerance error is 300. According to practical experiments, the smaller the width and tolerance error, the poorer the training and testing results; when they become large, the training error is low but the testing error is very large. The above values were decided by many experiments in our paper.
(3) For the RF algorithm, the number of CARTs in the RF is 1000, and the minimum number of samples per leaf is 3: when a node of a CART holds 3 or fewer training samples, splitting stops and the node is treated as a leaf node.
(4) For the RF-WKNNs algorithm, there are two additional parameters beyond the RF model. The first is the threshold in Eq. (1): when it is set too small, most elements of the performance matrix S are 0, while when it is too large, most elements of S are 1; neither is desirable. Here we set the threshold to 0.3, based on several experiments. The second parameter is K of the KNNs algorithm, which we set to 2.
(5) The cForest algorithm combines several layers, each containing several RF models. Based on the literature [38], we set the number of RFs per layer to 4, with the RF parameters mentioned above.
(6) The proposed algorithm combines cForest and RF-WKNNs. The values of the threshold and K equal those of the RF-WKNNs above; the number of RF-WKNNs per layer, the number of CARTs per RF-WKNNs, and the minimum number of samples per leaf are set the same as in the cForest model.
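The three evaluation criteria of Section 3.3 (Eqs. (4)–(6)) translate directly into code; a minimal sketch with made-up example values:

```python
import numpy as np

def mape(y_true, y_pred):
    """Eq. (4): mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100)

def rmse(y_true, y_pred):
    """Eq. (5): root mean square error."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def r2(y_true, y_pred):
    """Eq. (6): coefficient of determination."""
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((np.mean(y_true) - y_true) ** 2)
    return float(1 - ss_res / ss_tot)

y = np.array([1.0, 2.0, 4.0])    # illustrative true corrosion rates
yh = np.array([1.1, 1.8, 4.4])   # illustrative predictions
print(mape(y, yh), rmse(y, yh), r2(y, yh))  # ≈ 10.0, 0.265, 0.955
```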

Table 3 shows the criteria results for these methods, including the fitting results on the training dataset and the generalization results on the test dataset. From Table 3, compared with the traditional RF model, the proposed RF-WKNNs obtains better performance on the testing samples, showing that the weighted combination of trees improves the generalization ability of the RF model. The cForest, the multi-layer version of the RF model, achieves sub-optimal results on the testing samples, and the best generalization results are given by the proposed DCCF-WKNNs, which adopts the advantages of both RF-WKNNs and cForest. Furthermore, we test the corrosion rate prediction at different stages by splitting the data along the time dimension.


Table 3
Comparison of different methods for fitting and generalization.

Methods      Fitting (training samples)       Generalization (testing samples)
             MAPE (%)  RMSE   R2               MAPE (%)  RMSE   R2
ANN          0.28      0.08   1.000            28.16     8.35   0.785
SVR          3.66      3.33   0.960            25.16     7.45   0.823
RF           6.02      2.32   0.980            16.21     5.46   0.908
RF-WKNNs     6.02      2.32   0.980            15.31     5.36   0.911
cForest      0.78      0.43   1.000            15.22     5.09   0.920
DCCF-WKNNs   0.89      0.44   1.000            12.95     4.95   0.924

For the datasets in this paper, in order to test the performance of the 6 models on corrosion samples with 1-year exposure time, we use the other samples as training samples to train the models, and the trained models are applied to forecast the corrosion rate of the samples whose exposure time is 1 year. In this way, models for each time period of the corrosion situation are built with the 6 methods. Fig. 5 shows the results.

Fig. 5. Performances of different time periods of 6 methods: (a) RMSE results of 6 methods in sub-datasets with different exposure time; (b) R2 results of 6 methods in sub-datasets with different exposure time; (c) MAPE results of 6 methods in sub-datasets with different exposure time.

Fig. 5 shows the three evaluation criteria of the 6 methods in each period. Fig. 5(a) shows the RMSE, the absolute error between real and predicted values; Fig. 5(b) shows the R2 criterion, the fitting precision of the model; Fig. 5(c) shows the MAPE criterion, the relative error between real and predicted values. Among these criteria, better model performance corresponds to smaller RMSE and MAPE values and a larger R2 value, and vice versa. According to the analysis of the three sub-graphs of Fig. 5, we can infer that ANN gives slightly better results than the proposed DCCF-WKNNs in the 1st year, but much worse results in the other time periods. RF-WKNNs almost always performs better than the traditional RF model, except in the 1st year. The proposed DCCF-WKNNs almost always keeps the best results along the time dimension, and its superiority becomes more obvious with increasing time. This means the DCCF-WKNNs model is able to model the data in a sounder way and give stable, confident results.

4.3. Corrosion knowledge mining results

In this part, the proposed model measures the quantitative effects of the 6 environmental factors. The aim is to discover the thresholds at which the corrosion rate varies sharply. For example, different temperature ranges cause different corrosion degrees, and these facts often show obvious step phenomena; in other words, there exist thresholds that make the corrosion rate change greatly. Traditional methods can hardly find these thresholds, whereas the proposed DCCF-WKNNs provides a way to do so: it fixes part of the input variables and lets one input variable change within a certain range, after which we observe the output of the model to determine the threshold. In practice, we fix the material to be 'D36', the location to be 'Beijing' and the time to be '1 year'. Then, as the value of each of the six environmental factors is changed in turn, the model is used to predict the corresponding corrosion rate. Fig. 6 shows the thresholds for the 6 environmental factors. According to Fig. 6, we can infer the qualitative and quantitative effects of the environmental factors on corrosion:

(1) As is known, when the pH value decreases, the corrosion process accelerates, because a thin electrolyte film with lower pH promotes the dissolution of the metal and destroys the adsorbed protective corrosion product layer [43,44]. Besides, previous studies have shown that when the pH value is less than a critical threshold, the material corrodes in an acidic environment, and the corresponding corrosion chemical reactions are more intense than those in a neutral environment, resulting in a faster corrosion process and corrosion rate [45]. However, due to the complexity of the corrosion process, it is often difficult to determine the pH threshold. With the help of the method in this paper, not only can a qualitative analysis be made, but the corresponding quantitative result can also be calculated: as shown in Fig. 6(a), the pH threshold of D36 in the Beijing area is about 6.3.

(2) Fig. 6(b) shows the predicted corrosion rate curve of D36 in Beijing as only the temperature changes. Regarding the influence of temperature on corrosion, experts believe that temperature accelerates the chemical reaction process. On the other hand, higher temperature also increases the evaporation of the electrolyte film on the material surface, reducing the lifetime of the electrolyte film and thus the corrosion rate. At present, some corrosion test results show the following phenomenon [46]: when the temperature is below a critical threshold, an increase of temperature will increase


Fig. 6. Prediction corrosion rate curve with changing single environmental variable: Effects of pH thresholds (a), temperature thresholds (b), RH thresholds (c), SO2 thresholds (d), rainfall thresholds (e) and Cl− thresholds (f).

Y. Zhi et al. / Journal of Materials Science & Technology 49 (2020) 202–210

(3)

(4)

(5)

(6)

the corrosion rate of the material. When the temperature exceeds this threshold, the corrosion rate decreases. This phenomenon is consistent with the curve in Fig. 6(b), where the temperature threshold of D36 in Beijing is about 21 °C.

It is well known that RH, like oxygen, has an important effect on atmospheric corrosion: RH governs the formation of the electrolyte film that atmospheric corrosion requires. Previous studies have generally shown that a thick electrolyte film hinders oxygen transport and lowers the cathodic reaction rate, whereas an electrolyte film that is too thin can hardly form an effective and continuous electrolyte for the corrosion reactions [47]. There are therefore two critical RH values: one determines whether the electrolyte film is continuous, and the other whether the film is saturated. From Fig. 6(c), for the corrosion of D36 in Beijing these two critical RH values are 67 % and 86 %.

SO2 raises the ion concentration in the electrolyte film and catalyzes the corrosion process [41,42]. Higher SO2 therefore corresponds to a higher corrosion rate, consistent with the curve trend in Fig. 6(d). In addition, the curve gives an SO2 threshold of around 0.067 mg/cm3, at which the corrosion rate changes abruptly.

Similar to the effect of temperature, increased rainfall raises the RH and thickens the electrolyte film on the metal surface, increasing the corrosion rate. On the other hand, excessive rainfall frequently washes contaminants off the metal surface, which can decrease the corrosion rate [41,48]. Fig. 6(e) gives a rainfall threshold of around 320 mm/month; when rainfall exceeds this threshold, the corrosion rate decreases sharply.

From Fig. 6(f), it can be seen that Cl− promotes corrosion, consistent with common knowledge in the corrosion field [41,42]. The curve further shows that the corrosion rate rises when the Cl− value crosses 0.8 mg/cm3 and 1.9 mg/cm3, respectively.
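The single-variable curves above can be produced by sweeping one input of a trained model while holding the others fixed. The sketch below illustrates this procedure only in outline: a scikit-learn RandomForestRegressor trained on synthetic data stands in for the paper's DCCF-WKNNs, and all variable ranges and the toy target are illustrative assumptions, not the paper's data.

```python
# Illustrative sketch of a single-variable sweep; the model and data here are
# synthetic stand-ins, NOT the DCCF-WKNNs model or the collected dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Assumed feature columns: pH, temperature (deg C), RH (%), SO2, rainfall
# (mm/month), Cl-; the ranges below are invented for the demonstration.
X = rng.uniform([4, 0, 40, 0, 0, 0], [9, 35, 100, 0.2, 500, 3], size=(400, 6))
# Toy corrosion rate with a "washing" effect above ~320 mm/month of rainfall.
rate = 0.1 * X[:, 1] + 0.05 * X[:, 2] - 0.02 * np.maximum(X[:, 4] - 320.0, 0.0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, rate)

baseline = X.mean(axis=0)            # hold five variables at their mean
grid = np.linspace(0, 500, 101)      # sweep rainfall only
queries = np.tile(baseline, (len(grid), 1))
queries[:, 4] = grid
curve = model.predict(queries)       # predicted rate vs. rainfall
```

Plotting `curve` against `grid` yields a figure analogous to one panel of Fig. 6; repeating the sweep for each of the six variables reproduces the full set of curves.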

Based on this data modelling and mining, the paper attempts to give the effects of six environmental factors on outdoor atmospheric corrosion, effects that were previously difficult to obtain by corrosion experiments. The results in Fig. 6 show the corrosion trend of material 'D36' in 'Beijing' as a single environmental value changes. The results also give a threshold for each environmental factor at which the factor strongly affects the corrosion process. This provides a new line of thought for the corrosion research field.

5. Conclusions

A new deep-structure model called DCCF-WKNNs is proposed in this paper to implement corrosion modelling and knowledge mining. The main conclusions are as follows:

(1) For the task of corrosion modelling and prediction of various materials in a variety of outdoor atmospheric environments, the paper proposes a new deep model based on random forests. Compared with the prediction results of several machine-learning algorithms on the collected low-alloy steel corrosion samples, the proposed method obtains the best generalization performance.

(2) With the help of the proposed method, the paper draws a series of predicted corrosion rate curves in which a single environmental variable changes. Based on these curves, we discover the threshold of each variable at which the corrosion rate varies sharply.
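One simple way to extract such thresholds from a predicted curve is to locate the grid point where the curve changes most steeply. The helper below is a hypothetical sketch of that idea (the paper does not specify its threshold-extraction procedure); the function name and the toy curve are illustrative.

```python
# Hypothetical threshold extraction: find where |d(rate)/d(variable)| peaks
# along a swept curve. This is an assumed post-processing step, not the
# paper's documented method.
import numpy as np

def sharpest_change(grid, curve):
    """Return the grid value where the curve's slope magnitude is largest."""
    slopes = np.abs(np.diff(curve)) / np.diff(grid)
    i = int(np.argmax(slopes))
    return 0.5 * (grid[i] + grid[i + 1])  # midpoint of the steepest segment

# Toy curve with an abrupt drop near 320 (cf. the rainfall threshold).
grid = np.linspace(0, 500, 101)
curve = np.where(grid < 320, 1.0, 0.4)
print(sharpest_change(grid, curve))  # → 317.5
```

On noisy model output one would typically smooth the curve (e.g., with a moving average) before differencing, so that the detected threshold reflects a genuine trend change rather than sampling noise.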


(3) The development of the proposed DCCF-WKNNs on corrosion datasets provides a useful tool for the modelling and knowledge-mining of LAS corrosion in outdoor atmospheres. Limited by the lack of quantitative LAS microstructure data in the collected datasets, the paper has not discussed the effects of chemical composition and microstructure on corrosion in detail. The related work will be studied further in the near future.

Acknowledgements

This work was financially supported by the National Key R&D Program of China (No. 2017YFB0702100) and the National Natural Science Foundation of China (No. 51871024). The authors also thank the China Corrosion and Protection Gateway for the data support.

References

[1] M. Cao, L. Liu, Z. Yu, L. Fan, Y. Li, F. Wang, J. Mater. Sci. Technol. 35 (2019) 651–659.
[2] Z.Y. Liu, X.G. Li, C.W. Du, L. Lu, Y.R. Zhang, Y.F. Cheng, Corros. Sci. 51 (2009) 895–900.
[3] X.G. Li, D.W. Zhang, Z.Y. Liu, Z. Li, C.W. Du, C.F. Dong, Nature 527 (2015) 441–442.
[4] G. Schmitt, World Corrosion Organization, New York, 2009.
[5] J. Shi, J. Wang, D.D. Macdonald, Corros. Sci. 89 (2014) 69–80.
[6] D. Mareci, G.D. Suditu, R. Chelariu, L.C. Trincă, S. Curteanu, Mater. Corros. 67 (2016) 1213–1219.
[7] M. Kamrunnahar, M. Urquidi-Macdonald, Corros. Sci. 52 (2010) 669–677.
[8] M. Kamrunnahar, M. Urquidi-Macdonald, Corros. Sci. 53 (2011) 961–967.
[9] L. Sadowski, M. Nikoo, Neural Comput. Appl. 25 (2014) 1627–1638.
[10] X. Xia, J. Nie, C. Davies, W. Tang, S. Xu, N. Birbilis, Mater. Des. 90 (2016) 1034–1043.
[11] A.Z. Shirazi, Z. Mohammadi, Neural Comput. Appl. 28 (2017) 3455–3464.
[12] M.J. Jiménez-Come, I. Turias, F.J. Trujillo, Mater. Des. 56 (2014) 642–648.
[13] M.J. Jiménez-Come, I.J. Turias, J.J. Ruiz-Aguilar, F.J. Trujillo, J. Chemometr. 28 (2014) 181–191.
[14] M.J. Jiménez-Come, I.J. Turias, J.J. Ruiz-Aguilar, Mater. Corros. 66 (2015) 915–924.
[15] M.J. Jiménez-Come, I.J. Turias, V. Matres, J. Chemometr. 31 (2017) e2936.
[16] Y.M. Panchenko, A. Marshakov, Corros. Sci. 109 (2016) 217–229.
[17] E. Possan, J.J.D.O. Andrade, Mater. Res-Ibero-Am. J. 17 (2014) 593–602.
[18] M. Anoop, K.B. Rao, Sadhana-Acad. P. Eng. S. 41 (2016) 887–899.
[19] C.I. Ossai, B. Boswell, I. Davies, Eng. Fail. Anal. 60 (2016) 209–228.
[20] A. Brenna, F. Bolzoni, L. Lazzari, M. Ormellese, Mater. Corros. 69 (2018) 348–357.
[21] Y.N. Shi, D.M. Fu, X.Y. Zhou, T. Yang, Y.J. Zhi, Z.B. Pei, D.W. Zhang, L.Z. Shao, Corros. Sci. 133 (2018) 443–450.
[22] L. Breiman, Mach. Learn. 45 (2001) 5–32.
[23] Z.H. Zhou, Ensemble Methods: Foundations and Algorithms, Chapman and Hall/CRC, Boca Raton, 2012.
[24] M. Fernández-Delgado, E. Cernadas, S. Barro, D. Amorim, J. Mach. Learn. Res. 15 (2014) 3133–3181.
[25] R. Genuer, J.M. Poggi, C. Tuleau-Malot, Pattern Recogn. Lett. 31 (2010) 2225–2236.
[26] E. Oh, R. Liu, A. Nel, K.B. Gemill, M. Bilal, Y. Cohen, I.L. Medintz, Nat. Nanotechnol. 11 (2016) 479–493.
[27] B. Yang, J.M. Cao, D.P. Jiang, J.D. Lv, Multimed. Tools Appl. 77 (2018) 20477–20499.
[28] O. Rahmati, H.R. Pourghasemi, A.M. Melesse, Catena 137 (2016) 360–372.
[29] D. Quintana, Y. Sáez, P. Isasi, Appl. Sci. 7 (2017) 636–651.
[30] E. Yuk, S. Park, C.S. Park, J.G. Baek, Appl. Sci. 8 (2018) 932–945.
[31] S. Park, J. Im, E. Jang, J. Rhee, Agric. For. Meteorol. 216 (2016) 157–169.
[32] Y. Hou, C. Aldrich, K. Lepkova, L. Machuca, B. Kinsella, Electrochim. Acta 256 (2017) 337–347.
[33] Y. Hou, C. Aldrich, K. Lepkova, B. Kinsella, Electrochim. Acta 274 (2018) 160–169.
[34] I. Naladala, A. Raju, C. Aishwarya, G.K. Shashidhar, Proceedings of the IEEE International Conference on Advances in Computing, Communications and Informatics, Bangalore, India, September 19–22, 2018.
[35] D.E. Brown, J.T. Burns, JOM 70 (2018) 1168–1174.
[36] Y.J. Zhi, D.M. Fu, D.W. Zhang, T. Yang, X.G. Li, Metals 9 (2019) 383.
[37] N. Morizet, N. Godin, J. Tang, E. Maillet, M. Fregonese, Mech. Syst. Signal Process. 70-71 (2016) 1026–1037.
[38] Z.H. Zhou, J. Feng, Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, August 19–25, 2017.
[39] B. Liu, X. Mu, Y. Yang, L. Hao, X. Ding, J. Dong, Z. Zhang, H. Hou, W. Ke, J. Mater. Sci. Technol. 35 (2019) 1228–1239.
[40] F. Mansfeld, Mater. Corros. 30 (1979) 38–42.



[41] M. Morcillo, B. Chico, I. Díaz, H. Cano, D. de la Fuente, Corros. Sci. 77 (2013) 6–24.
[42] Z.F. Wang, J.R. Liu, L.X. Wu, R.D. Han, Y.Q. Sun, Corros. Sci. 67 (2013) 1–10.
[43] Y.B. Hu, C.F. Dong, M. Sun, K. Xiao, P. Zhong, X.G. Li, Corros. Sci. 53 (2011) 4159–4165.
[44] D. Wicke, T.A. Cochrane, A.D. O'Sullivan, S. Cave, M. Derksen, Water Sci. Technol. 69 (2014) 2166–2173.

[45] Y.F. Wang, G.X. Cheng, W. Wu, Q. Qiao, Y. Li, X.F. Li, Appl. Surf. Sci. 349 (2015) 746–756.
[46] A.A. Castañeda, F. Corvo, D. Fernández, C. Valdés, Eng. J. 21 (2017) 43–62.
[47] S.C. Chung, A.S. Lin, J.R. Chang, H.C. Shih, Corros. Sci. 42 (2000) 1599–1610.
[48] F. Corvo, J. Minotas, J. Delgado, C. Arroyave, Corros. Sci. 47 (2005) 883–892.
[49] Y.Q. Zhu, J.S. Ou, G. Chen, H.P. Yu, Neural Comput. Appl. 20 (2011) 309–317.
[50] Y. Liu, Z.Q. Ge, J. Process Control 64 (2018) 62–70.