A New Ensemble Approach based on Deep Convolutional Neural Networks for Steel Surface Defect classification



Procedia CIRP 72 (2018) 1069–1072

www.elsevier.com/locate/procedia

51st CIRP Conference on Manufacturing Systems

Wen Chen, Yiping Gao, Liang Gao, Xinyu Li*

State Key Laboratory of Digital Manufacturing Equipment & Technology, Huazhong University of Science and Technology, Wuhan, 430074, China


* Corresponding author. Tel.: +86-027-87559419; fax: +86-027-87559419. E-mail address: [email protected]


Abstract

Steel surface defect recognition is a crucial component of automated steel surface inspection systems and greatly influences the quality of steel. To improve the accuracy rate, an ensemble approach that integrates different deep convolutional neural networks (DCNNs) is proposed in this paper. Firstly, three different DCNNs are trained separately with data augmentation to reduce over-fitting. Various optimization methods and tricks are used to reduce the error in the training procedure. Secondly, the three well-trained models are combined. The experimental results show that the proposed approach achieves state-of-the-art accuracy and robustness in steel surface defect classification.

© 2018 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of the scientific committee of the 51st CIRP Conference on Manufacturing Systems.

Keywords: Deep Convolutional Neural Network; Ensemble Learning; Steel Surface Defect Recognition; Data Augmentation
1. Introduction

Quality control is a big challenge for steel production; defects cause large economic losses and greatly influence the quality of steel. Among the defects, the surface defect is the most common one occurring in steel production. Statistical data show that more than 90% of defects occur on the surface of the steel, which impacts not only the appearance but also physical properties such as abrasion resistance. In the early years, surface inspection was almost entirely operated manually, and the inspected area covered only 0.05% of the total production [1]. To deal with this weakness, vision-based automatic surface inspection (ASI) gradually became a substitute for human inspection. A typical ASI pipeline for steel surface inspection contains two components: feature extraction and defect classification. Feature extraction builds complete representations of the defect in images, such as histogram [2], local binary pattern [3], [14], SIFT [4] and wavelet features [5]. Machine learning is a common technology in the defect classification phase, including K-Nearest Neighbour [6], support vector machine [5], learning vector quantization [7] and artificial neural network [8]. Although many approaches have been proposed for steel surface inspection, most of them require complete prior knowledge for feature extraction. The recognition rate depends heavily on how completely the feature is represented; without explicit guidance, the accuracy rate declines. Therefore, how to improve the accuracy rate without prior knowledge is still a challenging problem in ASI.

In recent years, deep convolutional neural networks (DCNNs) have led to a series of breakthroughs in image classification [9], [10], [11], [12] and industrial applications [13]; they integrate low/mid/high-level features and classifiers in an end-to-end multi-layer fashion. Although DCNNs provide a powerful approach for image recognition without requiring prior knowledge for feature extraction, their accuracy rate depends heavily on large-scale samples. In steel production, the defect dataset is too small to optimize a DCNN, and collecting large-scale samples is difficult and costly. In addition, current practice suggests that combining the outputs of different models is a further ingredient for achieving a better accuracy rate, but in ASI this technique has not been implemented or tested yet.


10.1016/j.procir.2018.03.264


Wen Chen et al. / Procedia CIRP 72 (2018) 1069–1072

To overcome these problems and improve the accuracy rate, an ensemble approach with data augmentation is proposed in this paper. Different from single-model based approaches, the proposed approach combines several DCNN models and uses a data augmentation technique to increase the variety of the defect dataset. In this approach, three DCNNs, ResNet-32, WRN-28-10 and WRN-28-20, are trained individually. After that, these well-trained models are combined with an average strategy for defect classification. The experimental results demonstrate that the proposed approach achieves state-of-the-art performance in both recognition rate and robustness on the steel surface defect recognition problem.

The remainder of this paper is organized as follows. Section 2 presents the details of the proposed ensemble approach. Section 3 evaluates the proposed approach on the NEU dataset. The last section presents conclusions and future work.

2. The proposed ensemble approach for steel surface defect recognition

In this section, the details of the ensemble DCNN models are described.

2.1. Data Augmentation and Pre-Processing

To reduce the risk of over-fitting and increase the variety of the defect dataset, a label-preserving transformation is used to augment the defect dataset. In this data augmentation, ten small patches are extracted from each raw defect image: four corner patches, a center crop patch, and their horizontal reflections. The size of each patch is a hyper-parameter, which depends on the size of the raw images. For data pre-processing, all patches are resized to 32*32 and standardized to zero mean and unit variance in order to normalize the input and save computation.
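The ten-patch augmentation and standardization can be sketched as follows. This is a minimal illustration assuming square grayscale images stored as NumPy arrays; the patch size 180 and output size 32 follow the settings reported in Section 3, and the striding-based resize is a crude stand-in for proper interpolation.

```python
import numpy as np

def ten_crop(image, patch=180):
    """Extract four corner patches and a center crop, plus their
    horizontal reflections: ten label-preserving patches in total."""
    h, w = image.shape
    crops = [
        image[:patch, :patch],             # top-left corner
        image[:patch, -patch:],            # top-right corner
        image[-patch:, :patch],            # bottom-left corner
        image[-patch:, -patch:],           # bottom-right corner
        image[(h - patch) // 2:(h + patch) // 2,
              (w - patch) // 2:(w + patch) // 2],  # center crop
    ]
    # Horizontal reflections double the set to ten patches.
    crops += [np.fliplr(c) for c in crops]
    return crops

def preprocess(patch, out=32):
    """Resize by simple striding (illustrative only) and standardize
    to zero mean and unit variance."""
    step = patch.shape[0] // out
    small = patch[::step, ::step][:out, :out].astype(np.float64)
    return (small - small.mean()) / (small.std() + 1e-8)
```

In practice a library resize (bilinear or bicubic) would replace the striding, but the crop geometry and the standardization are the essential steps.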

2.2. The Varieties of DCNN

A typical DCNN consists of three major components: convolutional layers, pooling layers and a classification layer. The convolutional layers apply a set of filters to extract the feature maps of the input. The pooling layers operate as down-sampling to reduce the dimension and avoid over-fitting. At the end of the DCNN, a classification layer is connected to compute the recognition of defects. In the proposed approach, two kinds of DCNN models are adopted: the Residual Network and the Wide Residual Network. The details of these varieties are discussed below.

Consider a defect image x_0 passed through a DCNN comprising L layers. A non-linear transformation H_l(·) is implemented in each layer, which can be a composite of operations such as Batch Normalization (BN) [15], an activation layer [16], pooling [17], dropout [18] and convolution. x_l denotes the output of layer l.

1) Residual Network. In a conventional DCNN, the output of the l-th layer is connected as input to the (l+1)-th layer, so the signal propagates as x_l = H_l(x_{l-1}). In the Residual Network (ResNet), the architecture is stacked with the residual blocks shown in Fig. 1 a). Each residual block is composed of convolutional layers, batch-normalization layers and activation layers. In addition, a skip connection is added with an identity function: x_l = H_l(x_{l-1}) + x_{l-1}. An advantage of ResNet is that the gradient flows directly through the identity function from later layers to earlier layers. With this operation, vanishing gradients are avoided, which greatly alleviates the optimization difficulty and pushes the depth of deep neural networks to hundreds of layers. ResNet achieved impressive performance on many challenging image recognition tasks and won 1st place in the ILSVRC 2015 classification task. ResNet-n denotes a residual network with a total of n convolutional layers.

2) Wide Residual Network. The Wide Residual Network (WRN) [12] is an improvement of ResNet that increases the width of the residual block. In a WRN, the wide residual block is widened by a factor k. Thin and deep residual networks run against the nature of GPU computation because of their sequential structure, while increasing the width helps balance the computation effectively. Experimental results in [12] demonstrated that this modified block provides an effective way of improving the performance of residual networks. In addition, a dropout layer is added inside the wide residual block. WRN-n-k denotes a WRN with n convolutional layers and a widening factor k. The architecture of the wide residual block is shown in Fig. 1 b).
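The residual mapping x_l = H_l(x_{l-1}) + x_{l-1} can be sketched in NumPy. This is a single-channel toy version, not the paper's actual multi-channel blocks with batch normalization; the naive convolution below is just a readable stand-in for H_l.

```python
import numpy as np

def conv3x3_same(x, w):
    """Naive 3*3 'same' convolution on a single-channel feature map.
    Illustrative only; real DCNNs use optimized multi-channel kernels."""
    padded = np.pad(x, 1)
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def residual_block(x, w1, w2):
    """x_l = H_l(x_{l-1}) + x_{l-1}: two 3*3 convolutions with a ReLU
    in between, plus the identity skip that lets gradients bypass H_l."""
    h = np.maximum(0.0, conv3x3_same(x, w1))   # conv + ReLU
    h = conv3x3_same(h, w2)                    # second conv
    return h + x                               # identity shortcut
```

The key property is visible in the last line: even if the learned transformation H_l collapses to zero, the block still passes x through unchanged, which is what keeps gradients flowing in very deep stacks.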

Fig. 1. The architecture of the block in Residual Network and Wide Residual Network. Left: the block for Residual Network. Right: the block for Wide Residual Network

2.3. The ensemble DCNNs approach for steel surface defect recognition

In the proposed approach, the defect images are augmented and processed as described in Section 2.1. Then, three different DCNN models, ResNet-32, WRN-28-10 and WRN-28-20, are trained individually with different initialization parameters. The defect images are divided into a training set, a validation set and a test set. The training set is used to optimize the DCNN models and the validation set is used to select the better models to integrate.

There are two main strategies for combining models. One is the on-line strategy, which concatenates the feature vectors from the different models and trains them together. The other is the off-line strategy, which computes the average of the outputs of the individual models. Since DCNNs contain millions of parameters, it is difficult and costly to implement the on-line strategy to update the models. Therefore, the off-line ensemble strategy is adopted in this approach, which combines the outputs of several models by averaging their softmax class posteriors. Furthermore, early experiments also demonstrated that the averaged model outperformed the accuracy rate of each individual model. The remaining details of the proposed approach are described as follows.
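The off-line averaging strategy can be sketched as follows: each model produces class logits, the logits are turned into softmax posteriors, and the posteriors are averaged before taking the argmax. A minimal sketch, assuming each model's output is a (batch, classes) logit array:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(model_logits):
    """Off-line ensemble: average the softmax class posteriors of the
    individual models, then take the argmax as the final label."""
    posteriors = np.mean([softmax(l) for l in model_logits], axis=0)
    return posteriors.argmax(axis=-1), posteriors
```

Because the posteriors are averaged, a confident majority of models can outvote a single mistaken model, which is the mechanism behind the robustness improvement reported in Section 3.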


1) Activation layer. In order to transform the defect images into a non-linear space, ReLU [16], defined by f(x) = max(0, x), is adopted as the activation layer, which applies the rectified linear unit element-wise to the input. It has been proven that ReLU produces a sparse representation and can accelerate convergence.

2) Pooling layer. The pooling layer combines spatially nearby features in the feature maps. This combination makes deep neural networks more compact and invariant to small image changes, such as insignificant details. The pooling layer also decreases the computational load of the following stages. In this approach, average-pooling is adopted.

3) Regularization. To reduce over-fitting and accelerate training, Dropout [18] and Batch Normalization [15] are used in the individual models. Batch normalization also accelerates deep network training by reducing internal covariate shift. In WRN-28-10 and WRN-28-20, a dropout layer is added into each residual block between the convolutions and after the ReLU layers, to perturb the batch normalization layer in the next block.

3. Experimental results

In this section, the proposed approach is verified on a public dataset, the Northeastern University (NEU) dataset. The NEU dataset [14] contains six different defects of hot-rolled steel strip surface: crazing (Cr), inclusion (In), patches (Pa), pitted surface (PS), rolled-in scale (RS) and scratches (Sc). Each defect class contains 300 grayscale images of size 200*200; examples from the NEU dataset are given in Fig. 2. It is obvious that the inter-class divergence is small, while the intra-class divergence varies.

Fig. 2. Examples of defect images in the NEU dataset.

In this experiment, the dataset is divided into three sets: training set, validation set and test set. The training set contains 135 images of each defect for model optimization and the test set contains 150 images of each defect to evaluate the recognition rate. The remaining images comprise the validation set, which is used as a guideline for hyper-parameter choosing and model selection. Three varieties of DCNN, ResNet-32, WRN-28-10 and WRN-28-20, are built in this approach. The architectures and other details are shown in Table 1.
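The dropout regularization used inside the wide residual blocks can be sketched as follows. This is a minimal sketch of the inverted-dropout variant (an assumption on my part; the paper only states that dropout [18] is used), where survivors are rescaled at training time so that no change is needed at inference.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p during
    training and rescale survivors by 1/(1-p), so the expected activation
    is unchanged and the layer is a no-op at test time.
    (Inverted scaling is an assumption; the paper does not specify.)"""
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)
```

At test time the function simply returns its input, which matches the standard practice of disabling dropout during evaluation.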

In the training set, five patches are extracted from each defect image using the data augmentation described in Section 2.1: four corner patches and a center crop patch, each of size 180*180. After that, these patches are transformed by horizontal reflection. Finally, all samples are resized to 32*32 patches and normalized. Each DCNN is trained 5 times with different initial parameters, which obey a standard normal distribution, and the optimal model of each DCNN is selected on the validation set. In the DCNN settings, Stochastic Gradient Descent with Nesterov's Accelerated Momentum and cross-entropy are used as the optimizer and loss function. The learning rate is set to 0.1 initially and lowered by a factor of 10 after epoch 60 and epoch 120. The weight decay is set to 0.00005 without dampening and the momentum is 0.9. All models are trained for 200 epochs with a mini-batch size of 8.
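The step learning-rate schedule above can be sketched directly. A minimal sketch assuming 1-indexed epochs; whether the drop applies at epoch 60/120 itself or the one after is not stated in the paper, so the boundary handling here is an assumption.

```python
def learning_rate(epoch, base_lr=0.1):
    """Step schedule from the paper: start at 0.1 and divide by 10
    after epoch 60 and again after epoch 120 (200 epochs in total).
    Epochs are assumed 1-indexed; boundary handling is an assumption."""
    if epoch <= 60:
        return base_lr
    if epoch <= 120:
        return base_lr / 10      # 0.01 for epochs 61..120
    return base_lr / 100         # 0.001 for epochs 121..200
```

This kind of schedule is typically paired with SGD momentum 0.9 and weight decay 5e-5, as the text specifies.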

Table 1. The architectures of the DCNN models. Each entry lists the residual block (ResNet-32) or wide residual block (WRN) per stage; down-sampling is applied in front of Conv3 and Conv4 with a stride of 2.

Layer name   Output size   ResNet-32                WRN-28-10                       WRN-28-20
Conv1        32*32         (3*3, 16), stride 1, padding 1 (all models)
Conv2        32*32         [3*3, 16; 3*3, 16]*5     [3*3, 16*10; 3*3, 16*10]*4      [3*3, 16*20; 3*3, 16*20]*4
Conv3        16*16         [3*3, 32; 3*3, 32]*5     [3*3, 32*10; 3*3, 32*10]*4      [3*3, 32*20; 3*3, 32*20]*4
Conv4        8*8           [3*3, 64; 3*3, 64]*5     [3*3, 64*10; 3*3, 64*10]*4      [3*3, 64*20; 3*3, 64*20]*4
AvgPooling   1*1           8*8 average pooling (all models)
The implementation is based on Torch [19] on a workstation, accelerated by GPU. After individual training, the best well-trained model of each deep neural network is selected according to the accuracy rate on the validation set. The ensemble approach integrates the three best well-trained models with the average strategy. The results of this experiment are shown in Table 2.

As can be seen from Table 2, the proposed approach achieves a 99.889% accuracy rate on the NEU dataset, a state-of-the-art performance compared with the existing approaches proposed in [20] (99.27% and 98.61%). In addition, ResNet-32, WRN-28-10 and WRN-28-20 achieve average accuracy rates of 99.067%, 99.689% and 99.7225% individually. Since the test set contains 900 samples, only one sample is falsely classified. This result reveals that the three models learn complementary features, such that one tends to correct the others' mistakes. It also demonstrates that the ensemble strategy improves the recognition rate and robustness over the single models.

Table 2. Recognition rate and standard deviation on the NEU dataset.

Approach                Recognition Rate   Std
GLCM+MLR                98.61%             0.27
Decaf+MLR               99.27%             0.21
ResNet-32               99.067%            0.1665
WRN-28-10               99.689%            –
WRN-28-20               99.7225%           –
The proposed approach   99.889%            0
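The reported ensemble accuracy is consistent with exactly one error on the test set, which can be checked with simple arithmetic (6 defect classes * 150 test images per class = 900 samples):

```python
# Sanity check: one misclassified sample out of 900 test images
# (6 defect classes * 150 images each) gives the reported 99.889%.
test_samples = 6 * 150
errors = 1
accuracy = 100.0 * (test_samples - errors) / test_samples
print(round(accuracy, 3))  # 99.889
```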

4. Conclusion and future work

In this paper, an ensemble approach is proposed for steel surface defect recognition. The proposed approach trains three different DCNN models individually and uses an average strategy to combine their outputs to improve the recognition rate. The experimental results demonstrate that the proposed approach achieves state-of-the-art performance on the steel surface inspection problem; both recognition rate and robustness improve over existing work.

The limitations of the proposed approach include the following aspects. Firstly, although the defect images are resized, the computation is still too heavy for real applications. For example, a well-trained WRN-28-10 model is more than 1.2 GB and requires 2 days of training even with GPU acceleration, which makes it difficult to load and fine-tune in a workshop. Secondly, the defect images are in ideal condition, while noise and low image quality are ignored. Therefore, future work will focus on two directions. One is to reduce the computational cost while preserving the recognition rate. The other is to denoise low-quality images from the workshop and build a robust approach for defect classification.

Acknowledgements

This work was supported in part by the Natural Science Foundation of China (NSFC) under Grants 51435009, 51775216 and 51711530038, and in part by the 111 Project under Grant B16019.

References

[1] Neogi N, Mohanta DK, Dutta PK. Review of vision-based steel surface inspection systems. EURASIP Journal on Image and Video Processing, 2014;1:50-69.
[2] Kim CW, Koivo AJ. Hierarchical classification of surface defects on dusty wood boards. Pattern Recognition Letters, 1994;15(7):713-721.
[3] Niskanen M, Silvén O, Kauppinen H. Color and texture based wood inspection with non-supervised clustering. In Proceedings of the Scandinavian Conference on Image Analysis, 2001:336-342.
[4] Weimer D, Scholz-Reiter B, Shpitalni M. Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection. CIRP Annals-Manufacturing Technology, 2016;65(1):417-420.
[5] Jeon YJ, Choi D, Lee SJ, et al. Defect detection for corner cracks in steel billets using a wavelet reconstruction method. JOSA A, 2014;31(2):227-237.
[6] Dupont F, Odet C, Cartont M. Optimization of the recognition of defects in flat steel products with the cost matrices theory. NDT & E International, 1997;30(1):3-10.
[7] Caleb P, Steuer M. Classification of surface defects on hot rolled steel using adaptive learning methods. In Proceedings of the Fourth IEEE Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, 2000;1:103-108.
[8] Wu G, Zhang H, Sun X, et al. A bran-new feature extraction method and its application to surface defect recognition of hot rolled strips. 2007 IEEE International Conference on Automation and Logistics, 2007:2069-2074.
[9] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2012:1097-1105.
[10] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[11] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016:770-778.
[12] Zagoruyko S, Komodakis N. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
[13] Wen L, Li XY, Gao L, Zhang YY. A new convolutional neural network based data-driven fault diagnosis method. IEEE Transactions on Industrial Electronics, 2018;65(7):5990-5998.
[14] Song K, Yan Y. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects. Applied Surface Science, 2013;285:858-864.
[15] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, 2015:448-456.
[16] Nair V, Hinton GE. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, 2010:807-814.
[17] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998;86(11):2278-2324.
[18] Srivastava N, Hinton GE, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014;15(1):1929-1958.
[19] Collobert R, Kavukcuoglu K, Farabet C. Torch7: A Matlab-like environment for machine learning. BigLearn, NIPS Workshop, 2011.
[20] Ren R, Hung T, Tan KC. A generic deep-learning-based approach for automated surface inspection. IEEE Transactions on Cybernetics, 2017.