Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 159 (2019) 1449–1458
www.elsevier.com/locate/procedia
23rd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems
A Method of Data Augmentation for Classifying Road Damage Considering Influence on Classification Accuracy
Haruki Tsuchiya a, Shinji Fukui b, Yuji Iwahori a,*, Yoshitsugu Hayashi a, Witsarut Achariyaviriya a, Boonserm Kijsirikul c
a Chubu University, Kasugai, 487-8501 Japan
b Aichi University of Education, Kariya, 448-8542 Japan
c Chulalongkorn University, Bangkok 20330, Thailand
Abstract
This paper proposes a method for augmenting the learning data of a road damage dataset that considers the influence of the augmented data on classification accuracy. Data augmentation is a very important task in machine learning because, in general, more learning data leads to higher classification accuracy. The quality of the augmented data also influences the accuracy of the classification, so an effective data augmentation method for increasing classification accuracy is needed. The proposed method generates learning data by selecting effective data augmentation methods depending on the class of road damage. The method uses You Only Look Once v3 (YOLOv3) for detection and classification of road damage in an image. YOLOv3 is tuned by adding the data augmented by the proposed method to a publicly available road damage dataset. The experimental results show that the proposed method can increase the accuracy efficiently and effectively. The proposed selection of data augmentation methods remarkably improves the mean Average Precision (mAP), which is one of the accuracy indices.
© 2019 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of KES International.
Keywords: Data Augmentation; Road Damaged Classification; Deep Learning; YOLOv3
* Yuji Iwahori. Tel.: +81-568-51-9378; fax: +81-568-51-1540. E-mail address: [email protected]
1877-0509 © 2019 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2019.09.315

1. Introduction
The authors focus on road traffic conditions in Thailand in order to raise the Quality of Life (QoL) of people, as part of the SATREPS project [1]. Traffic jams are one of the biggest problems of road traffic in Thailand, and reducing them is needed to raise QoL. One of the causes of traffic jams is traffic accidents. Road damage may cause traffic accidents and should be repaired immediately. A method for detecting and classifying road damage from a road image by computer is needed because it enables rapid repair. A method for detecting road damage and classifying the damage level has been proposed [2], but increasing the accuracy of detection and classification remained as future work. In deep learning, using more learning data generally gives better results, and various data augmentation methods have been proposed. An effective data augmentation method is therefore needed.
This paper proposes a method for data augmentation. The proposed method augments the learning data for road damage classification using the dataset of [2] and improves the classification accuracy. Data augmentation methods are selected for each class independently because the effective methods depend on the class. The effectiveness of the proposed method is demonstrated through experimental results.

2. Proposed Method
The outline of the proposed method is as follows:
Step 1. Augment road damage data by various data augmentation methods
Step 2. Confirm the effect of each data augmentation method
Step 3. Create the optimal dataset for classification of road damage
First, the proposed method augments the road damage data with various data augmentation methods. Next, the data augmented by each method are evaluated: for each augmentation method, a model is trained on the original data plus the data augmented by that method, and its classification result is obtained. Finally, the optimal dataset is built from data augmented by the methods selected on the basis of the results of Step 2. The following subsections describe the original dataset of road damage and each Step in detail.

2.1. Dataset used for Road Damage Classification
The proposed method uses the dataset made public in Ref. [2] as the original dataset.
This dataset contains 9053 road images (7240 images for training and 1813 for validation) taken in seven areas of Japan, together with the image coordinates of bounding boxes and the corresponding class names. The types of road damage and the corresponding class names treated in Ref. [2] are shown in Table 1. The number of data of each class is shown in Table 2. Some examples of road images with annotation data are shown in Fig. 1.

Table 1. Definition of Road Damages in Dataset
Damage Type                            Detail                                Class Name
Crack / Linear Crack / Longitudinal    Wheel mark part                       D00
Crack / Linear Crack / Longitudinal    Construction joint part               D01
Crack / Linear Crack / Lateral         Equal interval                        D10
Crack / Linear Crack / Lateral         Construction joint part               D11
Crack / Alligator Crack                Partial pavement, overall pavement    D20
Other Corruption                       Rutting, bump, pothole, separation    D40
Other Corruption                       White line blur                       D43
Other Corruption                       Cross walk blur                       D44
Table 2. Number of Data of Each Class in Training Data of Dataset [2]
Class Name Number of Data
D00 2768
D01 3789
D10 742
D11 636
D20 2541
D40 409
D43 817
D44 3733
The proposed method uses You Only Look Once v3 (YOLOv3) [3] for detection and classification of road damage. YOLOv3 is a Convolutional Neural Network (CNN) that performs detection and classification of objects simultaneously by treating the image recognition task as a regression problem. It runs fast and with high accuracy. To apply YOLOv3 to the detection and classification of road damage, the image coordinates of the bounding boxes surrounding the damage and the corresponding annotation data are required. They are obtained from the original dataset.

Fig. 1. Examples of Damaged Road Images

2.2. Data Augmentation by Various Data Augmentation Methods
At Step 1, the proposed method augments the training data by various data augmentation methods. Fifteen data augmentation methods are used. The detail of each method is shown in Table 3. Through this process, sixteen datasets are obtained (DS1: the original dataset, DS2: the dataset augmented by No. 1, ..., DS16: the dataset augmented by No. 15). A Python library called Imgaug [4] is used for the data augmentation. The results produced by Imgaug are used to generate annotation data in a form that YOLOv3 can use as training data. An example image generated by each data augmentation method is shown in Fig. 2.

2.3. Confirmation of Effective Data Augmentation Method
At Step 2, the accuracy of detection and classification of each class is validated using each dataset. YOLOv3 pre-trained on ImageNet [5] is fine-tuned using each dataset. Through this step, sixteen YOLOv3s are constructed. All networks are tuned with the same hyper-parameters, which are shown in Table 4. After fine-tuning, the accuracy of each YOLOv3 is calculated using the validation data. Each YOLOv3 is trained for up to 30000 iterations, and the mean Average Precision (mAP) for each dataset is calculated every 3000 iterations from the Average Precision (AP) of each class. Each YOLOv3 is evaluated by the APs at the iteration with the highest mAP.

2.4. Creation of Optimal Dataset
In general, the accuracy of classification increases when the number of training data increases. On the other hand, too many data augmented by one or more data augmentation methods may decrease the accuracy.
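The evaluation protocol of Section 2.3 (mAP computed from the per-class APs at regular intervals, and the checkpoint with the highest mAP retained) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the AP values are hypothetical, and only three checkpoints and three classes are used to keep the example short.

```python
from statistics import mean


def mean_average_precision(ap_by_class):
    """mAP is the unweighted mean of the per-class AP values."""
    return mean(ap_by_class.values())


def best_checkpoint(ap_history):
    """Return (iteration, mAP) of the checkpoint with the highest mAP.

    ap_history maps an iteration number to a {class_name: AP} dict.
    """
    scored = {it: mean_average_precision(aps) for it, aps in ap_history.items()}
    best_iter = max(scored, key=scored.get)
    return best_iter, scored[best_iter]


# Hypothetical AP values (not taken from the paper).
ap_history = {
    3000: {"D00": 0.20, "D01": 0.40, "D44": 0.60},
    6000: {"D00": 0.30, "D01": 0.50, "D44": 0.70},
    9000: {"D00": 0.25, "D01": 0.55, "D44": 0.65},
}
best_iter, best_map = best_checkpoint(ap_history)
print(best_iter, round(best_map, 4))
```

The per-class APs at `best_iter` are the values that the later class and method selection works with.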
Table 3. Details of Data Augmentation Methods
No.  Detail of Method
1    Add a random value from −40 to 40 to all pixels
2    Change an image size to 70∼130 pixels and move the image by −30 to 30 pixels on x-axis and y-axis independently
3    Blur an image using the average of a pixel and neighbors
4    Convert 2% of all pixels to black pixels and drop the information
5    Convolve an image with 3 × 3 kernel
6    Convert p% of all pixels to black pixels (0 ≦ p ≦ 2) and drop the information
7    Convert an image to an edge binary image and merge with the original image
8    Emboss an image and merge with the original image
9    Flip horizontally
10   Blur an image using Gaussian kernel
11   Invert all pixels
12   Blur an image using the medians of neighborhood pixels with random sizes between 3 × 3 and 11 × 11
13   Convert each pixel by multiplying a random value between 0.5 and 1.5
14   Add pepper noises at 5% of all pixels
15   Convert an image into a Superpixels partial representation
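To make some of the pixel-level operations in Table 3 concrete, the sketch below re-implements methods No. 1, 4, 9, 11, 13 and 14 on a toy grayscale image stored as a list of rows. It is a plain-Python stand-in, not the Imgaug code used in the paper. Drawing a single random offset or factor per image, and treating No. 4 and No. 14 as the same operation with different fractions, are assumptions of this sketch; all function names are hypothetical.

```python
import random


def clip(value):
    """Clamp a pixel value to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def add_value(img, rng, low=-40, high=40):
    """No. 1: add one random offset to every pixel (assumed per image)."""
    offset = rng.randint(low, high)
    return [[clip(p + offset) for p in row] for row in img]


def multiply(img, rng, low=0.5, high=1.5):
    """No. 13: multiply every pixel by one random factor (assumed per image)."""
    factor = rng.uniform(low, high)
    return [[clip(p * factor) for p in row] for row in img]


def invert(img):
    """No. 11: invert all pixels."""
    return [[255 - p for p in row] for row in img]


def flip_horizontal(img):
    """No. 9: mirror each row."""
    return [row[::-1] for row in img]


def black_out(img, rng, fraction):
    """No. 4 (fraction=0.02) / No. 14 (fraction=0.05): set pixels to black."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    count = round(h * w * fraction)
    for idx in rng.sample(range(h * w), count):
        out[idx // w][idx % w] = 0
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image = [[100 + x for x in range(10)] for _ in range(10)]  # toy 10x10 image

flipped = flip_horizontal(image)
inverted = invert(image)
peppered = black_out(image, rng, 0.05)
shifted = add_value(image, rng)
scaled = multiply(image, rng)
```

Geometric methods such as No. 2 and No. 9 also move the damage inside the image, so their bounding boxes have to be transformed together with the pixels; the purely photometric methods leave the annotations unchanged.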
Fig. 2. Augmented Image Example of Each Dataset (panels No. 1 to No. 15)
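Section 2.2 notes that the augmented results must be turned into annotation data that YOLOv3 can read. The sketch below shows what this involves for the horizontal flip (No. 9): the pixel box is mirrored and then written as a normalized label line. The paper does not specify the label file layout; the `class x_center y_center width height` layout of the Darknet implementation is assumed here, and the 600 × 600 image size, the class index and the helper names are hypothetical.

```python
def flip_box_horizontal(box, img_w):
    """Mirror a pixel box (xmin, ymin, xmax, ymax) about the vertical centre line."""
    xmin, ymin, xmax, ymax = box
    return (img_w - xmax, ymin, img_w - xmin, ymax)


def to_yolo(box, img_w, img_h):
    """Convert a pixel box to normalized (x_center, y_center, width, height)."""
    xmin, ymin, xmax, ymax = box
    return (
        (xmin + xmax) / 2 / img_w,
        (ymin + ymax) / 2 / img_h,
        (xmax - xmin) / img_w,
        (ymax - ymin) / img_h,
    )


def label_line(class_id, box, img_w, img_h):
    """Format one annotation line in the assumed Darknet layout."""
    xc, yc, w, h = to_yolo(box, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: a D00 box (class index 0 assumed) in a 600x600 image.
original_box = (100, 200, 300, 400)
flipped_box = flip_box_horizontal(original_box, img_w=600)
line = label_line(0, flipped_box, img_w=600, img_h=600)
print(line)
```

Only the x-coordinates change under a horizontal flip, so the box height and vertical centre are identical before and after.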
The proposed method selects the classes whose data are augmented and selects effective data augmentation methods for each class independently. Selecting the classes improves classification accuracy efficiently. Selecting effective augmentation methods prevents the accuracy from decreasing and increases it remarkably.
The proposed augmentation method is as follows. First, the classes whose data are to be augmented are determined: the data of the classes whose APs are lower than the mAP at the iteration of the highest mAP are augmented. Next, for each selected class, the AP at the iteration with the highest mAP is checked in the result of each dataset. The datasets with higher APs are selected, and the augmentation methods used to create those datasets become the selected data augmentation methods. After that, the data are augmented. Images containing road damage of each class are selected randomly, and a data augmentation method chosen randomly from the selected methods is applied to them. The number of augmented data for each class is determined so as to be as close as possible to the number of data of the other classes.

Table 4. Hyper-Parameters of YOLOv3
Batch Size                                  32
Momentum                                    0.9
Load Damping                                0.0005
Learning Rate (iterations 1-100)            0.0001
Learning Rate (iterations 101-25000)        0.001
Learning Rate (iterations 25001-30000)      0.0001

3. Experiments
Experiments were performed to show the effectiveness of the proposed method. The road damage images were augmented according to the proposed method, and the accuracy of YOLOv3 trained with the augmented data was evaluated.
In Step 1, the number of data in each class was increased by about 20% by each data augmentation method. The number of data of each class after data augmentation is shown in Table 5.
Table 5. Number of Data of Each Class in Training Data after Data Augmentation
Class Name Number of Data
D00 3323
D01 4546
D10 886
D11 767
D20 3054
D40 494
D43 978
D44 4505
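The "about 20%" increase stated for Step 1 can be checked directly from the counts in Tables 2 and 5. The short script below does only that arithmetic; the variable names are illustrative.

```python
# Label counts per class, copied from Table 2 (original) and Table 5 (after Step 1).
original = {"D00": 2768, "D01": 3789, "D10": 742, "D11": 636,
            "D20": 2541, "D40": 409, "D43": 817, "D44": 3733}
augmented = {"D00": 3323, "D01": 4546, "D10": 886, "D11": 767,
             "D20": 3054, "D40": 494, "D43": 978, "D44": 4505}

# Relative increase of each class.
growth = {name: augmented[name] / original[name] - 1 for name in original}

for name, value in growth.items():
    print(f"{name}: +{value * 100:.1f}%")
```

Every class grows by between 19% and 21%, so Step 1 preserves the class imbalance of the original dataset; the rebalancing happens only in Step 3.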
In Step 2, the accuracy of the YOLOv3s fine-tuned using DS1, ..., DS16 was evaluated. The numbers of true positives (TP), false positives (FP), and false negatives (FN), the recall, the precision, and the average intersection over union (Avg.IoU) at the iteration of the highest mAP were obtained. They are shown in Table 6. The table shows that the accuracy of the YOLOv3s trained with the augmented datasets is almost the same as that of the one trained with DS1.
Table 6. Result of Accuracy of YOLOv3 trained using Each Dataset
Dataset  Highest mAP  Iteration  TP    FP    FN    Recall  Precision  Avg.IoU
DS1      0.5347       27000      1991  1308  1002  0.67    0.60       0.4402
DS2      0.5579       24000      1978  1140  1015  0.66    0.63       0.4609
DS3      0.5437       27000      1997  1350  996   0.67    0.60       0.4317
DS4      0.5724       24000      2026  1301  967   0.68    0.61       0.4457
DS5      0.5537       27000      1998  1183  995   0.67    0.63       0.4537
DS6      0.5369       27000      1945  1203  1048  0.65    0.62       0.4511
DS7      0.5520       15000      1984  1239  1009  0.66    0.62       0.4463
DS8      0.5455       24000      1941  1087  1052  0.65    0.64       0.4647
DS9      0.5618       21000      1865  865   1128  0.62    0.68       0.5030
DS10     0.5334       15000      1742  811   1251  0.58    0.68       0.4939
DS11     0.5639       21000      1999  1161  994   0.67    0.63       0.4653
DS12     0.5659       24000      1981  1061  1012  0.66    0.65       0.4796
DS13     0.5599       27000      1947  1103  1046  0.65    0.64       0.4593
DS14     0.5745       15000      1891  984   1102  0.63    0.66       0.4816
DS15     0.5691       30000      1994  1026  999   0.67    0.66       0.4888
DS16     0.5443       18000      1983  1259  1010  0.66    0.61       0.4445
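The recall and precision columns of Table 6 follow from the TP, FP and FN counts by the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below recomputes them for two rows as a consistency check; the function name is illustrative.

```python
def precision_recall(tp, fp, fn):
    """Standard detection metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Counts copied from Table 6.
p_ds1, r_ds1 = precision_recall(tp=1991, fp=1308, fn=1002)
p_ds9, r_ds9 = precision_recall(tp=1865, fp=865, fn=1128)

print(f"DS1: precision={p_ds1:.2f}, recall={r_ds1:.2f}")
print(f"DS9: precision={p_ds9:.2f}, recall={r_ds9:.2f}")
```

TP + FN equals 2993 in every row, which is the number of ground-truth boxes in the validation data, so differences in recall between datasets come only from the number of detected boxes.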
The accuracy of each class was examined to find why the accuracy of the YOLOv3s did not increase much. Fig. 3 shows graphs of the APs every 3000 iterations for each dataset. The APs of D00, D10, D11, and D40 were lower than the mAP at almost every iteration. The mAP can be improved if the accuracy of these classes is improved. It was therefore decided to augment the data of D00, D10, D11, and D40 further.
Fig. 4 shows the APs of D00, D10, D11 and D40 at the iteration of the highest mAP. The figure shows that the effective methods depend on the class and that selecting data augmentation methods may be effective. In the experiments, five methods were selected independently for each class: the methods used to create the datasets on which the YOLOv3s with the top five APs were trained. Methods No. 1, No. 3, No. 10, No. 11, and No. 13 were used for D00; methods No. 1, No. 3, No. 8, No. 10, and No. 13 for D10; methods No. 1, No. 3, No. 8, No. 13, and No. 14 for D11; and methods No. 3, No. 4, No. 9, No. 11, and No. 14 for D40. In each class, the images containing road damage of the class were selected randomly, and a data augmentation method selected from the five methods was applied to them. A new dataset (DS17) was generated by adding the augmented data to DS1. The total number of images in DS17 was 16475, with 23967 labels. The number of data of each class is shown in Table 7. Examples of augmented images for the classes D00, D10, D11 and D40 are shown in Fig. 5.
Table 7. Number of Data of Each Class in Training Data after Adding Data Augmented by Five Methods
Class Name Number of Data
D00 5531
D01 3789
D10 2968
D11 2544
D20 2541
D40 2045
D43 817
D44 3733
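The selection procedure described in Section 2.4 and applied above can be sketched in three small functions: choose the classes whose AP is below the mAP, choose the top five methods for a class, and pair randomly chosen images with randomly chosen methods. This is an illustration under assumptions rather than the authors' implementation. All AP values and image identifiers below are hypothetical, chosen only so that the outcome matches the classes and the D40 methods reported in the text; the number of new labels for D40 (1636) is the difference between Tables 7 and 2.

```python
import random


def select_classes(ap_by_class):
    """Classes whose AP is below the mAP of the same checkpoint."""
    m_ap = sum(ap_by_class.values()) / len(ap_by_class)
    return sorted(c for c, ap in ap_by_class.items() if ap < m_ap)


def top_methods(ap_by_method, k=5):
    """The k augmentation methods whose datasets gave the highest AP for one class."""
    ranked = sorted(ap_by_method, key=ap_by_method.get, reverse=True)
    return sorted(ranked[:k])


def plan_augmentation(image_ids, methods, n_new, rng):
    """Pair randomly chosen images with randomly chosen methods."""
    return [(rng.choice(image_ids), rng.choice(methods)) for _ in range(n_new)]


# Hypothetical per-class APs of the baseline model at its best checkpoint.
baseline_ap = {"D00": 0.45, "D01": 0.60, "D10": 0.30, "D11": 0.25,
               "D20": 0.65, "D40": 0.35, "D43": 0.80, "D44": 0.85}
weak_classes = select_classes(baseline_ap)

# Hypothetical APs of class D40 for models trained with some of the methods.
d40_ap_by_method = {1: 0.30, 3: 0.42, 4: 0.41, 9: 0.45, 11: 0.40, 13: 0.28, 14: 0.43}
d40_methods = top_methods(d40_ap_by_method)

# D40 grows from 409 labels (Table 2) to 2045 labels (Table 7).
rng = random.Random(0)
d40_images = [f"d40_{i:04d}.jpg" for i in range(409)]  # hypothetical file names
d40_plan = plan_augmentation(d40_images, d40_methods, 2045 - 409, rng)
```

Each entry of `d40_plan` names one source image and the method to apply to it; sampling with replacement means an image may be augmented several times, possibly with different methods.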
YOLOv3 was fine-tuned with the hyper-parameters shown in Table 4 using DS17 as the training data (YOLO-DS17) and then evaluated using the validation data of the dataset [2]. Examples of the experimental results are shown in Fig. 6. The results of YOLOv3 fine-tuned using DS1 (YOLO-DS1) are also shown in the figure for comparison. YOLO-DS17 detects and classifies road damage more correctly than YOLO-DS1: it detects road damage that YOLO-DS1 cannot detect and correctly classifies damage that YOLO-DS1 misclassifies.
Next, the classification accuracy of YOLO-DS17 and YOLO-DS1 was evaluated. The APs and mAP at every 3000 iterations of YOLO-DS17 and YOLO-DS1 are shown in Tables 8 and 9. The tables show that the APs of YOLO-DS17 are better overall than those of YOLO-DS1. The PR curves of YOLO-DS17 and YOLO-DS1 are shown in Fig. 7. The figure also shows that YOLO-DS17 obtains better results than YOLO-DS1.
Table 8. APs of Each Class and mAPs of YOLOv3 Fine-tuned with DS17
Table 9. APs of Each Class and mAPs of YOLOv3 Fine-tuned with DS1
Fig. 3. APs Every 3000 Iterations (panels DS2 to DS16)
Fig. 4. APs of Datasets D00, D10, D11 and D40 at Iteration of Highest mAP
The numbers of TP, FP, and FN, the recall, the precision, and the Avg.IoU at the iteration of the highest mAP of the YOLOv3s are shown in Table 10. The accuracy of YOLO-DS17 is better than that of YOLO-DS1 on every index. The highest mAP of YOLO-DS17 is 27.41 percentage points higher than that of YOLO-DS1 (0.8088 versus 0.5347). These results show the effectiveness of the proposed data augmentation method.
Table 10. Accuracy Evaluation
Dataset  mAP     Iteration  TP    FP    FN    Recall  Precision  Avg.IoU
DS1      0.5347  27000      1991  1308  1002  0.67    0.60       0.4402
DS17     0.8088  30000      2277  654   716   0.76    0.78       0.5707
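The size of the mAP improvement in Table 10 can be expressed in two ways, as an absolute difference in mAP or as a gain relative to the baseline. The short calculation below shows both from the table values; the reported figure of 27.41 corresponds to the absolute difference.

```python
# Highest mAP values copied from Table 10.
map_ds1 = 0.5347
map_ds17 = 0.8088

absolute_gain = map_ds17 - map_ds1       # difference in mAP
relative_gain = absolute_gain / map_ds1  # gain relative to the DS1 baseline

print(f"absolute: {absolute_gain * 100:.2f} percentage points")
print(f"relative: {relative_gain * 100:.1f}% of the DS1 mAP")
```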
4. Conclusion
This paper proposed a method for data augmentation of a road damage dataset that considers the influence of the augmented data on the accuracy of detection and classification. The effective data augmentation methods depend on the class, so five augmentation methods are selected for each class independently. The optimal learning dataset is generated by applying the effective data augmentation methods. YOLOv3 trained using the generated dataset obtains better results than YOLOv3 trained using the original dataset. The quantitative evaluation of the two YOLOv3s shows that the one trained using the dataset generated by the proposed method obtains an mAP 27.41 percentage points higher than the one trained using the original dataset. In addition, IoU, recall and precision are also improved. The experimental results show that a model which detects and classifies road damage with high accuracy can be constructed by the proposed method.
Fig. 5. Example Augmented Images (panels D00, D10, D11, D40)
Fig. 6. Results (panels D00, D10, D11, D40; results by YOLOv3 fine-tuned using DS17 and results by YOLOv3 fine-tuned using DS1)
Future work includes comparative experiments using other object detection networks such as SSD [6] and M2Det [7], and measures for road damage specific to Thailand for the SATREPS project.
Fig. 7. PR Curves (panels DS17 and DS1)
Acknowledgment
This research is supported by SATREPS Project of JST and JICA: "Smart Transport Strategy for Thailand 4.0 Realizing better quality of life and low-carbon society", by Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (C) (17K00252) and by Chubu University Grant.

References
[1] Yoshitsugu Hayashi et al. "Smart transport strategy for Thailand 4.0 realizing better quality of life and low-carbon society", JST-JICA project, Chubu University, 2017.
[2] Hiroya Maeda, Yoshihide Sekimoto, Toshikazu Seto, Takehiro Kashiyama, and Hiroshi Omata. "Road damage detection using deep neural networks with images captured through a smartphone", 2018.
[3] Joseph Redmon and Ali Farhadi. "YOLOv3: An incremental improvement", arXiv preprint arXiv:1804.02767, 2018.
[4] Alexander B. Jung. Imgaug, https://github.com/aleju/imgaug
[5] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li and Li Fei-Fei. "ImageNet: A Large-Scale Hierarchical Image Database", in CVPR09, 2009.
[6] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu and Alexander C. Berg. "SSD: Single Shot MultiBox Detector", in ECCV2016, pages 21–37, 2016.
[7] Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai and Haibing Lin. "M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network", in The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI, 2019.