Apple Tree Trunk and Branch Segmentation for Automatic Trellis Training Using Convolutional Neural Network Based Semantic Segmentation


Proceedings, 6th IFAC Conference on Bio-Robotics, Beijing, China, July 13-15, 2018. Available online at www.sciencedirect.com


IFAC PapersOnLine 51-17 (2018) 75–80


Yaqoob Majeed*, Jing Zhang**, Xin Zhang***, Longsheng Fu****, Manoj Karkee*****, Qin Zhang******, Matthew D. Whiting*******

*Center for Precision and Automated Agricultural Systems, Washington State University, Prosser, WA 99350 USA (Tel: 509-212-6696; e-mail: [email protected]).
** Center for Precision and Automated Agricultural Systems, Washington State University; College of Engineering, China Agricultural University (e-mail: [email protected]).
*** Center for Precision and Automated Agricultural Systems, Washington State University (e-mail: [email protected]).
**** Center for Precision and Automated Agricultural Systems, Washington State University (e-mail: [email protected]).
***** Center for Precision and Automated Agricultural Systems, Washington State University (e-mail: [email protected]).
****** Center for Precision and Automated Agricultural Systems, Washington State University (e-mail: [email protected]).
******* Center for Precision and Automated Agricultural Systems, Washington State University (e-mail: [email protected]).

Abstract: Apple orchards in modern fruiting wall architectures (e.g. vertical and V-trellis) help to attain high fruit yield and quality. These systems are also key to developing simpler tree canopies, which improves the productivity of manual orchard operations while creating opportunities for automated field operations such as robotic harvesting and/or pruning. Training of fruit trees to these architectures is carried out manually, which is becoming challenging due to the increasing labor cost and uncertainty in labor availability. With the reduced cost and increasing speed and robustness of sensing and robotic technologies, automated tree training could be a viable alternative. One of the most important steps in automating the tree training operation is to segment out the trunk and branches of the trees that are ready to be trained and then select the suitable branches for training. In this work, a trunk and branch segmentation method was developed using a Kinect V2 sensor and deep learning-based semantic segmentation. The Kinect was used to acquire point cloud data of the tree canopies in a commercial orchard. Depth and RGB information extracted from the point cloud data were used to remove the background trees from the RGB image. Then the trunk and branches of the tree, which share a common appearance and features, were segmented out using a convolutional neural network (SegNet) for semantic segmentation. We achieved trunk and branch segmentation accuracies of 0.92 and 0.93 and mean intersection-over-union (IoU) scores of 0.59 and 0.44, respectively. The Boundary-F1 score, which directly relates to the accuracy of the segmented region boundaries, was 0.93 and 0.88 for the trunk and branches, respectively. These assessments show the potential of deep learning-based semantic segmentation for automated branch detection in the orchard environment, which provides a foundation for developing automated tree training systems.

© 2018, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved.

Keywords: fruit tree branch training, automated agricultural systems, machine vision, deep learning, semantic segmentation, branch and trunk segmentation.

1. INTRODUCTION

Washington is the leading apple producing state in the US, contributing more than 60% of total fresh market production (USDA-NASS, 2017). In recent years, there has been an increasing trend of training apple orchards to modern orchard architectures (e.g. vertical and V-trellis fruiting wall orchards) because of higher fruit yield and quality and ease of manual operation, which results in increased profitability (Weber, 2000; Whiting, 2018). In addition, these architectures create relatively simple tree canopies that are amenable to the adoption of mechanized or automated solutions for various field operations such as robotic harvesting and pruning.

To create these orchard architectures, targeted branches are tied to horizontal trellis wires during various growth stages of the branches. Currently, this tree training operation is completed manually by semi-skilled seasonal workers, which is becoming increasingly challenging because of the increasing labor cost and uncertainties in labor availability. To address these labor-related challenges, the apple (and other tree fruit) industry needs innovations in automated or robotic technologies for tree training, which will also help the industry to remain socially and economically sustainable in the long term (Hertz and Zahniser, 2013).


There has been wide research and development in automating various field operations in the apple industry, from robotic harvesting (Ji et al., 2012; Zhao et al., 2011; Bulanon and Kataoka, 2010; Baeten et al., 2008; Silwal et al., 2014; Silwal et al., 2017) to automated pruning (Karkee and Adhikari, 2015; Chattopadhyay et al., 2016; Akbar et al., 2016; Schupp et al., 2017; Elfiky et al., 2015; Huang et al., 2015). These systems use a sensing or machine vision technology to identify and localize the targeted objects. For example, Karkee et al. (2014) reconstructed the 3D skeleton of the tree using a medial axis thinning algorithm for automated pruning. Wu et al. (2014) used "stripe programming" to reconstruct the 3D structure of Chinese hickory trees for dynamic analysis of the trees. Similarly, a 3D reconstruction of the tree has been obtained using skeleton-based geometric features for automatic dormant pruning (Elfiky et al., 2015). As in these systems studied for automated pruning, the first step in automated tree branch training is to identify or segment the trunk and the branches of the trees that are ready for training in the orchard environment. However, the above-mentioned methods are generally constrained to single objects and lack applicability in the orchard environment due to variable environmental conditions and limited detection speed.

Artificial intelligence and deep learning are making dramatic breakthroughs in image classification and object detection applications (Makantasis et al., 2015; Kampffmeyer et al., 2016; Ren et al., 2015; Girshick, 2015; Sun et al., 2017; Girshick et al., 2014; Long et al., 2015; Noh et al., 2015). Deep learning networks feature high accuracy and robustness, which are essential for developing machine vision systems for agricultural applications in challenging and uncertain outdoor environments. There are a few reported applications of deep learning techniques for fruit and tree branch detection (Bargoti and Underwood, 2017; Zhang et al., 2017). Zhang et al. (2017) used depth and index images of tree canopies and an R-CNN (Regions-Convolutional Neural Network) to detect apple tree branches already trained to trellis wires for use in shake-and-catch harvesting. This technique lacks the applicability to detect untrained tree trunks and branches and to create complete tree structures because of the dynamic structure of trees. Deep learning-based semantic segmentation using SegNet provides a pixel-wise understanding of a scene based on the appearance, shape, and spatial relationship between object classes while retaining boundary information (Badrinarayanan et al., 2017). This powerful feature of SegNet-based semantic segmentation gives it an edge over other deep learning approaches for segmenting out regions that share a similar appearance and features (i.e. trunk and branch). In addition, because this technique retains the scene boundary information while segmenting out the regions, it reduces the post-processing of images required for real-time applications. The primary objective of this study was to segment out the trunk and branches of apple trees in the orchard environment using a Kinect V2 and a deep learning-based semantic segmentation technique, which will pave the way for developing an automated branch training system.

2. MATERIALS AND METHODS

2.1 Image Acquisition

For this study, one-year-old, densely planted apple trees (commercial planting, Prosser, WA; Figure 1a) that were ready for training into the V-trellis fruiting wall architecture were used. Point cloud data of the trees were captured during the dormant season (January 2018) using a Kinect V2 (Microsoft Corporation, Redmond, WA) sensor and Matlab® (MathWorks, Natick, MA). The sensor was mounted on an image acquisition platform designed for this purpose, as shown in Figure 1b. The height and distance of the sensor from the tree were around 1.1 m throughout the image acquisition process. The Kinect V2 can acquire RGB, depth, and point cloud data using its RGB camera, depth sensor, and IR (infrared) emitter. Because the FOV (field of view) of the RGB camera (84.1×53.8 degrees) differs from that of the depth sensor (70.6×60 degrees) and there is an offset between the two, it is difficult to map the depth information onto the RGB images. To overcome this issue, point cloud data were captured at a resolution of 1,920×1,080 pixels, in which the depth information is already mapped to the RGB information. For this reason, the RGB and depth information from the point cloud data were used in this study. The point cloud data were acquired on sunny (not during noon) and cloudy days as well as at night under artificial light.

Fig. 1. a) An example of the experimental orchard rows used in this study; and b) An illustration of the experimental setup.
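A practical consequence of using the organized point cloud is that every RGB pixel already has a registered 3D coordinate, so per-pixel depth can be read directly without a separate camera-to-camera registration step. The short Python/NumPy sketch below illustrates this idea under the assumption that the cloud is stored as an H×W×3 array of XYZ coordinates alongside a matching RGB image; the array layout and helper name are illustrative, not the exact Matlab data structures used in the study.

```python
import numpy as np

def split_organized_cloud(xyz, rgb):
    """Illustrative helper: given an organized Kinect V2 point cloud (H x W x 3,
    XYZ in metres, NaN where the depth sensor returned nothing) and the matching
    RGB image (H x W x 3), return the RGB image, its registered depth map, and a
    validity mask."""
    depth = xyz[:, :, 2]          # Z is already aligned to the RGB pixel grid
    valid = np.isfinite(depth)    # pixels with a usable depth reading
    return rgb, depth, valid

# Example with hypothetical arrays at the 1920x1080 capture resolution:
# rgb, depth, valid = split_organized_cloud(xyz, rgb)
# print("median canopy distance (m):", np.nanmedian(depth))
```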


2.2 Image Pre-processing

The RGB and depth information were extracted from the point cloud data. Then, unwanted background beyond a depth threshold of 1.3 m was removed. Figures 2a and 2b show an example of the original image and the image after removing the background, respectively. Only a small part of the canopy region (within 1.3 m of the sensor), containing the desired tree trunk and branches, was left after background removal.
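The background removal step reduces to masking every pixel whose registered depth exceeds the 1.3 m threshold. A minimal sketch of this masking, assuming the depth and RGB arrays from the Section 2.1 sketch (function name and in/out conventions are illustrative), is:

```python
import numpy as np

def remove_background(rgb, depth, max_depth=1.3):
    """Zero out RGB pixels farther than max_depth metres, or with no depth reading."""
    keep = np.isfinite(depth) & (depth <= max_depth)   # foreground mask
    out = rgb.copy()
    out[~keep] = 0                                     # background pixels -> black
    return out, keep

# foreground, mask = remove_background(rgb, depth)   # arrays from the Section 2.1 sketch
```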


Fig. 2. a) An example RGB image extracted from the point cloud data; and b) The corresponding image after the background was removed using the depth information.

2.3 SegNet for Semantic Segmentation

SegNet (Badrinarayanan et al., 2017), with an encoder depth of 5, was used for pixel-level semantic segmentation. It consists of an encoder network and a corresponding decoder network followed by a pixel-wise classification layer. The encoder network contains 13 convolutional layers, the same as the initial 13 convolutional layers of VGG16 (Simonyan and Zisserman, 2015), which was designed for object classification. To reduce the number of parameters and to retain high-resolution feature maps in the encoder network, the fully connected layers have been discarded in SegNet. The architecture of SegNet is given in Figure 3. In the encoder network, each encoder produces a set of feature maps, followed by batch normalization and the activation function (ReLU). Max-pooling is then performed, followed by sub-sampling of the output by a factor of 2 to achieve translation invariance for robust classification. SegNet also has 13 decoder convolutional layers, and each decoder layer uses the memorized max-pooling indices from its corresponding encoder layer to upsample its input feature maps. Finally, a soft-max classification layer produces the class probabilities for every pixel separately.

Our dataset was small (210 images for training and 90 for testing), which would not be sufficient to achieve a stable model; in the worst case, the network could fall into a local minimum because of the random assignment of initial weights. To address this challenge with a small dataset, transfer learning was used, which helps to achieve comparatively more accurate results. For this, a pre-trained network was fine-tuned on the new dataset. The encoder weights in this study were initialized from a pre-trained VGG16 model trained on ImageNet (Simonyan and Zisserman, 2015), which has a great capability for object localisation and classification.

Fig. 3. An illustrative diagram of the SegNet architecture, where Layer* consists of a combination of convolutional, batch normalisation, and ReLU layers.
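The defining trait of SegNet described above is that each decoder upsamples using the max-pooling indices memorized by its matching encoder rather than learned interpolation. The toy PyTorch module below sketches a single encoder/decoder pair to make that mechanism concrete; it is an illustrative simplification (one stage, arbitrary channel counts), not the 13-layer network or the Matlab implementation used in the study.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """One encoder/decoder stage illustrating SegNet-style unpooling with indices."""
    def __init__(self, in_ch=3, mid_ch=16, num_classes=3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2, stride=2, return_indices=True)  # remember max locations
        self.unpool = nn.MaxUnpool2d(2, stride=2)                   # reuse them to upsample
        self.dec = nn.Sequential(
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(mid_ch, num_classes, 1)  # pixel-wise class scores

    def forward(self, x):
        f = self.enc(x)
        p, idx = self.pool(f)                          # encoder: downsample, keep indices
        u = self.unpool(p, idx, output_size=f.shape)   # decoder: sparse upsampling at those indices
        return self.classifier(self.dec(u))            # (N, num_classes, H, W) logits

# scores = TinySegNet()(torch.randn(1, 3, 540, 960))   # e.g. the 960x540 training size
```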

2.4 Training

For the network training, 210 manually labelled images were used. Each image was labelled into 3 classes (i.e. trunk, branch, and background) using the pixel labeller in the Matlab® environment (Figures 4a and 4b show an original image and the corresponding labelled image, respectively). The original and corresponding labelled (ground truth) images were resized to 960×540 pixels to meet the GPU (NVIDIA GTX 1080) requirements. The initial learning rate, maximum number of epochs, and mini-batch size for the training were 0.001, 100, and 1, respectively. The training data were shuffled before every epoch. The classes in the dataset are not balanced (the background has more pixels), which causes a bias in favour of the dominant class during the learning process. To minimize this issue, median frequency balancing was applied before the training (Badrinarayanan et al., 2017).

2.5 Evaluation

To evaluate the performance of the trained network, global accuracy, class accuracy, the normalized confusion matrix, and intersection-over-union (IoU) were used.
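Median frequency balancing, used in Section 2.4 to counter the dominance of background pixels, weights each class by the ratio of the median class frequency to that class's own frequency, so rare classes (trunk, branch) contribute more to the loss. A small sketch of the weight computation, assuming the labels are stored as integer maps with 0 = background, 1 = trunk, 2 = branch (an assumed encoding, not necessarily the one used in the study), is:

```python
import numpy as np

def median_frequency_weights(label_maps, num_classes=3):
    """Class weights w_c = median(freq) / freq_c, where freq_c is the pixel count of
    class c divided by the total pixels of the images in which c appears.
    Assumes every class appears in at least one training image."""
    pixel_count = np.zeros(num_classes)   # pixels of each class
    image_count = np.zeros(num_classes)   # total pixels of images containing the class
    for lab in label_maps:
        for c in range(num_classes):
            n = np.sum(lab == c)
            if n > 0:
                pixel_count[c] += n
                image_count[c] += lab.size
    freq = pixel_count / image_count
    return np.median(freq) / freq

# weights = median_frequency_weights(training_label_maps)
# -> background weight < 1, trunk/branch weights > 1 when background dominates
```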


Global accuracy is the percentage of correctly classified pixels over the entire test dataset, whereas class accuracy is the accuracy of classified pixels averaged over all the classes. IoU, also known as the Jaccard index, measures the mean intersection over union over all the classes. These parameters give region-based accuracies, which do not provide sufficient information about the accuracy of the segmented boundaries. Because the segmented boundaries are important for automated tree branch training, the contour matching score was also analysed; it evaluates the agreement between the ground truth and the predicted class boundaries based on the F1 measure (also known as the Boundary-F1 score; Badrinarayanan et al., 2017).
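The region-based metrics above can all be read off a pixel-level confusion matrix. The sketch below computes global accuracy, per-class accuracy, and per-class IoU from integer ground-truth and predicted label maps; the Boundary-F1 score additionally requires boundary extraction with a distance tolerance and is omitted here. The helper names are illustrative, not the toolbox functions used in the study.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=3):
    """cm[i, j] = number of pixels with true class i predicted as class j."""
    idx = (num_classes * y_true.astype(np.int64) + y_pred.astype(np.int64)).ravel()
    return np.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)

def region_metrics(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    global_acc = tp.sum() / cm.sum()
    class_acc = tp / cm.sum(axis=1)     # per-class recall; its mean is the mean class accuracy
    iou = tp / (tp + fp + fn)           # per-class Jaccard index
    return global_acc, class_acc, iou

# cm = sum(confusion_matrix(gt, pred) for gt, pred in test_pairs)
# g, acc, iou = region_metrics(cm)
```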

3. RESULTS AND DISCUSSION

Figures 4a-d show, respectively, an example image used to test the deep learning network, the ground truth (GT), the result of comparing the original image with the ground truth image, and the image produced by the network.

Fig. 4. a) Example test image; b) Example of a labelled image (ground truth image); c) Output image compared with the ground truth; and d) An example output.

In Figure 4c, the magenta and green regions represent the false negative and false positive areas produced by the image segmentation technique used in this work. Table 1 shows the IoU for each of the three classes used.

Table 1. IoU result of the example test image
Class         IoU
Background    0.97
Trunk         0.64
Branch        0.52

A total of 90 images were used for testing the network. Table 2 summarizes the parameters used to assess the network performance, with mean class accuracy, mean IoU, and mean Boundary-F1 scores over all the classes of 0.94, 0.67, and 0.92, respectively. Table 3 shows that the class accuracy for the trunk and branches was 0.92 and 0.93, respectively, whereas the IoU for the same classes was 0.59 and 0.44. The mean Boundary-F1 score was 0.93 and 0.88, respectively, for the trunk and branches.

Table 2. Various performance measures estimated over the entire test dataset
Global Accuracy    Mean Class Accuracy    Mean IoU    Mean Boundary-F1 Score
0.96               0.94                   0.67        0.92

Table 3. Impact of each class on the overall performance
Class         Accuracy    IoU     Mean Boundary-F1 Score
Background    0.96        0.96    0.95
Trunk         0.92        0.59    0.93
Branch        0.93        0.44    0.88

Fig. 5. Normalized confusion matrix for the true and predicted classes.

Figure 5 shows the normalized confusion matrix for the pixels in the true and predicted classes. These results show that 91.7% of the trunk pixels and 92.7% of the branch pixels were correctly classified into their respective classes, whereas 6.7% of the trunk pixels were falsely classified as branch pixels.
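The row-normalized confusion matrix reported in Figure 5 expresses, for each true class, the fraction of its pixels assigned to every predicted class. A brief sketch of that normalization, reusing the hypothetical confusion_matrix helper from the Section 2.5 sketch, is:

```python
import numpy as np

def normalized_confusion(cm):
    """Row-normalize so that each true class's pixel counts sum to 1."""
    cm = cm.astype(float)
    return cm / cm.sum(axis=1, keepdims=True)

# ncm = normalized_confusion(cm)
# ncm[1, 1] ~ fraction of trunk pixels labelled trunk (reported as 91.7%)
# ncm[1, 2] ~ fraction of trunk pixels labelled branch (reported as 6.7%)
```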

Figures 6a and 6b show histograms of the per-image IoU and mean Boundary-F1 scores over the test dataset, respectively. Substantially lower IoU and Boundary-F1 scores for a few images were caused by inaccuracy in the background removal, which left in some parts of branches from trees in the adjacent rows; these parts were considered background in the labelling step.


Fig. 6. a) Histogram of mean IoU over the test dataset; and b) Histogram of mean Boundary-F1 score over the test dataset.

It is evident from Table 3 that the class accuracy, IoU, and mean Boundary-F1 score of the trunk and branch classes are comparatively lower than the corresponding parameters for the background class. This is because of the imbalance of the background class relative to the trunk and branch classes. In this study, the background itself was not of interest because it was removed using the depth images. The relatively higher accuracy, IoU, and mean Boundary-F1 scores of the background class (0.96, 0.96, and 0.95, respectively) show that the leftover parts of background trees (trunk, branches, and leaves) remaining after the foreground extraction were successfully separated from the targeted tree. The accuracy of branch pixel classification (0.93) is slightly higher than that of the trunk (0.92), which is partly because of the larger number of branch pixels present in an image. Relatively smaller IoU and mean Boundary-F1 scores were achieved for branch pixel segmentation (IoU: 0.44 and Boundary-F1: 0.88) than for trunk pixels (IoU: 0.59 and Boundary-F1: 0.93). This result was potentially due to the fact that one image contains only one trunk but more than one branch, and each branch also needs to be segmented from the other branches. This assumption needs to be further explored through future research. As discussed before, the IoU of the labelled regions of each class does not represent the accuracy of the segmented boundaries. For automated training of branches to trellis wires, accurate boundary delineation for the segmented branches is more important. The higher boundary accuracies of 0.93 and 0.88 for the trunk and branches, respectively, show the potential of this technique for practical adoption. Future research with a larger dataset would be helpful to further validate the results. Besides increasing the training dataset, one more class for the trellis wire will be added, which will be helpful to estimate the branch and trunk parameters needed to facilitate automatic branch selection for training.

4. CONCLUSION

The overall goal of this study was to segment out the trunk and branches of young apple trees for automated training in the orchard environment. A Kinect V2 (Microsoft Corporation, Redmond, WA) sensor was used to collect 3D point cloud data and color images, and unwanted background objects (e.g. trees from adjacent rows) were filtered out using a depth threshold. Then, the color and 3D images and a deep learning network were used for semantic segmentation of the tree trunk and branches. The segmentation results showed a high pixel classification accuracy for all three classes (background, trunk, and branches). Some of the trunk pixels were misidentified as branch pixels because trunk and branch pixels share similar features (e.g. color). Relatively lower IoU and mean Boundary-F1 scores were achieved for branch pixels compared to trunk pixels, which could be due to the presence of multiple branches in an image compared to a single trunk. This hypothesis needs to be further evaluated in the future. Overall, high Boundary-F1 scores were achieved for the trunk and branches, which showed that the trunk and branches could be segmented from each other using the deep learning technique (SegNet) despite the fact that they are similar in appearance except for small differences in their spatial relationship. In the future, besides increasing the training dataset (to further increase the network performance) and the testing dataset, trellis wires will also be segmented out, which will be essential for estimating the parameters (e.g. branch location and distance with respect to the trellis wire, branch diameter, and crotch angle) needed for automated selection of branches for tree training.

ACKNOWLEDGEMENTS

This research was supported in part by the United States Department of Agriculture (USDA)'s Hatch and Multistate Project Funds (Accession No 1005756 and 1001246), a USDA National Institute of Food and Agriculture competitive grant (Accession No 1005200), and the Washington State University (WSU) Agricultural Research Center (ARC). The University of Agriculture, Faisalabad (UAF), Pakistan sponsored Yaqoob Majeed and the China Scholarship Council (CSC) sponsored Xin Zhang in conducting PhD dissertation research at the WSU Center for Precision and Automated Agricultural Systems (CPAAS). CSC also sponsored Jing Zhang in conducting collaborative PhD dissertation research at WSU CPAAS. Northwest A&F University, China sponsors Dr. Longsheng Fu in conducting post-doctoral research at WSU CPAAS. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the view of the U.S. Department of Agriculture and Washington State University.

REFERENCES

Akbar, S. A., Elfiky, N. M., & Kak, A. (2016). A novel framework for modeling dormant apple trees using single depth image for robotic pruning application. In Robotics and Automation (ICRA), 2016 IEEE International Conference, 5136-5142.

Baeten, J., Donné, K., Boedrij, S., Beckers, W., & Claesen, E. (2008). Autonomous fruit picking machine: A robotic apple harvester. In Field and Service Robotics, 531-539.

Badrinarayanan, V., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(12), 2481-2495.

Bargoti, S., & Underwood, J. (2017). Deep fruit detection in orchards. In Robotics and Automation (ICRA), 2017 IEEE International Conference, 3626-3633.


Bulanon, D. M., & Kataoka, T. (2010). Fruit detection system and an end effector for robotic harvesting of Fuji apples. Agricultural Engineering International: CIGR Journal, 12(1).

Chattopadhyay, S., Akbar, S. A., Elfiky, N. M., Medeiros, H., & Kak, A. (2016). Measuring and modeling apple trees using time-of-flight data for automation of dormant pruning applications. In Applications of Computer Vision (WACV), 2016 IEEE Winter Conference, 1-9.

Elfiky, N. M., Akbar, S. A., Sun, J., Park, J., & Kak, A. (2015). Automation of dormant pruning in specialty crop production: An adaptive framework for automatic reconstruction and modeling of apple trees. 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 65-73.

Girshick, R. (2015). Fast R-CNN. arXiv preprint arXiv:1504.08083.

Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 580-587.

Hertz, T., & Zahniser, S. (2013). Is there a farm labor shortage? American Journal of Agricultural Economics, 95(2), 476-481.

Huang, B., Shao, M., & Chen, W. (2015). Design and research on end effector of a pruning robot. International Journal of Simulation--Systems, Science & Technology, 17(36).

Ji, W., Zhao, D., Cheng, F., Xu, B., Zhang, Y., & Wang, J. (2012). Automatic recognition vision system guided for apple harvesting robot. Computers and Electrical Engineering, 38(5), 1186-1195.

Kampffmeyer, M., Salberg, A. B., & Jenssen, R. (2016). Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2016 IEEE Conference, 680-688.

Karkee, M., & Adhikari, B. (2015). A method for three-dimensional reconstruction of apple trees for automated pruning. Transactions of the ASABE, 58(3), 565-574.

Karkee, M., Adhikari, B., Amatya, S., & Zhang, Q. (2014). Identification of pruning branches in tall spindle apple trees for automated pruning. Computers and Electronics in Agriculture, 103, 127-135.

Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3431-3440.

Makantasis, K., Karantzalos, K., Doulamis, A., & Doulamis, N. (2015). Deep supervised learning for hyperspectral data classification through convolutional neural networks. In Geoscience and Remote Sensing Symposium (IGARSS), 2015 IEEE International, 4959-4962.

Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, 1520-1528.

Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, 91-99.

Schupp, J. R., Winzeler, H. E., Kon, T. M., Marini, R. P., Baugher, T. A., Kime, L. F., & Schupp, M. A. (2017). A method for quantifying whole-tree pruning severity in mature tall spindle apple plantings. HortScience, 52, 1233-1240.

Silwal, A., Davidson, J. R., Karkee, M., Mo, C., Zhang, Q., & Lewis, K. (2017). Design, integration, and field evaluation of a robotic apple harvester. Journal of Field Robotics, 34(6), 1140-1159.

Silwal, A., Gongal, A., & Karkee, M. (2014). Apple identification in field environment with over the row machine vision system. Agricultural Engineering International: CIGR Journal, 16(4), 66-75.

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

Sun, X., Wu, P., & Hoi, S. C. (2017). Face detection using deep learning: An improved Faster RCNN approach. arXiv preprint arXiv:1701.08289.

USDA-NASS. (2017). National agricultural statistics database. Washington, DC: USDA National Agricultural Statistics Service.

Weber, M. S. (2000). Optimizing the tree density in apple orchards on dwarf rootstocks. In VII International Symposium on Orchard and Plantation Systems 557, 229-234.

Whiting, M. D. (2018). Precision orchard systems. In: Automation in Tree Fruit Production. Q. Zhang (ed.). CAB International, 75-93.

Wu, C., He, L., Du, X., & Chen, S. (2014). 3D reconstruction of Chinese hickory tree for dynamics analysis. Biosystems Engineering, 119, 69-79.

Zhang, J., He, L., Karkee, M., Zhang, Q., Zhang, X., & Gao, Z. (2017). Branch detection with apple trees trained in fruiting wall architecture using stereo vision and Regions-Convolutional Neural Network (R-CNN). In 2017 ASABE Annual International Meeting. American Society of Agricultural and Biological Engineers.

Zhao, D., Lv, J., Ji, W., Zhang, Y., & Chen, Y. (2011). Design and control of an apple harvesting robot. Biosystems Engineering, 110(2), 112-122.