Computers and Electronics in Agriculture 162 (2019) 422–430
Original papers
Cucumber leaf disease identification with global pooling dilated convolutional neural network
Shanwen Zhang (a), Subing Zhang (b), Chuanlei Zhang (c,*), Xianfeng Wang (a), Yun Shi (a)

(a) School of Information Engineering, Xijing University, Xi'an 710123, China
(b) China Electronics Standardization Institute, Beijing 100007, China
(c) College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300222, China
ARTICLE INFO

Keywords: Cucumber disease identification; Convolutional neural network (CNN); Global pooling CNN; Dilated convolutions; Global pooling dilated CNN (GPDCNN)

ABSTRACT
It is a challenging research topic to identify plant diseases from diseased leaf images because of the complexity of the images. Deep learning models are promising for identifying plant diseases from leaf images, and AlexNet is one such model. Aiming at the problems of the AlexNet model's large number of parameters and single feature scale, a global pooling dilated convolutional neural network (GPDCNN) is proposed in this paper for plant disease identification by combining dilated convolution with global pooling. Compared with the classical convolutional neural network (CNN) and AlexNet models, GPDCNN has three improvements: (1) the convolutional receptive field is enlarged, without increasing the computational complexity, by employing dilated convolutional layers, which also recover spatial resolution without adding training parameters; (2) the fully connected layers are replaced with a global pooling layer, which reduces the number of parameters without losing discriminant information; (3) GPDCNN thereby integrates the merits of dilated convolution and global pooling. Experimental results on datasets of six common cucumber leaf diseases demonstrate that the proposed model can effectively recognize cucumber diseases.
1. Introduction

Plant diseases are responsible for major economic losses in agricultural production, and timely detection and identification of plant diseases is essential to cure and control them. Various approaches have been presented for detecting and identifying plant diseases. Martinelli et al. (2015) described modern methods based on nucleic acid and protein analysis, and reviewed innovative approaches currently under development. Fang and Ramasamy (2015) reviewed the direct and indirect identification methods currently used in plant disease detection, such as enzyme-linked immunosorbent assay, immunofluorescence, fluorescence in-situ hybridization, fluorescence imaging and hyperspectral techniques. They also provided a comprehensive overview of biosensors based on highly selective bio-recognition elements such as enzymes, antibodies, DNA/RNA and bacteriophages as new tools for early plant disease identification. Although these methods are effective, they are difficult for common farmers to implement. Plant disease recognition based on leaf lesion images is a challenging research topic in computer vision, image processing and precision agriculture, and can provide accurate, fast and efficient disease diagnosis. Many classical methods of plant disease recognition
focus on feature extraction from the diseased leaf image. Gulhane and Gurjar (2011) proposed a cotton leaf disease recognition method in which various features are extracted from the color of the infected leaf image, and a back-propagation neural network (BPNN) is used to recognize the color diseased leaf image. Bashish et al. (2011) proposed a leaf disease detection and classification method based on K-means segmentation and neural-network classification; their experiments validated that the neural-network-based detection model is very effective in recognizing leaf diseases, while the K-means clustering technique segments RGB images efficiently. Wang et al. (2012) presented a plant disease recognition method in which 21 color features, 4 shape features and 25 texture features are extracted from wheat and grape diseased leaf images, principal component analysis (PCA) is utilized to reduce the dimensionality of the extracted features, and several neural network classifiers are used to identify the wheat and grape diseases. The results showed that these neural networks can recognize plant diseased leaf images based on PCA features. Image processing is an effective way to detect and diagnose plant leaf diseases. Garcia (2013) presented a survey of digital image processing techniques for detecting, quantifying and classifying plant
* Corresponding author. E-mail address: [email protected] (C. Zhang).

https://doi.org/10.1016/j.compag.2019.03.012
Received 2 December 2018; Received in revised form 11 February 2019; Accepted 11 March 2019; Available online 30 April 2019
0168-1699/ © 2019 Published by Elsevier B.V.
types of plant diseases, with the ability to distinguish plant leaves from their surroundings. Fuentes et al. (2017) presented a deep-learning-based tomato disease and pest detection method. They considered three main families of detectors, i.e., the region-based fully convolutional network (R-FCN), faster region-based CNN (Faster R-CNN) and single-shot multi-box detector, combined each of these meta-architectures with "deep feature extractors" such as the VGG network and residual network (ResNet), and finally proposed a method for local and global class annotation and data augmentation to increase accuracy and reduce the number of false positives during training. Liu et al. (2018) proposed an accurate identification approach for apple leaf diseases based on a DCNN. The experimental results on a dataset of four common apple leaf diseases indicated that the DCNN can provide a better solution for plant disease identification, with high accuracy and a faster convergence rate. In a traditional CNN or DCNN, the fully connected layers account for almost 80% of the parameters of the whole network, which increases training and testing time and creates a large demand for computer memory; too many parameters can also cause overfitting. Hinton et al. (2012) applied dropout in the fully connected layer to effectively reduce the number of active parameters, avoid overfitting and make the model more robust, but the optimization of the dropout parameters depends on human experience. Several steps are often adopted to improve the recognition performance of CNN and DCNN models: reducing computational complexity by pooling, increasing the receptive field, and expanding the image to its original size by deconvolution. However, some discriminant information may be lost, and the information of small objects cannot be recovered and reconstructed.
Global pooling is often used to replace the traditional fully connected layers in a CNN to enforce the correspondence between feature maps and categories, and overfitting is naturally avoided at this layer because there are no parameters to optimize in global pooling (Liu et al., 2017). Dilated convolution can avoid pooling operations, enlarge the receptive field, and achieve good segmentation and recognition results (Kudo and Aoki, 2017; Renton et al., 2018). Fully convolutional networks (FCNs) have shown compelling quality and efficiency for image classification: each output pixel is a classifier corresponding to its receptive field, so the networks can be trained pixel-to-pixel given category-wise semantic segmentation annotations. An FCN can retain the internal data structure of the image without reducing the image resolution and without increasing the number of parameters or the amount of computation (Khan et al., 2018). Many multi-scale feature extraction methods have been proposed that use a pyramid of rescaled versions of an original image as input to an improved convolutional neural network, but these methods have extremely high computational costs because of the huge number of input parameters. Current CNN-based plant diseased leaf image classification approaches also include multi-scale deep features selected from different pooling and subsampling layers of a deep convolutional neural network, where the receptive field in the original image
diseased leaf digital images in the visible spectrum. The paper is useful to researchers working on both vegetable pathology and pattern recognition, providing a comprehensive and accessible overview of this important research field. Khairnar and Dagade (2014) reviewed many plant disease detection and diagnosis methods, introduced several feature extraction techniques for plant diseased leaf images, such as the color histogram, scale-invariant feature transform (SIFT), Gabor filter, grey-level co-occurrence matrix (GLCM), and Canny and Sobel edge detectors, and then suggested several classifiers such as the artificial neural network (ANN), support vector machine (SVM), BPNN, radial basis function neural network (RBFNN) and probabilistic neural network (PNN). Qin et al. (2016) investigated identification and diagnosis methods for four types of alfalfa leaf diseases using pattern recognition and image-processing algorithms, in which a sub-image with one or multiple typical lesions is obtained by manually cropping each acquired disease image; the sub-images are then segmented using twelve lesion segmentation algorithms, including K-means, K-median and fuzzy C-means clustering, and classified with supervised methods such as linear discriminant analysis (LDA), the Naive Bayes algorithm, logistic regression analysis, SVM, and regression trees. Extensive comparative experiments validated that the study provides a feasible solution for lesion image segmentation and recognition of alfalfa leaf diseases. Many dimensionality reduction and sparse representation algorithms have been applied to the plant disease recognition field (Li et al., 2019; Zhao et al., 2012), but the recognition results are not ideal because of the complexity of diseased plant leaf images. A main step in the above traditional methods is feature extraction and selection.
However, it is difficult to extract and select the optimal features from diseased leaf images for disease recognition, because the images are often very complex and irregular, as shown in Fig. 1. Furthermore, diseased plant leaves visibly show a variety of shapes, forms, colors, etc., and it is not easy to extract optimal, robust, multi-resolution high-level features from them. These traditional methods therefore cannot guarantee high recognition rates for plant leaf diseases. In recent years, convolutional neural networks (CNNs) and their modified models have attracted significant attention in image recognition and classification (Zhao et al., 2018; Mccann et al., 2017; Kamilaris and Prenafeta-Boldú, 2018). Different from traditional methods, a CNN can learn high-level robust features directly from the original image instead of relying on manually extracted features. CNNs have been widely applied to various image classification tasks and have achieved impressive results (Al-Saffar et al., 2018; Rawat and Wang, 2017). In plant species and plant disease recognition, it has been demonstrated that CNNs can provide better performance than traditional feature extraction methods (Dyrmann et al., 2016; Mohanty et al., 2016). Sladojevic et al. (2016) proposed a plant leaf disease recognition method based on a deep CNN (DCNN) model and fully described all essential steps required for plant disease recognition. The method can effectively recognize 13 different
Fig. 1. Examples of cucumber diseased leaves.
resolution by dilating the filter before computing the usual convolution. The convolutional filter is expanded and the empty positions are filled with zeros, so that a small kernel effectively represents a larger one. Dilated convolutional layers have been validated to be effective in image segmentation tasks (Kudo and Aoki, 2017; Khan et al., 2018), and offer a good alternative to the pooling-and-convolution sequence by using sparse kernels. A 2D dilated convolutional layer is defined as follows:
can be expanded to better cover global features. However, these methods reduce the resolution and lose details and local features of an image. Inspired by current CNN-based plant disease identification approaches, we seek a better solution that combines the advantages of multi-resolution images and multi-scale feature descriptors to extract both global and local information from an image without losing resolution. A CNN is able to automatically and hierarchically extract latent features for pattern recognition and is a very promising candidate for a practical and scalable approach to various image classification tasks. CNNs and their variants have been applied to plant disease recognition with promising results. However, CNNs and their modified models may not be efficient, since a huge number of parameters need to be trained. The global pooling dilated CNN (GPDCNN) can automatically learn features from diseased leaf images for disease recognition; compared with DCNN-based crop disease recognition methods, GPDCNN avoids spending a long time training a large number of network parameters. Recent works showed that dilated convolutions can give good performance in image classification and machine translation (Zhao et al., 2018; Al-Saffar et al., 2018; Liu et al., 2017). Traditional neural networks apply pooling, or convolution with a stride of 2 or more, to decrease the feature-map resolution and expand the receptive field. Dilated convolution supports exponential expansion of the receptive field without loss of feature-map resolution, since it applies convolution with a dilation factor instead of convolving after the feature-map resolution has been decreased (Kudo and Aoki, 2017; Renton et al., 2018).
Integrating the merits of dilated convolution and global pooling, we construct a novel global pooling dilated convolutional neural network (GPDCNN) for cucumber disease recognition, comprising two major stages: a global pooling CNN as the front end for 2D feature extraction, and a dilated CNN as the back end, whose dilated kernels deliver larger receptive fields and replace pooling operations. Compared with the existing AlexNet model, GPDCNN has three improvements: (1) increasing the receptive field of the first convolutional layer by means of dilated convolution; (2) replacing the fully connected layers by a global pooling layer to reduce the network parameters; (3) increasing the diversity of features by means of multi-scale feature fusion. The main contributions of this paper are summarized as follows:
y(m, n) = \sum_{i=1}^{M} \sum_{j=1}^{N} x(m + r \times i,\ n + r \times j)\, w(i, j)    (1)
where y(m, n) is the output of the dilated convolution of the input x(m, n), w(i, j) is a filter of size M × N, and r is the dilation rate. If r = 1, the dilated convolution degenerates into a normal convolution. In dilated convolution, a small kernel of size k × k is expanded to k + (k − 1)(r − 1) with dilation rate r. This allows flexible accumulation of multi-scale contextual information while keeping the same resolution. A common 3 × 3 convolutional layer has a 3 × 3 receptive field, while stacked 3 × 3 dilated convolutions reach a 7 × 7 receptive field at the second layer and 15 × 15 at the third. In general, when the dilation rate of the i-th stacked layer is r = 2^{i−1}, the receptive field grows to (2^{i+1} − 1) × (2^{i+1} − 1). That is, systematic dilation supports exponential expansion of the receptive field without loss of resolution or coverage (Kudo and Aoki, 2017).
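A minimal NumPy sketch of Eq. (1) may make the dilation mechanics concrete (the function names and test values are illustrative, not from the paper; indices here start at 0 rather than 1):

```python
import numpy as np

def dilated_conv2d(x, w, r=1):
    """Naive 'valid' 2D dilated convolution per Eq. (1):
    y[m, n] = sum_{i,j} x[m + r*i, n + r*j] * w[i, j] (0-based indices).
    With r = 1 this degenerates to an ordinary convolution."""
    M, N = w.shape
    out_h = x.shape[0] - r * (M - 1)
    out_w = x.shape[1] - r * (N - 1)
    y = np.zeros((out_h, out_w))
    for m in range(out_h):
        for n in range(out_w):
            y[m, n] = sum(x[m + r * i, n + r * j] * w[i, j]
                          for i in range(M) for j in range(N))
    return y

def effective_kernel_size(k, r):
    """A k x k kernel dilated by rate r spans k + (k - 1)(r - 1) pixels."""
    return k + (k - 1) * (r - 1)

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((2, 2))
y = dilated_conv2d(x, w, r=2)   # the kernel samples x at stride-2 offsets
```

With r = 2 the 2 × 2 kernel spans a 3 × 3 window, so a 4 × 4 input yields a 2 × 2 output; `effective_kernel_size(3, 2)` returns 5, matching k + (k − 1)(r − 1).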
2.2. Global pooling

A CNN uses its convolutional layers as a feature extractor to produce high-level feature maps; these maps are fed into fully connected layers that stretch them into a long feature vector, which is then fed into a Softmax classifier. The shortcoming is that the fully connected layers have too many parameters, which slows training and easily results in overfitting. A global pooling layer can instead replace all of the fully connected layers on top of the feature maps. In a global pooling layer, each feature map of the last convolutional layer generates one feature point; all the points form a vector, which is fed directly into the Softmax layer for the corresponding categories of the classification task. One advantage of the global pooling layer over fully connected layers is that the correspondences between the extracted feature maps and the categories are enforced. Another advantage is that there are no parameters to optimize in the global pooling layer, so overfitting is naturally avoided at this layer. Furthermore, global pooling sums up the spatial information, so the constructed feature vector is more robust to spatial translations of the input images. Suppose the last convolutional layer produces ten 6 × 6 feature maps. Global average (or maximum) pooling calculates the average (or maximum) of each of the ten maps, so the ten maps yield ten feature points; we concatenate these points into a 1 × 10 feature vector and input it to Softmax for image classification. In plant leaf disease recognition, global average pooling (GAP) performs better than the fully connected operator: GAP achieves dimension and parameter reduction, enhances the generalization ability, and overcomes overfitting without tuning dropout parameters, because the GAP layer has no parameters to optimize.
Let x_{ijk} be the k-th feature map of size m × n in the last convolutional layer. GAP is performed as follows:

y_{GAP}^{k} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} x_{ijk}    (2)
where y_{GAP}^{k} is the output of the GAP layer. Thus, for a given class c, the input to the Softmax classifier, S_c, is
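The GAP-plus-Softmax-input pipeline of Eqs. (2) and (3) can be sketched in a few lines of NumPy (the array shapes and the toy weight matrix are illustrative assumptions):

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Eq. (2): collapse each of the K feature maps (shape K x m x n)
    to one scalar, giving a length-K vector with no learned parameters."""
    return feature_maps.mean(axis=(1, 2))

def class_scores(gap_vector, W):
    """Eq. (3): S_c = sum_k w_c^k * y_GAP^k, with W of shape (C, K)."""
    return W @ gap_vector

# Ten 6x6 feature maps, as in the worked example above.
maps = np.arange(10 * 6 * 6, dtype=float).reshape(10, 6, 6)
y_gap = global_average_pooling(maps)   # length-10 feature vector
W = np.eye(6, 10)                      # toy weights for 6 classes
scores = class_scores(y_gap, W)        # inputs to the Softmax layer
```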
Fig. 2. The structure of GPDCNN.

S_c = \sum_{k=1}^{N} w_c^{k}\, y_{GAP}^{k}    (3)
multi-scale feature maps are extracted, including 128 pooling feature maps and three kinds of convolutional feature maps.
Step 3: the multi-scale convolution layer Inception includes three convolution layers and one pooling layer, each of size 1 × 1; the dimensions of the resulting feature maps are 96 × 96, 16 × 16, 64 × 64 and 32 × 32.
Step 4: the multi-scale feature maps obtained in Step 3 are integrated, and Concat is used so that the network model learns more detailed lesion features; the network model with dilated convolution achieves a higher recognition rate on complex backgrounds.
Step 5: Concat consists of three convolution layers with kernel sizes 1 × 1, 3 × 3 and 5 × 5, respectively; the resulting feature map has dimension 128 × 128.
Step 6: the fused feature map obtained in Step 5 is fed to Conv5, and the global pooling layer is used to reduce the number of parameters and avoid overfitting.
Step 7: the kernel size of Conv5 is 3 × 3 and the dimension of its feature map is 448 × 448; the global pooling layer includes one convolution layer (3 × 3) and one batch normalization layer, with output dimension 128 × 128.
Step 8: the feature map obtained in Step 7 is input into the Softmax classifier to classify the plant diseases.
where w_c^{k} is the weight corresponding to class c; specifically, w_c^{k} indicates the importance of y_{GAP}^{k} for class c.

3. Global pooling dilated convolutional neural network

Making use of the advantages of dilated convolution and global pooling, a global pooling dilated convolutional neural network (GPDCNN) model is proposed for plant leaf disease recognition.

3.1. GPDCNN structure

GPDCNN is a modified CNN based on the AlexNet model. Its structure is shown in Fig. 2 and includes 13 layers: five convolutional layers Conv1–Conv5, four pooling layers Pooling1–Pooling4, a global pooling layer, a multi-scale convolution layer Inception, a feature fusion layer Concat, and a Softmax classifier, where Inception has three convolutional layers and a pooling layer, and Concat has three convolutional layers. Different from AlexNet, in GPDCNN the original convolution kernel in Conv1 is replaced by a dilated convolution kernel, Inception and Concat are added after Pooling4, and a global pooling layer followed by a Softmax classifier replaces the two fully connected layers after Conv5. Each convolution layer is followed by a nonlinear activation function (ReLU), which reduces the training time of GPDCNN and, to some extent, restrains overfitting. The number of output channels of each layer, the sizes of the convolution kernels and the numbers of extracted maps are denoted in Fig. 2. The numbers of convolutional kernels of Conv1, Conv2, …, Conv5 are 96, 128, 192, 192 and 128, respectively.
The kernel size of Conv1 is 7 × 7 and that of Conv2 is 5 × 5; the kernel sizes of Conv3, Conv4, Conv5 and the global pooling layer are all 3 × 3. The pooling sizes of Pooling1–Pooling4 are all 2 × 2 with stride 2. The four kernel sizes of Inception are all 1 × 1, and the three kernel sizes of Concat are 1 × 1, 3 × 3 and 5 × 5, respectively.
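As a rough PyTorch sketch (our assumption, not the authors' Caffe implementation: the Inception/Concat branches are omitted and the padding/stride choices are illustrative), the backbone described above might look like:

```python
import torch
import torch.nn as nn

class GPDCNNSketch(nn.Module):
    """Simplified GPDCNN backbone: dilated Conv1, Conv2-Conv5 with the
    channel counts from the text, 2x2 max pooling, global average
    pooling, and a linear layer feeding Softmax. num_classes = 7
    assumes six diseases plus the normal class."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            # Conv1: 7x7 kernel, dilation 2 -> effective 13x13 window
            nn.Conv2d(3, 96, 7, padding=6, dilation=2), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(96, 128, 5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 192, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(192, 192, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(192, 128, 3, padding=1), nn.ReLU(),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)       # global average pooling
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.gap(x).flatten(1)               # one point per feature map
        return self.classifier(x)                # logits for Softmax
```

A 240 × 240 crop (the sub-image size used later in the paper) passes through four poolings to a 15 × 15 map before GAP collapses it to a 128-dimensional vector.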
We train GPDCNN end-to-end in a straightforward way. The first four convolutional layers are fine-tuned from a well-trained AlexNet (http://caffe.berkeleyvision.org/tutorial/layers.html). The other layers are initialized from a Gaussian distribution with standard deviation 0.01. We apply stochastic gradient descent (SGD) with a fixed learning rate to train GPDCNN. The Euclidean distance is used to measure the difference between the ground truth and the estimated output map. The loss function is given as follows:
L(\Theta) = \frac{1}{2N} \sum_{k=1}^{N} \left\| Z(X_k; \Theta) - Z_k^{GT} \right\|_2^2    (4)
where N is the size of the training batch and Z(X_k; Θ) is the output generated by GPDCNN with parameter set Θ for the input image X_k, while Z_k^{GT} is the ground truth for X_k. Our work aims to identify the diseases that affect cucumber plants using GPDCNN as the main body of the cucumber disease recognition system; a general overview of the system is presented in Fig. 3.
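Eq. (4) is the standard Euclidean (L2) loss; a NumPy rendering, with toy batch contents of our own choosing, might be:

```python
import numpy as np

def euclidean_loss(pred, target):
    """Eq. (4): L(Theta) = 1/(2N) * sum_k ||Z(X_k; Theta) - Z_k^GT||_2^2,
    where N is the batch size (first axis of the arrays)."""
    N = pred.shape[0]
    diff = (pred - target).reshape(N, -1)
    return float(np.sum(diff ** 2) / (2 * N))

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 0.0], [0.0, 4.0]])
loss = euclidean_loss(pred, target)   # (2^2 + 3^2) / (2 * 2) = 3.25
```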
3.2. Process of GPDCNN

A large number of training samples are used to train GPDCNN.
Step 1: extract feature maps from the original training samples through the first four stages (Conv1, Pooling1, Conv2, Pooling2, Conv3, Pooling3, Conv4 and Pooling4), obtaining 192 basic feature maps.
Step 2: in the Inception layer, the extracted feature maps are processed by three convolutional operators and one pooling operator, and
4. Experiments and results

To validate the performance of GPDCNN, we conduct extensive experiments on a real-world cucumber diseased leaf image dataset and its corresponding lesion image dataset, and compare it with four crop
two typical deep learning models often used in image classification. All experiments are implemented on a PC running Ubuntu 16.04 with 16 GB memory, an Intel® Core™ i7-7700K CPU @ 4.00 GHz, and an Nvidia GTX 1080 Ti GPU (16 nm process, 11 GB GDDR5, core frequency 1480–1582 MHz), using the Caffe framework, an open-source convolutional architecture for fast feature embedding developed by the Berkeley Vision and Learning Center (BVLC), and TensorFlow in Python 3.4.
Fig. 3. The cucumber disease recognition system (image dataset → data augmentation → annotated and augmented data → training and testing sets → GPDCNN training → trained GPDCNN → classification result).
disease recognition methods, i.e., probabilistic neural networks (PNNs) (Khan et al., 2018), sparse representation classification (SRC) (Shi et al., 2015), deep CNNs (DCNNs) (Liu et al., 2018), and AlexNet (Alex et al., 2012). PNNs and SRC are two traditional crop disease recognition methods that require discriminative features to be extracted manually, relying greatly on prior knowledge. DCNNs and AlexNet are

4.1. Data collection
A BM-500GE/BB-500GE digital color camera with a resolution of 2456 × 2058 pixels was used to capture crop diseased leaf images. From the cucumber planting bases of the Yangling agricultural high-tech industrial demonstration area, Shaanxi, China, 600 diseased leaves of 6 common cucumber leaf diseases and 100 normal leaves were collected, with 100 leaf images per disease showing typical disease symptoms
Fig. 4. Examples of cucumber diseased leaf images and their corresponding segmented lesion images.
Fig. 5. Examples of augmentation images of a diseased leaf and its corresponding lesion.
Fig. 6. The recognition rates versus the iterations.
under several different conditions depending on the time (e.g., illumination), season (e.g., temperature and humidity), and place where they were taken (e.g., farmland or greenhouse). The six diseases are downy mildew, anthracnose, gray mold, angular leaf spot, black spot, and powdery mildew. To reduce the workload of the image analysis and focus on the regions of interest, we cropped each original diseased leaf image manually; each cropped sub-image contains typical lesions and has the same size of 240 × 240 pixels. The K-means clustering algorithm is used to segment the diseased leaf images (Fang and Ramasamy, 2015; Gulhane and Gurjar, 2011; Bashish et al., 2011; Wang et al., 2012); the segmented lesion images are only utilized to illustrate the highlights of the proposed method. Some original diseased leaf images and their corresponding segmented lesion images are shown in Fig. 4.
An effective CNN model relies on extensive iterative training on a large-scale image set. However, our dataset is too small to overcome network overfitting. To produce sufficient diseased leaf images and increase the diversity of the dataset, the natural cucumber diseased leaf images are first acquired and then processed using data augmentation techniques, including geometric transformations (random shift, random resize, random crop, random rotation/reflection, horizontal/vertical flipping) and intensity transformations (contrast adjustment, brightness enhancement, color jittering, noise addition, PCA jittering, radial blur) (https://blog.csdn.net/mwa2016/article/details/53816010?utm_source=copy) (Hu et al., 2015). Each original image is radially blurred by rotation blur and scaling blur (rotation blur unit 10, scaling blur unit 30); hue, saturation and brightness are increased by 20% and the contrast by 30%, while the sharpness is decreased by 10%; a 3 × 3 transformation matrix is used for perspective transformation; 30% Gaussian noise is added to the original image with an offset of 0.2 and a standard deviation of 0.3; and PCA (principal component analysis) jittering is used to perturb the natural image. After augmentation, each image yields 50 images; thus, we obtain 600 × 50 = 30,000 diseased leaf images, 30,000 corresponding segmented lesion images, and 5000 normal leaf images. Finally, two general datasets are constructed: a diseased leaf image dataset containing all normal leaf images, and a lesion image dataset. The two datasets simulate the natural environment of image acquisition and provide an important guarantee of the generalization capability of CNNs. Some augmentation examples are shown in Fig. 5.
The annotation process assigns the disease class label to the lesion areas in the diseased leaf image. Starting with the dataset of images, we manually annotate the lesion areas of every image with a bounding box and a class. Some lesions may look similar depending on the infection status, so the knowledge for classifying the type of the
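A simplified sketch of a few of these augmentations may help (the operations chosen and the parameter ranges are illustrative stand-ins, not the exact settings listed above):

```python
import numpy as np

def augment(img, rng):
    """Apply random flips, 90-degree rotations, brightness scaling and
    additive Gaussian noise to a float image in [0, 1]; square inputs
    keep their shape. Parameter ranges are illustrative only."""
    out = img.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                        # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]                        # vertical flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    out = out * rng.uniform(0.8, 1.2)             # brightness jitter
    out = out + rng.normal(0.0, 0.05, out.shape)  # Gaussian noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
leaf = rng.random((240, 240))                     # stand-in for a 240x240 crop
augmented = [augment(leaf, rng) for _ in range(5)]  # several variants per image
```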
Fig. 7. Output images in different layers after 1000 iterations: (a) original cucumber leaf image of angular leaf spot; (b)–(o) outputs of Conv1–Conv5, Pooling1–Pooling4, ReLU1–ReLU4 and the global average pooling layer.
Batch size is set to 64 in training and 50 in testing. Momentum is set to 0.9 without accelerated gradient. A Gaussian distribution with mean 0 and standard deviation 0.01 is used to randomly initialize the weights of the network. SGD with a mini-batch size of 256 is used to update the network parameters by minimizing the total loss function in Eq. (4). The initial learning rate is set to 0.01 and is reduced to 1/10 of its value at epochs 100 and 150; the models are trained for up to 200 epochs. The regularization coefficient is set to 0.0005, and the dilation rate in Conv1 is set to 2.
Global average pooling (GAP) reduces the error caused by the increased estimation variance due to neighborhood-size constraints and retains as much of the background information of the image as possible, which is conducive to extracting key features, while global maximum pooling (GMP) preserves more low-level features but ignores high-level ones. We therefore utilize GAP for cucumber disease recognition.
Five-fold cross-validation (FFCV) is used to validate the performance of the proposed method. In FFCV, all image samples of the dataset are randomly split into 5 equal-sized subsets. Each time, a single subset is used as the test set, while the remaining 4 subsets are used as the training set. The cross-validation experiments are run 5 times, with each of the 5 subsets used exactly once as the test set, and the 5 results are averaged to produce a single estimate for an FFCV experiment.
Fig. 6 shows the recognition rates versus the iterations in training on the diseased leaf dataset. From Fig. 6, it is seen that the average recognition rates of the two common DCNNs and AlexNet models are around 87.65%, which is significantly lower than the average recognition rate of GPDCNN (about 90%). Moreover, GPDCNN begins to converge when the iterations reach
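The five-fold split described above can be sketched as follows (the dataset size here is illustrative):

```python
import numpy as np

def five_fold_splits(n_samples, seed=0):
    """Randomly partition sample indices into 5 equal-sized folds;
    each fold serves exactly once as the test set (FFCV)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 5)
    for i in range(5):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(5) if j != i])
        yield train, test

# 5 train/test splits over an (illustrative) 30,000-image dataset
splits = list(five_fold_splits(30000))
```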
Table 1
Cucumber disease recognition rate by DCNNs, AlexNet and GPDCNN on the diseased leaf image dataset.

Model     Pooling type       Training time (h)   Testing time (s)   Accuracy (%)
DCNNs     Fully-connected    14.5                3.64               91.73
AlexNet   Fully-connected    21.3                3.72               92.48
GPDCNN    GAP                 6.2                3.58               94.65
Table 2
Cucumber disease recognition rates (%) by GLSVD, IPT and GPDCNN on different datasets.

          Diseased leaf dataset        Segmented lesion dataset
Methods   Original      Augmented      Original      Augmented
GLSVD     62.36         62.48          88.73         89.63
IPT       59.71         60.24          87.82         89.16
GPDCNN    78.32         94.65          81.56         95.18
diseased leaf is provided by the experts in the area. The annotation outputs are the coordinates of bounding boxes of different sizes together with their corresponding disease class, which are subsequently compared with the predicted results of the network during testing.

4.2. Experimental setup

GPDCNN is trained by mini-batch SGD with a momentum factor.
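As a concrete illustration of the dilated convolution at the core of GPDCNN (the dilation rate of Conv1 is set to 2), the following sketch shows in 1-D, for brevity, how dilation enlarges the receptive field without adding parameters. This is illustrative only, not the authors' implementation:

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Valid-mode 1-D convolution with gaps of (dilation - 1) between taps."""
    k = len(kernel)
    span = dilation * (k - 1) + 1           # effective receptive field
    return [sum(kernel[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)]

def receptive_field(kernel_size, dilation):
    """A 3-tap kernel with dilation 2 covers 5 input positions."""
    return dilation * (kernel_size - 1) + 1

# Same 3 weights, wider coverage: dilation 2 sees positions 0, 2 and 4.
out = dilated_conv1d([1, 0, 0, 0, 1], [1, 1, 1], dilation=2)
```

The parameter count stays at kernel_size weights per channel regardless of the dilation rate, which is why the paper can widen the receptive field without increasing the number of training parameters.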
results and convergence process show that GPDCNN achieves a higher recognition rate and converges faster.
about 60,000, whereas DCNNs and AlexNet need more than 90,000 iterations to achieve a stable recognition result. The model is trained on the training set, after which recognition is performed on the test set; once the experiments achieve the expected results, the final recognition result is obtained on the test set. From Fig. 6, we find that the three models achieve stable results after 17,000 iterations, so for simplicity we run GPDCNN, DCNNs and AlexNet for 180,000 iterations on the diseased leaf dataset. The trained models are then applied to identify the test set samples.

To validate the advantage of GPDCNN in automatic image feature extraction, Fig. 7 shows the output images in different layers after 10,000 iterations without fine-tuning, where all original diseased leaf images are used as training samples. Fig. 7 shows that the feature maps are computed across an image by applying convolution, pooling and ReLU operators, and that different layers extract different features; each feature map is displayed in a different block, and the final GAP layer learns highly salient features.
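The role of the GAP layer described above, collapsing each feature map to a single value instead of flattening everything into fully-connected layers, can be shown with a minimal sketch (illustrative only; shapes and names are our own):

```python
def global_average_pool(feature_maps):
    """Reduce a list of H x W feature maps (one per channel) to one scalar
    per channel, replacing the flatten + fully-connected stage and the
    weight parameters it would otherwise require."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]

# Two 2x2 channel maps -> two pooled activations, with no extra weights.
pooled = global_average_pool([[[1, 3], [5, 7]],
                              [[0, 0], [2, 2]]])
```

Because GAP has no trainable parameters, swapping it in for the fully-connected layers is what shrinks the training time reported in Table 1.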
5. Conclusions

In this paper, a novel deep learning architecture called GPDCNN is proposed for cucumber disease recognition. The dilated convolutional and global pooling layers are used to aggregate multi-scale contextual information, speed up convergence and improve the recognition rate. With the dilated convolutional layers, GPDCNN can extend the receptive field without losing resolution. We demonstrated the model on cucumber diseased leaf image datasets with state-of-the-art performance. In future work, we intend to improve the performance of GPDCNN by exploring the role of probabilistic graphical models, and to further extend our method in the field of crop disease recognition systems.

Acknowledgments
This work is partially supported by the China National Natural Science Foundation under Grant No. 61473237, the Key Research and Development Plan of Shaanxi No. 2017ZDXM-NY-088 and the Key Project of Tianjin Natural Science Foundation No. 18JCZDJC32100.

4.3. Experimental results
To reduce the influence of random effects, we repeat the FFCV experiment 50 times and report the averaged results as the final experimental result. The results of DCNNs, AlexNet and the proposed GPDCNN are given in Table 1. From Table 1, it can be seen that GPDCNN outperforms DCNNs and AlexNet in both recognition accuracy and training time. The reason is that GPDCNN adopts GAP instead of fully-connected layers to accelerate the training process, and uses dilated convolution and multi-scale convolutional kernels to improve the recognition rate. In the DCNNs and AlexNet structures, by contrast, much computation and convergence time is spent evaluating the large number of weight parameters introduced by the fully-connected layers, and their ability to extract multi-resolution features depends on the resolution of the feature maps.

To further demonstrate the advantage of GPDCNN, we perform disease recognition experiments with the traditional methods, GLSVD and IPT, on the original diseased leaf image dataset and the segmented lesion image dataset. The comparison results are given in Table 2. From Table 2, we can see that the proposed method achieves a much better recognition rate than the traditional approaches on the augmented diseased leaf image dataset, but obtains only a small improvement on the segmented lesion image dataset, whereas the recognition rates of GLSVD and IPT increase greatly on the segmented lesion image dataset. The reason is that GPDCNN needs many training samples and can automatically learn high-level abstract features from the original images, while the recognition accuracies of GLSVD and IPT rely heavily on image segmentation and feature extraction algorithms, and these methods cannot directly recognize the disease type from the original diseased leaf image.
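The FFCV protocol used above (5 equal-sized random folds, each fold serving once as the test set, with the repetition results averaged) can be sketched as follows; this is a stand-alone illustration with assumed function names, not the authors' data-handling code:

```python
import random

def five_fold_splits(n_samples, seed=0):
    """Randomly partition sample indices into 5 equal-sized folds.
    Each fold serves once as the test set (FFCV); the other 4 folds
    form the training set for that round."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    size = n_samples // 5
    folds = [idx[k * size:(k + 1) * size] for k in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j in range(5) if j != k for i in folds[j]]
        yield train, test

# 100 samples -> 5 rounds of (80 train, 20 test); every sample is
# tested exactly once. Repeating with different seeds and averaging
# gives the repeated-FFCV estimate reported in the tables.
splits = list(five_fold_splits(100))
```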
From Tables 1 and 2, we conclude that: (1) GPDCNN is more robust than the other methods, although its recognition rate improves less markedly on the augmented segmented lesion image dataset; (2) the traditional feature extraction based methods perform noticeably better on the segmented lesion dataset than on the original diseased image dataset, and they do not need many training samples because they apply SVM or K-nearest neighbor classifiers to classify the images; (3) the GPDCNN based disease recognition method can work directly on the original diseased leaf images and thus removes many pre-processing steps from disease recognition.

Fig. 4 indicates that the DCNNs and AlexNet models converge more slowly than GPDCNN, because they apply fully connected layers to concatenate the extracted feature maps into a feature vector, which leads to a large memory requirement. Moreover, their network structures are not optimal because only one convolution kernel is used within each convolution layer. In GPDCNN, the multi-scale convolutional kernels of the Inception layer are used to extract the multi-scale features of the input image. The experimental
Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.compag.2019.03.012.