Computers and Electronics in Agriculture 162 (2019) 422–430
Original papers
Cucumber leaf disease identification with global pooling dilated convolutional neural network
Shanwen Zhang (a), Subing Zhang (b), Chuanlei Zhang (c,*), Xianfeng Wang (a), Yun Shi (a)

(a) School of Information Engineering, Xijing University, Xi'an 710123, China
(b) China Electronics Standardization Institute, Beijing 100007, China
(c) College of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300222, China
ARTICLE INFO

Keywords: Cucumber disease identification; Convolutional neural network (CNN); Global pooling CNN; Dilated convolutions; Global pooling dilated CNN (GPDCNN)

ABSTRACT
It is a challenging research topic to identify plant diseases from diseased leaf images because of the complexity of the images. Deep learning models are promising for identifying plant diseases from leaf images, and AlexNet is one such model. Aiming at the problems of the AlexNet model's large number of parameters and single feature scale, a global pooling dilated convolutional neural network (GPDCNN) is proposed in this paper for plant disease identification by combining dilated convolution with global pooling. Compared with the classical convolutional neural network (CNN) and AlexNet models, GPDCNN has three improvements: (1) the convolutional receptive field is enlarged, without increasing the computational complexity, by employing dilated convolutional layers, which also recover spatial resolution without adding training parameters; (2) the fully connected layers are replaced with a global pooling layer, which reduces the number of parameters without losing discriminant information; (3) GPDCNN thereby integrates the merits of dilated convolution and global pooling. Experimental results on datasets of six common cucumber leaf diseases demonstrate that the proposed model can effectively recognize cucumber diseases.
1. Introduction

Plant diseases are responsible for major economic losses in agricultural production, and timely detection and identification of plant diseases is essential to cure and control them. Various approaches have been presented for detecting and identifying plant diseases. Martinelli et al. (2015) described modern methods based on nucleic acid and protein analysis, and reviewed innovative approaches currently under development. Fang and Ramasamy (2015) reviewed the direct and indirect identification methods currently used in plant disease detection, such as enzyme-linked immunosorbent assay, immunofluorescence, fluorescence in-situ hybridization, fluorescence imaging and hyperspectral techniques. They also provided a comprehensive overview of biosensors based on highly selective bio-recognition elements such as enzymes, antibodies, DNA/RNA and bacteriophages as new tools for early plant disease identification. Although these methods are effective, they are difficult for common farmers to implement. Plant disease recognition based on leaf lesion images is a challenging research topic in computer vision, image processing and precision agriculture, and can provide accurate, fast and efficient disease diagnosis. Many classical methods of plant disease recognition
focus on feature extraction from the diseased leaf image. Gulhane and Gurjar (2011) proposed a cotton leaf disease recognition method in which various features are extracted from the color of the infected leaf image, and a back-propagation neural network (BPNN) is used to recognize the color diseased leaf image. Bashish et al. (2011) proposed a leaf disease detection and classification method based on K-means segmentation and neural-network classification; their experiments validated that the neural-network-based detection model is very effective in recognizing leaf diseases, while the K-means clustering technique segments RGB images efficiently. Wang et al. (2012) presented a plant disease recognition method in which 21 color features, 4 shape features and 25 texture features are extracted from wheat and grape diseased leaf images, principal component analysis (PCA) is utilized to reduce the dimensionality of the extracted features, and several neural network classifiers are used to identify the wheat and grape diseases. The results showed that these neural networks can recognize plant diseased leaf images based on PCA features. Image processing is an effective way to detect and diagnose plant leaf diseases. Garcia (2013) presented a survey of digital image processing techniques for detecting, quantifying and classifying plant
* Corresponding author. E-mail address: [email protected] (C. Zhang).

https://doi.org/10.1016/j.compag.2019.03.012
Received 2 December 2018; Received in revised form 11 February 2019; Accepted 11 March 2019; Available online 30 April 2019
0168-1699/ © 2019 Published by Elsevier B.V.
types of plant diseases, with the ability to distinguish plant leaves from their surroundings. Fuentes et al. (2017) presented a deep-learning-based tomato disease and pest detection method. They considered three main families of detectors, i.e., the region-based fully convolutional network (R-FCN), faster region-based CNN (Faster R-CNN) and single-shot multi-box detector, combined each of these meta-architectures with "deep feature extractors" such as the VGG network and residual network (ResNet), and finally proposed a method for local and global class annotation and data augmentation to increase accuracy and reduce the number of false positives during training. Liu et al. (2018) proposed an accurate identification approach for apple leaf diseases based on a DCNN. The experimental results on a dataset of four common apple leaf diseases indicated that the DCNN can provide a better solution for plant disease identification, with high accuracy and a faster convergence rate. In a traditional CNN or DCNN, the fully connected layers account for almost 80% of the parameters of the whole network, which increases training and testing time and creates a large demand for computer memory; too many parameters can also cause overfitting. Hinton et al. (2012) applied dropout in the fully connected layer to effectively reduce the number of active parameters, avoid overfitting and make the model more robust, but the optimization of the dropout parameters depends on human experience. Several steps are often adopted to improve the recognition performance of CNN and DCNN models: reducing computational complexity by pooling, increasing the receptive field, and expanding the image to its original size by deconvolution. However, some discriminant information may be lost, and the information of small objects cannot be recovered and reconstructed.
Global pooling is often used to replace the traditional fully connected layers in a CNN to enforce the correspondence between feature maps and categories, and overfitting is naturally avoided at this layer because there are no parameters to optimize in global pooling (Liu et al., 2017). Dilated convolution can avoid pooling operations, enlarge the receptive field, and achieve good segmentation and recognition results (Kudo and Aoki, 2017; Renton et al., 2018). Fully convolutional networks (FCNs) have shown compelling quality and efficiency for image classification: each output pixel is a classifier corresponding to its receptive field, so the networks can be trained pixel-to-pixel given category-wise semantic segmentation annotations. An FCN can retain the internal data structure of the image without reducing the image resolution and without increasing the number of parameters or the amount of computation (Khan et al., 2018). Many multi-scale feature extraction methods have been proposed that use a pyramid of rescaled versions of an original image as input to an improved convolutional neural network, but these methods have extremely high computational costs because of the huge number of input parameters. Current CNN-based plant diseased leaf image classification approaches also include multi-scale deep features selected from different pooling and subsampling layers of a deep convolutional neural network, where the receptive field in the original image
diseased leaf digital images in the visible spectrum. The paper is useful to researchers working on both vegetable pathology and pattern recognition, providing a comprehensive and accessible overview of this important research field. Khairnar and Dagade (2014) reviewed many plant disease detection and diagnosis methods, introduced several feature extraction techniques for plant diseased leaf images, such as the color histogram, scale-invariant feature transform (SIFT), Gabor filter, grey-level co-occurrence matrix (GLCM), and Canny and Sobel edge detectors, and then suggested several classifiers such as the artificial neural network (ANN), support vector machine (SVM), BPNN, radial basis function neural network (RBFNN) and probabilistic neural network (PNN). Qin et al. (2016) investigated identification and diagnosis methods for four types of alfalfa leaf diseases using pattern recognition and image-processing algorithms, in which a sub-image with one or multiple typical lesions is obtained by manually cropping each acquired disease image; the sub-images are then segmented using twelve lesion segmentation algorithms, including K-means, K-median and fuzzy C-means clustering, and classified with supervised methods such as linear discriminant analysis (LDA), the Naive Bayes algorithm, logistic regression analysis, SVM, and regression trees. Extensive comparative experiments validated that the study provides a feasible solution for lesion image segmentation and recognition of alfalfa leaf diseases. Many dimensionality reduction and sparse representation algorithms have been applied to the plant disease recognition field (Li et al., 2019; Zhao et al., 2012), but the recognition results are not ideal because of the complexity of diseased plant leaf images. A main step in the above traditional methods is feature extraction and selection.
However, it is difficult to extract and select the optimal features from diseased leaf images for disease recognition, because the images are often very complex and irregular, as shown in Fig. 1. Furthermore, diseased plant leaves visibly show a variety of shapes, forms, colors, etc., and it is not easy to extract optimal, robust, multi-resolution high-level features from them. These traditional methods therefore cannot guarantee high recognition rates for plant leaf diseases. In recent years, convolutional neural networks (CNNs) and their modified models have attracted significant attention in image recognition and classification (Zhao et al., 2018; Mccann et al., 2017; Kamilaris and Prenafeta-Boldú, 2018). Different from traditional methods, a CNN can learn high-level robust features directly from the original image instead of relying on manually extracted features. CNNs have been widely applied to various image classification tasks and have achieved impressive results (Al-Saffar et al., 2018; Rawat and Wang, 2017). In plant species and plant disease recognition, it has been demonstrated that CNNs can provide better performance than traditional feature extraction methods (Dyrmann et al., 2016; Mohanty et al., 2016). Sladojevic et al. (2016) proposed a plant leaf disease recognition method based on a deep CNN (DCNN) model and fully described all essential steps required for plant disease recognition. The method can effectively recognize 13 different
Fig. 1. Examples of cucumber diseased leaves.
resolution by dilating the filter before computing the usual convolution. The convolutional filter is expanded and the empty positions are filled with zeros, so that a small kernel effectively represents a larger one. Dilated convolutional layers have been validated to be effective in image segmentation tasks (Kudo and Aoki, 2017; Khan et al., 2018), and offer a good alternative to the pooling-and-convolution sequence by using sparse kernels. A 2D dilated convolutional layer is defined as follows:
can be expanded to better cover global features. However, these methods reduce the resolution and lose details and local features of an image. Inspired by current CNN-based plant disease identification approaches, we seek a better solution that combines the advantages of multi-resolution images and multi-scale feature descriptors to extract both global and local information from an image without losing resolution. A CNN is able to automatically and hierarchically extract latent features for pattern recognition and is a very promising candidate for a practical and scalable approach to various image classification tasks. CNNs and their variants have been applied to plant disease recognition with promising results. However, CNNs and their modified models may not be efficient, since a huge number of parameters need to be trained. The global pooling dilated CNN (GPDCNN) can automatically learn features from diseased leaf images for disease recognition; compared with DCNN-based crop disease recognition methods, GPDCNN avoids spending a long time training a large number of network parameters. Recent works showed that dilated convolutions can give good performance in image classification and machine translation (Zhao et al., 2018; Al-Saffar et al., 2018; Liu et al., 2017). Traditional neural networks apply pooling, or convolution with a stride of 2 or more, to decrease the feature-map resolution and expand the receptive field. Dilated convolution supports exponential expansion of the receptive field without loss of feature-map resolution, since it applies convolution with a dilation factor instead of convolving after the feature-map resolution has been decreased (Kudo and Aoki, 2017; Renton et al., 2018).
Integrating the merits of dilated convolution and global pooling, we construct a novel global pooling dilated convolutional neural network (GPDCNN) for cucumber disease recognition, comprising two major stages: a global pooling CNN as the front end for 2D feature extraction, and a dilated CNN as the back end, whose dilated kernels deliver larger receptive fields and replace pooling operations. Compared with the existing AlexNet model, GPDCNN has three improvements: (1) increasing the receptive field of the first convolutional layer by means of dilated convolution; (2) replacing the fully connected layers by a global pooling layer to reduce the network parameters; (3) increasing the diversity of features by means of multi-scale feature fusion. The main contributions of this paper are summarized as follows:
y(m, n) = \sum_{i=1}^{M} \sum_{j=1}^{N} x(m + r \times i,\ n + r \times j)\, w(i, j)    (1)
where y(m, n) is the output of the dilated convolution of the input x(m, n), w(i, j) is a filter of size M × N, and r is the dilation rate. If r = 1, the dilated convolution degenerates into a normal convolution. In dilated convolution, a small kernel of size k × k is expanded to k + (k − 1)(r − 1) with dilation rate r. This allows flexible accumulation of multi-scale contextual information while keeping the same resolution. A common 3 × 3 convolutional layer has a 3 × 3 receptive field, while stacked 3 × 3 dilated convolutions reach a 7 × 7 receptive field at the second layer and 15 × 15 at the third. In general, when the dilation rate of the i-th stacked layer is r = 2^{i−1}, the receptive field grows to (2^{i+1} − 1) × (2^{i+1} − 1). That is, systematic dilation supports exponential expansion of the receptive field without loss of resolution or coverage (Kudo and Aoki, 2017).
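A minimal NumPy sketch of Eq. (1) may make the dilation mechanics concrete (the function names and test values are illustrative, not from the paper; indices here start at 0 rather than 1):

```python
import numpy as np

def dilated_conv2d(x, w, r=1):
    """Naive 'valid' 2D dilated convolution per Eq. (1):
    y[m, n] = sum_{i,j} x[m + r*i, n + r*j] * w[i, j] (0-based indices).
    With r = 1 this degenerates to an ordinary convolution."""
    M, N = w.shape
    out_h = x.shape[0] - r * (M - 1)
    out_w = x.shape[1] - r * (N - 1)
    y = np.zeros((out_h, out_w))
    for m in range(out_h):
        for n in range(out_w):
            y[m, n] = sum(x[m + r * i, n + r * j] * w[i, j]
                          for i in range(M) for j in range(N))
    return y

def effective_kernel_size(k, r):
    """A k x k kernel dilated by rate r spans k + (k - 1)(r - 1) pixels."""
    return k + (k - 1) * (r - 1)

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((2, 2))
y = dilated_conv2d(x, w, r=2)   # the kernel samples x at stride-2 offsets
```

With r = 2 the 2 × 2 kernel spans a 3 × 3 window, so a 4 × 4 input yields a 2 × 2 output; `effective_kernel_size(3, 2)` returns 5, matching k + (k − 1)(r − 1).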
2.2. Global pooling

A CNN uses its convolutional layers as a feature extractor to produce high-level feature maps; these maps are fed into fully connected layers that stretch them into a long feature vector, which is then fed into a Softmax classifier. The shortcoming is that the fully connected layers have too many parameters, which slows training and easily results in overfitting. A global pooling layer can instead replace all of the fully connected layers on top of the feature maps. In a global pooling layer, each feature map of the last convolutional layer generates one feature point; all the points form a vector, which is fed directly into the Softmax layer for the corresponding categories of the classification task. One advantage of the global pooling layer over fully connected layers is that the correspondences between the extracted feature maps and the categories are enforced. Another advantage is that there are no parameters to optimize in the global pooling layer, so overfitting is naturally avoided at this layer. Furthermore, global pooling sums up the spatial information, so the constructed feature vector is more robust to spatial translations of the input images. Suppose the last convolutional layer produces ten 6 × 6 feature maps. Global average (or maximum) pooling calculates the average (or maximum) of each of the ten maps, so the ten maps yield ten feature points; we concatenate these points into a 1 × 10 feature vector and input it to Softmax for image classification. In plant leaf disease recognition, global average pooling (GAP) performs better than the fully connected operator: GAP achieves dimension and parameter reduction, enhances the generalization ability, and overcomes overfitting without tuning dropout parameters, because the GAP layer has no parameters to optimize.
Let x_{ijk} be the k-th feature map of size m × n in the last convolutional layer. GAP is performed as follows:

y_{GAP}^{k} = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} x_{ijk}    (2)
where y_{GAP}^{k} is the output of the GAP layer. Thus, for a given class c, the input to the Softmax classifier, S_c, is
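The GAP-plus-Softmax-input pipeline of Eqs. (2) and (3) can be sketched in a few lines of NumPy (the array shapes and the toy weight matrix are illustrative assumptions):

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Eq. (2): collapse each of the K feature maps (shape K x m x n)
    to one scalar, giving a length-K vector with no learned parameters."""
    return feature_maps.mean(axis=(1, 2))

def class_scores(gap_vector, W):
    """Eq. (3): S_c = sum_k w_c^k * y_GAP^k, with W of shape (C, K)."""
    return W @ gap_vector

# Ten 6x6 feature maps, as in the worked example above.
maps = np.arange(10 * 6 * 6, dtype=float).reshape(10, 6, 6)
y_gap = global_average_pooling(maps)   # length-10 feature vector
W = np.eye(6, 10)                      # toy weights for 6 classes
scores = class_scores(y_gap, W)        # inputs to the Softmax layer
```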
Fig. 2. The structure of GPDCNN.

S_c = \sum_{k=1}^{N} w_c^{k}\, y_{GAP}^{k}    (3)
multi-scale feature maps are extracted, including 128 pooling feature maps and three kinds of convolutional feature maps.
Step 3: the multi-scale convolution layer Inception includes three convolution layers and one pooling layer, each of size 1 × 1; the dimensions of the resulting feature maps are 96 × 96, 16 × 16, 64 × 64 and 32 × 32.
Step 4: the multi-scale feature maps obtained in Step 3 are integrated, and Concat is used so that the network model learns more detailed lesion features; the network model with dilated convolution achieves a higher recognition rate on complex backgrounds.
Step 5: Concat consists of three convolution layers with kernel sizes 1 × 1, 3 × 3 and 5 × 5, respectively; the resulting feature map has dimension 128 × 128.
Step 6: the fused feature map obtained in Step 5 is fed to Conv5, and the global pooling layer is used to reduce the number of parameters and avoid overfitting.
Step 7: the kernel size of Conv5 is 3 × 3 and the dimension of its feature map is 448 × 448; the global pooling layer includes one convolution layer (3 × 3) and one batch normalization layer, with output dimension 128 × 128.
Step 8: the feature map obtained in Step 7 is input into the Softmax classifier to classify the plant diseases.
where w_c^{k} is the weight corresponding to class c; specifically, w_c^{k} indicates the importance of y_{GAP}^{k} for class c.

3. Global pooling dilated convolutional neural network

Making use of the advantages of dilated convolution and global pooling, a global pooling dilated convolutional neural network (GPDCNN) model is proposed for plant leaf disease recognition.

3.1. GPDCNN structure

GPDCNN is a modified CNN based on the AlexNet model. Its structure is shown in Fig. 2 and includes 13 layers: five convolutional layers Conv1–Conv5, four pooling layers Pooling1–Pooling4, a global pooling layer, a multi-scale convolution layer Inception, a feature fusion layer Concat, and a Softmax classifier, where Inception has three convolutional layers and a pooling layer, and Concat has three convolutional layers. Different from AlexNet, in GPDCNN the original convolution kernel in Conv1 is replaced by a dilated convolution kernel, Inception and Concat are added after Pooling4, and a global pooling layer followed by a Softmax classifier replaces the two fully connected layers after Conv5. Each convolution layer is followed by a nonlinear activation function (ReLU), which reduces the training time of GPDCNN and, to some extent, restrains overfitting. The number of output channels of each layer, the sizes of the convolution kernels and the numbers of extracted maps are denoted in Fig. 2. The numbers of convolutional kernels of Conv1, Conv2, …, Conv5 are 96, 128, 192, 192 and 128, respectively.
The kernel size of Conv1 is 7 × 7 and that of Conv2 is 5 × 5; the kernel sizes of Conv3, Conv4, Conv5 and the global pooling layer are all 3 × 3. The pooling sizes of Pooling1–Pooling4 are all 2 × 2 with stride 2. The four kernel sizes of Inception are all 1 × 1, and the three kernel sizes of Concat are 1 × 1, 3 × 3 and 5 × 5, respectively.
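As a rough PyTorch sketch (our assumption, not the authors' Caffe implementation: the Inception/Concat branches are omitted and the padding/stride choices are illustrative), the backbone described above might look like:

```python
import torch
import torch.nn as nn

class GPDCNNSketch(nn.Module):
    """Simplified GPDCNN backbone: dilated Conv1, Conv2-Conv5 with the
    channel counts from the text, 2x2 max pooling, global average
    pooling, and a linear layer feeding Softmax. num_classes = 7
    assumes six diseases plus the normal class."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            # Conv1: 7x7 kernel, dilation 2 -> effective 13x13 window
            nn.Conv2d(3, 96, 7, padding=6, dilation=2), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(96, 128, 5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 192, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(192, 192, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(192, 128, 3, padding=1), nn.ReLU(),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)       # global average pooling
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.gap(x).flatten(1)               # one point per feature map
        return self.classifier(x)                # logits for Softmax
```

A 240 × 240 crop (the sub-image size used later in the paper) passes through four poolings to a 15 × 15 map before GAP collapses it to a 128-dimensional vector.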
We train GPDCNN end-to-end in a straightforward way. The first four convolutional layers are fine-tuned from a well-trained AlexNet (http://caffe.berkeleyvision.org/tutorial/layers.html). The other layers are initialized from a Gaussian distribution with standard deviation 0.01. We apply stochastic gradient descent (SGD) with a fixed learning rate to train GPDCNN. The Euclidean distance is used to measure the difference between the ground truth and the estimated output map. The loss function is given as follows:
L(\Theta) = \frac{1}{2N} \sum_{k=1}^{N} \left\| Z(X_k; \Theta) - Z_k^{GT} \right\|_2^2    (4)
where N is the size of the training batch and Z(X_k; Θ) is the output generated by GPDCNN with parameter set Θ for the input image X_k, while Z_k^{GT} is the ground truth for X_k. Our work aims to identify the diseases that affect cucumber plants using GPDCNN as the main body of the cucumber disease recognition system; a general overview of the system is presented in Fig. 3.
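Eq. (4) is the standard Euclidean (L2) loss; a NumPy rendering, with toy batch contents of our own choosing, might be:

```python
import numpy as np

def euclidean_loss(pred, target):
    """Eq. (4): L(Theta) = 1/(2N) * sum_k ||Z(X_k; Theta) - Z_k^GT||_2^2,
    where N is the batch size (first axis of the arrays)."""
    N = pred.shape[0]
    diff = (pred - target).reshape(N, -1)
    return float(np.sum(diff ** 2) / (2 * N))

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = np.array([[1.0, 0.0], [0.0, 4.0]])
loss = euclidean_loss(pred, target)   # (2^2 + 3^2) / (2 * 2) = 3.25
```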
3.2. Process of GPDCNN

A large number of training samples are used to train GPDCNN.
Step 1: extract feature maps from the original training samples through the first four stages (Conv1, Pooling1, Conv2, Pooling2, Conv3, Pooling3, Conv4 and Pooling4), obtaining 192 basic feature maps.
Step 2: in the Inception layer, the extracted feature maps are processed by three convolutional operators and one pooling operator, and
4. Experiments and results

To validate the performance of GPDCNN, we conduct extensive experiments on a real-world cucumber diseased leaf image dataset and its corresponding lesion image dataset, and compare it with four crop
two typical deep learning models often used in image classification. All experiments are implemented on a PC running Ubuntu 16.04 with 16 GB memory, an Intel® Core™ i7-7700K CPU @ 4.00 GHz, and an Nvidia GTX 1080 Ti GPU (16 nm process, 11 GB GDDR5, core frequency 1480–1582 MHz), using the Caffe framework, an open-source convolutional architecture for fast feature embedding developed by the Berkeley Vision and Learning Center (BVLC), and TensorFlow in Python 3.4.
Fig. 3. The cucumber disease recognition system (image dataset → data augmentation → annotated and augmented data → training and testing sets → GPDCNN training → trained GPDCNN → classification result).
disease recognition methods, i.e., probabilistic neural networks (PNNs) (Khan et al., 2018), sparse representation classification (SRC) (Shi et al., 2015), deep CNNs (DCNNs) (Liu et al., 2018), and AlexNet (Alex et al., 2012). PNNs and SRC are two traditional crop disease recognition methods that require discriminative features to be extracted manually, relying greatly on prior knowledge. DCNNs and AlexNet are

4.1. Data collection
A BM-500GE/BB-500GE digital color camera with a resolution of 2456 × 2058 pixels was used to capture crop diseased leaf images. From the cucumber planting bases of the Yangling agricultural high-tech industrial demonstration area, Shaanxi, China, 600 diseased leaves of 6 common cucumber leaf diseases and 100 normal leaves were collected, with 100 leaf images per disease showing typical disease symptoms
Fig. 4. Examples of cucumber diseased leaf images and their corresponding segmented lesion images.
Fig. 5. Examples of augmentation images of a diseased leaf and its corresponding lesion.
Fig. 6. The recognition rates versus the iterations.
under several different conditions depending on the time (e.g., illumination), season (e.g., temperature and humidity), and place where they were taken (e.g., farmland or greenhouse). The six diseases are downy mildew, anthracnose, gray mold, angular leaf spot, black spot, and powdery mildew. To reduce the workload of the image analysis and focus on the regions of interest, we cropped each original diseased leaf image manually; each cropped sub-image contains typical lesions and has the same size of 240 × 240 pixels. The K-means clustering algorithm is used to segment the diseased leaf images (Fang and Ramasamy, 2015; Gulhane and Gurjar, 2011; Bashish et al., 2011; Wang et al., 2012); the segmented lesion images are only utilized to illustrate the highlights of the proposed method. Some original diseased leaf images and their corresponding segmented lesion images are shown in Fig. 4.
An effective CNN model relies on extensive iterative training on a large-scale image set. However, our dataset is too small to overcome network overfitting. To produce sufficient diseased leaf images and increase the diversity of the dataset, the natural cucumber diseased leaf images are first acquired and then processed using data augmentation techniques, including geometric transformations (random shift, random resize, random crop, random rotation/reflection, horizontal/vertical flipping) and intensity transformations (contrast adjustment, brightness enhancement, color jittering, noise addition, PCA jittering, radial blur) (https://blog.csdn.net/mwa2016/article/details/53816010?utm_source=copy) (Hu et al., 2015). Each original image is radially blurred by rotation blur and scaling blur (rotation blur unit 10, scaling blur unit 30); hue, saturation and brightness are increased by 20% and the contrast by 30%, while the sharpness is decreased by 10%; a 3 × 3 transformation matrix is used for perspective transformation; 30% Gaussian noise is added to the original image with an offset of 0.2 and a standard deviation of 0.3; and PCA (principal component analysis) jittering is used to perturb the natural image. After augmentation, each image yields 50 images; thus, we obtain 600 × 50 = 30,000 diseased leaf images, 30,000 corresponding segmented lesion images, and 5000 normal leaf images. Finally, two general datasets are constructed: a diseased leaf image dataset containing all normal leaf images, and a lesion image dataset. The two datasets simulate the natural environment of image acquisition and provide an important guarantee of the generalization capability of CNNs. Some augmentation examples are shown in Fig. 5.
The annotation process assigns the disease class label to the lesion areas in the diseased leaf image. Starting with the dataset of images, we manually annotate the lesion areas of every image with a bounding box and a class. Some lesions may look similar depending on the infection status, so the knowledge for classifying the type of the
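A simplified sketch of a few of these augmentations may help (the operations chosen and the parameter ranges are illustrative stand-ins, not the exact settings listed above):

```python
import numpy as np

def augment(img, rng):
    """Apply random flips, 90-degree rotations, brightness scaling and
    additive Gaussian noise to a float image in [0, 1]; square inputs
    keep their shape. Parameter ranges are illustrative only."""
    out = img.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]                        # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]                        # vertical flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    out = out * rng.uniform(0.8, 1.2)             # brightness jitter
    out = out + rng.normal(0.0, 0.05, out.shape)  # Gaussian noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
leaf = rng.random((240, 240))                     # stand-in for a 240x240 crop
augmented = [augment(leaf, rng) for _ in range(5)]  # several variants per image
```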
Fig. 7. Output images in different layers after 1000 iterations: (a) original cucumber leaf image of angular leaf spot; (b)–(o) outputs of Conv1–Conv5, Pooling1–Pooling4, ReLU1–ReLU4 and the global average pooling layer.
Batch size is set to 64 in training and 50 in testing. Momentum is set to 0.9 without accelerated gradient. A Gaussian distribution with mean 0 and standard deviation 0.01 is used to randomly initialize the weights of the network. SGD with a mini-batch size of 256 is used to update the network parameters by minimizing the total loss function in Eq. (4). The initial learning rate is set to 0.01 and is reduced to 1/10 of its value at epochs 100 and 150; the models are trained for up to 200 epochs. The regularization coefficient is set to 0.0005, and the dilation rate in Conv1 is set to 2.
Global average pooling (GAP) reduces the error caused by the increased estimation variance due to neighborhood-size constraints and retains as much of the background information of the image as possible, which is conducive to extracting key features, while global maximum pooling (GMP) preserves more low-level features but ignores high-level ones. We therefore utilize GAP for cucumber disease recognition.
Five-fold cross-validation (FFCV) is used to validate the performance of the proposed method. In FFCV, all image samples of the dataset are randomly split into 5 equal-sized subsets. Each time, a single subset is used as the test set, while the remaining 4 subsets are used as the training set. The cross-validation experiments are run 5 times, with each of the 5 subsets used exactly once as the test set, and the 5 results are averaged to produce a single estimate for an FFCV experiment.
Fig. 6 shows the recognition rates versus the iterations in training on the diseased leaf dataset. From Fig. 6, it is seen that the average recognition rates of the two common DCNNs and AlexNet models are around 87.65%, which is significantly lower than the average recognition rate of GPDCNN (about 90%). Moreover, GPDCNN begins to converge when the iterations reach
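The five-fold split described above can be sketched as follows (the dataset size here is illustrative):

```python
import numpy as np

def five_fold_splits(n_samples, seed=0):
    """Randomly partition sample indices into 5 equal-sized folds;
    each fold serves exactly once as the test set (FFCV)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, 5)
    for i in range(5):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(5) if j != i])
        yield train, test

# 5 train/test splits over an (illustrative) 30,000-image dataset
splits = list(five_fold_splits(30000))
```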
Table 1
Cucumber disease recognition rate by DCNNs, AlexNet and GPDCNN on the diseased leaf image dataset.

Model     Pooling type       Training time (h)   Testing time (s)   Accuracy (%)
DCNNs     Fully-connected    14.5                3.64               91.73
AlexNet   Fully-connected    21.3                3.72               92.48
GPDCNN    GAP                 6.2                3.58               94.65
Table 2
Cucumber disease recognition rates (%) by GLSVD, IPT and GPDCNN on different datasets.

          Diseased leaf dataset        Segmented lesion dataset
Methods   Original      Augmented      Original      Augmented
GLSVD     62.36         62.48          88.73         89.63
IPT       59.71         60.24          87.82         89.16
GPDCNN    78.32         94.65          81.56         95.18
diseased leaf is provided by the experts in the area. The annotation outputs are the coordinates of bounding boxes of different sizes together with their corresponding disease class, which are subsequently compared with the predicted results of the network during testing.

4.2. Experimental setup

GPDCNN is trained by mini-batch SGD with a momentum factor.
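As a concrete illustration of the dilated convolution at the core of GPDCNN (the dilation rate of Conv1 is set to 2), the following sketch shows in 1-D, for brevity, how dilation enlarges the receptive field without adding parameters. This is illustrative only, not the authors' implementation:

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Valid-mode 1-D convolution with gaps of (dilation - 1) between taps."""
    k = len(kernel)
    span = dilation * (k - 1) + 1           # effective receptive field
    return [sum(kernel[j] * x[i + j * dilation] for j in range(k))
            for i in range(len(x) - span + 1)]

def receptive_field(kernel_size, dilation):
    """A 3-tap kernel with dilation 2 covers 5 input positions."""
    return dilation * (kernel_size - 1) + 1

# Same 3 weights, wider coverage: dilation 2 sees positions 0, 2 and 4.
out = dilated_conv1d([1, 0, 0, 0, 1], [1, 1, 1], dilation=2)
```

The parameter count stays at kernel_size weights per channel regardless of the dilation rate, which is why the paper can widen the receptive field without increasing the number of training parameters.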
results and convergence process show that GPDCNN achieves a higher recognition rate and converges faster.
about 60,000, whereas DCNNs and AlexNet need more than 90,000 iterations to achieve a stable recognition result. The model is trained on the training set, after which recognition is performed on the test set; once the experiments achieve the expected results, the final recognition result is obtained on the test set. From Fig. 6, we find that the three models achieve stable results after 17,000 iterations, so for simplicity we run GPDCNN, DCNNs and AlexNet for 180,000 iterations on the diseased leaf dataset. The trained models are then applied to identify the test set samples.

To validate the advantage of GPDCNN in automatic image feature extraction, Fig. 7 shows the output images in different layers after 10,000 iterations without fine-tuning, where all original diseased leaf images are used as training samples. Fig. 7 shows that the feature maps are computed across an image by applying convolution, pooling and ReLU operators, and that different layers extract different features; each feature map is displayed in a different block, and the final GAP layer learns highly salient features.
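The role of the GAP layer described above, collapsing each feature map to a single value instead of flattening everything into fully-connected layers, can be shown with a minimal sketch (illustrative only; shapes and names are our own):

```python
def global_average_pool(feature_maps):
    """Reduce a list of H x W feature maps (one per channel) to one scalar
    per channel, replacing the flatten + fully-connected stage and the
    weight parameters it would otherwise require."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]

# Two 2x2 channel maps -> two pooled activations, with no extra weights.
pooled = global_average_pool([[[1, 3], [5, 7]],
                              [[0, 0], [2, 2]]])
```

Because GAP has no trainable parameters, swapping it in for the fully-connected layers is what shrinks the training time reported in Table 1.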
5. Conclusions

In this paper, a novel deep learning architecture called GPDCNN is proposed for cucumber disease recognition. The dilated convolutional and global pooling layers are used to aggregate multi-scale contextual information, speed up convergence and improve the recognition rate. With the dilated convolutional layers, GPDCNN can extend the receptive field without losing resolution. We demonstrated the model on cucumber diseased leaf image datasets with state-of-the-art performance. In future work, we intend to improve the performance of GPDCNN by exploring the role of probabilistic graphical models, and to further extend our method in the field of crop disease recognition systems.

Acknowledgments
This work is partially supported by the China National Natural Science Foundation under Grant No. 61473237, the Key Research and Development Plan of Shaanxi No. 2017ZDXM-NY-088 and the Key Project of Tianjin Natural Science Foundation No. 18JCZDJC32100.

4.3. Experimental results
To reduce the influence of random effects, we repeat the FFCV experiment 50 times and report the averaged results as the final experimental result. The results of DCNNs, AlexNet and the proposed GPDCNN are given in Table 1. From Table 1, it can be seen that GPDCNN outperforms DCNNs and AlexNet in both recognition accuracy and training time. The reason is that GPDCNN adopts GAP instead of fully-connected layers to accelerate the training process, and uses dilated convolution and multi-scale convolutional kernels to improve the recognition rate. In the DCNNs and AlexNet structures, by contrast, much computation and convergence time is spent evaluating the large number of weight parameters introduced by the fully-connected layers, and their ability to extract multi-resolution features depends on the resolution of the feature maps.

To further demonstrate the advantage of GPDCNN, we perform disease recognition experiments with the traditional methods, GLSVD and IPT, on the original diseased leaf image dataset and the segmented lesion image dataset. The comparison results are given in Table 2. From Table 2, we can see that the proposed method achieves a much better recognition rate than the traditional approaches on the augmented diseased leaf image dataset, but obtains only a small improvement on the segmented lesion image dataset, whereas the recognition rates of GLSVD and IPT increase greatly on the segmented lesion image dataset. The reason is that GPDCNN needs many training samples and can automatically learn high-level abstract features from the original images, while the recognition accuracies of GLSVD and IPT rely heavily on image segmentation and feature extraction algorithms, and these methods cannot directly recognize the disease type from the original diseased leaf image.
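The FFCV protocol used above (5 equal-sized random folds, each fold serving once as the test set, with the repetition results averaged) can be sketched as follows; this is a stand-alone illustration with assumed function names, not the authors' data-handling code:

```python
import random

def five_fold_splits(n_samples, seed=0):
    """Randomly partition sample indices into 5 equal-sized folds.
    Each fold serves once as the test set (FFCV); the other 4 folds
    form the training set for that round."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    size = n_samples // 5
    folds = [idx[k * size:(k + 1) * size] for k in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j in range(5) if j != k for i in folds[j]]
        yield train, test

# 100 samples -> 5 rounds of (80 train, 20 test); every sample is
# tested exactly once. Repeating with different seeds and averaging
# gives the repeated-FFCV estimate reported in the tables.
splits = list(five_fold_splits(100))
```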
From Tables 1 and 2, we conclude that: (1) GPDCNN is more robust than the other methods, although its recognition rate improves less markedly on the augmented segmented lesion image dataset; (2) the traditional feature extraction based methods perform noticeably better on the segmented lesion dataset than on the original diseased image dataset, and they do not need many training samples because they apply SVM or K-nearest neighbor classifiers to classify the images; (3) the GPDCNN based disease recognition method can work directly on the original diseased leaf images and thus removes many pre-processing steps from disease recognition.

Fig. 4 indicates that the DCNNs and AlexNet models converge more slowly than GPDCNN, because they apply fully connected layers to concatenate the extracted feature maps into a feature vector, which leads to a large memory requirement. Moreover, their network structures are not optimal because only one convolution kernel is used within each convolution layer. In GPDCNN, the multi-scale convolutional kernels of the Inception layer are used to extract the multi-scale features of the input image. The experimental
Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.compag.2019.03.012.