A new proposal for automatic identification of multiple soybean diseases


Computers and Electronics in Agriculture xxx (xxxx) xxxx


Computers and Electronics in Agriculture journal homepage: www.elsevier.com/locate/compag

Juliana Mariana Macedo Araujo, Zelia Myriam Assis Peixoto ⁎

Graduate Program in Electrical Engineering, Pontifical Catholic University of Minas Gerais, Av. Itau, 525, Dom Cabral, CEP 30535-012 Belo Horizonte, MG, Brazil

ARTICLE INFO

Keywords: Soybean leaf disease; Local binary pattern; Support vector machine; Color moments; Bag of visual words model

ABSTRACT

This work proposes a combination of digital image processing techniques, consisting of color moments, local binary patterns (LBP) and the bag of visual words (BoVW) model, for automatic detection of soybean diseases based on the analysis of color, texture and local characteristics of spots on affected leaves. The characteristics extracted by these techniques are grouped and used as input to a support vector machine (SVM) for disease classification. The images used in the development and validation of the proposed identification system were obtained from Digipathos, a database provided by the Brazilian Agricultural Research Agency (Embrapa), consisting of 354 images related to 8 typical soybean diseases, namely, bacterial blight, soybean rust, copper phytotoxicity, soybean mosaic, target spot, downy mildew, powdery mildew and Septoria brown spot. Performance analyses are presented separately for the techniques used in the several stages of data processing and for the overall system consisting of the combination of these techniques and the SVM classifier. The proposed system achieves a success rate of 75.8%, an improvement of approximately 17 percentage points over the results of methods in the existing literature.

1. Introduction

Research related to the detection of diseases in plantations has increasingly assumed a key role in improving the performance and competitiveness of modern agribusiness. In Brazil, where soybean farming represents one of the main agricultural activities, the third survey of the 2018/19 crop released by the Brazilian National Supply Company (CONAB, 2018) shows that soybean diseases are responsible for an estimated annual loss of 15% to 20% of the annual crop of approximately 120.1 million tons (Juhász et al., 2013). In 2009, there was a loss of 13 million tons out of a 100 million-ton crop in 23 states in the United States (Koenning et al., 2010).

The identification of diseases in plantations usually involves the risk of subjective interpretation and distortions in the judgment of specialized professionals, as such identification is performed manually and visually (Barbedo, 2016). Additionally, due to the difficulties that small and medium farmers experience in reaching such specialists, there are delays in identifying anomalies and taking preventive actions, which has a significant impact on productivity losses.

In this context, several proposals have been brought forward, aiming at automatic identification of plant types and their respective diseases by means of image processing techniques that involve the stages of segmentation of regions of interest, extraction of characteristics and classification of diseases or types of plants, in addition to methods for contrast enhancement and noise reduction.

Among the various available techniques, color-based segmentation is frequently used to separate the region of interest from the background of the image; the simplest methods use the RGB (red-green-blue) color channels, usually relying on the green channel (Meyer and Neto, 2008; Kirk et al., 2009) or empirical mathematical equations (Meyer et al., 2004; Hunt et al., 2005; Weizheng et al., 2008; Burgos-Artizzu et al., 2011; Guerrero et al., 2012). More complex segmentation methods use histograms, shapes, textures and spectral signature similarity, such as the entropy of a histogram (Tellaeche et al., 2008), Otsu’s method (Kurniawati et al., 2009; Macedo-Cruz et al., 2011), fuzzy thresholding (Macedo-Cruz et al., 2011) and clustering methods such as fuzzy C-means (Sekulska-Nalewajko and Goclawski, 2011), K-means (Sannakki et al., 2011; Zhang, 2017) and K-median (Qin, 2016).

Extraction of characteristics from images of leaves and diseases can be performed outside the visible light spectrum, as done by hyperspectral methods (Rumpf et al., 2010; Barbedo et al., 2015; Xie et al., 2017), or within the visible light spectrum. The color characteristics can be obtained using the RGB channel color range (Pixia and Xiangdong, 2013), color moments (Kadir et al., 2013) and histograms (Barbedo et al., 2016). To extract the characteristics of textures, the polar Fourier transform (PFT) (Kadir et al., 2013), Gabor wavelet transform (GWT) (Prasad et al., 2016), gray-level co-occurrence matrix (GLCM) (Pixia

Corresponding author. E-mail address: [email protected] (J.M.M. Araujo).

https://doi.org/10.1016/j.compag.2019.105060 Received 6 March 2019; Received in revised form 13 October 2019; Accepted 17 October 2019 0168-1699/ © 2019 Elsevier B.V. All rights reserved.


J.M.M. Araujo and Z.M.A. Peixoto

and Xiangdong, 2013; Prasad et al., 2016) and local binary patterns (LBP) (Pantazi et al., 2019) are used. Local and shape characteristics can be extracted by fractal dimension (Kadir et al., 2013), morphological operations (Pixia and Xiangdong, 2013), and bag of words with scale-invariant feature transform (SIFT) (Fiel and Sablatnig, 2010), among others.

After the characteristics of interest have been extracted, it is necessary to classify the leaf or the disease of the leaf according to the objectives of the analysis. Classification can be performed by using correlation-based techniques (Barbedo et al., 2016), the shortest distance (Pixia and Xiangdong, 2013), the K-nearest neighbors algorithm (Prasad et al., 2016; Xie et al., 2017), feature ranking (FR) (Xie et al., 2017), the sparse representation method (SR) (Zhang, 2017), a probabilistic neural network (PNN) (Kadir et al., 2013) or one-class support vector machines (OCSVM) (Pantazi et al., 2019). The support vector machine (SVM) is frequently used, as seen in Camargo and Smith (2009), Rumpf et al. (2010), Fiel and Sablatnig (2010), Jian and Wei (2010) and Qin (2016).

Based on the various analyzed studies, this work aims to identify eight soybean diseases: bacterial blight, soybean rust, copper phytotoxicity, soybean mosaic, target spot, downy mildew, powdery mildew and Septoria brown spot. Seeking improvements in the identification of diseases from soybean leaf images, this paper proposes a combination of techniques involving color analysis, texture characteristics and local aspects such as curves, edges and color variations.

In this work, a given soybean leaf is first segmented in relation to the background and the region affected by the disease using the K-means method. Then, the color moments equations are used to extract color features, LBP is used to extract texture features and the speeded up robust features algorithm (SURF) is applied to obtain local features that provide inputs for building the bag of visual words (BoVW) model. The outputs of these feature extractors constitute the inputs to the SVM that is responsible for classification of the diseases present in the soybean leaf being analyzed. The project was implemented in the MATLAB® (version R2015a) environment. The images of soybean leaves used in the project were obtained from the Digipathos dataset provided by the Brazilian Agricultural Research Agency (Embrapa).

To assess the interrelationships of the techniques considered in this work, confusion matrices corresponding to color moments, BoVW and LBP as well as to the global system were calculated. For validation, the Student’s t-distribution-based confidence intervals of the results obtained by the proposed system were compared to the results presented by Barbedo (2016); note that both approaches were developed using the Digipathos dataset images. The final results obtained by the proposed classification method were also compared to the results in Camargo and Smith (2009) and Phadikar et al. (2013), based on studies and results presented by Barbedo (2016) in relation to these proposals, which also used images of the Digipathos dataset. The performed tests and comparative analyses indicated a superior performance of the proposed method.

Table 1
Allocation of images to training and test stages.

| N | Disease | Training stage | Test stage | Total |
|---|---|---|---|---|
| 1 | Bacterial blight | 36 | 15 | 51 |
| 2 | Soybean rust | 46 | 19 | 65 |
| 3 | Copper phytotoxicity | 16 | 7 | 23 |
| 4 | Soybean mosaic | 15 | 7 | 22 |
| 5 | Target spot | 43 | 19 | 62 |
| 6 | Downy mildew | 22 | 10 | 32 |
| 7 | Powdery mildew | 55 | 23 | 78 |
| 8 | Septoria brown spot | 15 | 6 | 21 |
|   | Total | 248 | 106 | 354 |
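The totals in Table 1 can be cross-checked with a few lines of code. The sketch below recomputes the training, test and overall totals from the per-disease counts and applies the eightfold data augmentation factor described in Section 2.1. The dictionary layout and the function names are illustrative assumptions, not part of the original MATLAB implementation.

```python
# Per-disease image counts taken from Table 1: (training stage, test stage).
TABLE_1 = {
    "Bacterial blight": (36, 15),
    "Soybean rust": (46, 19),
    "Copper phytotoxicity": (16, 7),
    "Soybean mosaic": (15, 7),
    "Target spot": (43, 19),
    "Downy mildew": (22, 10),
    "Powdery mildew": (55, 23),
    "Septoria brown spot": (15, 6),
}

# Data augmentation increases the number of samples by a factor of 8 (Section 2.1).
AUGMENTATION_FACTOR = 8


def split_totals(table):
    """Return (training total, test total, overall total) for the table."""
    train = sum(tr for tr, _ in table.values())
    test = sum(te for _, te in table.values())
    return train, test, train + test


def augmented_totals(table, factor=AUGMENTATION_FACTOR):
    """Return the same three totals after applying the augmentation factor."""
    return tuple(factor * n for n in split_totals(table))


if __name__ == "__main__":
    train, test, total = split_totals(TABLE_1)
    print("training share (%):", round(100 * train / total, 1))
    print("augmented totals:", augmented_totals(TABLE_1))
```

The augmented totals agree with the 1984 training and 848 test images (2832 in all) reported in Section 3.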

2. Materials and methods

The images of soybean leaves affected by the diseases were obtained from the Digipathos dataset provided by the Brazilian Agricultural Research Agency (Embrapa), available at https://www.digipathos-rep.cnptia.embrapa.br/. Digipathos consists of 354 images of 8 different soybean diseases. For training, approximately 70% of the images were used, and the rest were set aside for validation tests of the developed system. The number of images used in the training and test phases for each disease is listed in Table 1 together with the respective disease identification number N adopted in this work. The methodology used in this work comprises the following steps:

• Segmentation and augmentation of data,
• Extraction of color, texture and local features, and
• Classification.

2.1. Segmentation and augmentation of data

The original image is represented in the L*a*b* color space (where L* is lightness, and a* and b* represent the green-red and blue-yellow color components, respectively), which is a perceptually uniform color system. The K-means algorithm (MacQueen, 1967) allows the image to be segmented into 2 clusters, one corresponding to the leaf region and the other to the background, in order to remove the checkered background against which the Digipathos dataset images are provided, as shown in Fig. 1.

Next, the system performs image segmentation of the damaged areas of soybean leaves. As shown in Fig. 2, parts of the background still remain in the image after leaf segmentation using the K-means algorithm. Thus, to improve the segmentation, edge detection is applied to the image, which results in the leaf outline over the background. The outline enclosing the largest area is then used as a mask of the original image for the leaf cut, as shown in Fig. 3.

Segmentation of the diseased parts of the leaf is performed on the previously segmented image (Fig. 3) in the HSL color space (where H is hue, S is saturation, and L is lightness). With S ranging over [0, 1] and H over [0°, 360°], pixels with S < 0.05 are removed, as these are the gray pixels remaining from the background after leaf segmentation. Then, the green areas, which are considered to be healthy, are removed from the leaf image by excluding pixels with H in [60°, 330°]. As a result of these operations, only the damaged areas of the leaf remain in the image, as shown in Fig. 4.

Data augmentation techniques are used to increase the number of samples by a factor of 8. The operations performed comprise cutting, rotating and changing the range of images of the affected area of the leaf. Initially, the image of that leaf region is cut into 4 images of equal dimensions, which comprise the first 4 data augmentation images. Then, cuts are performed again, resulting in 4 additional images, to which rotation or range change operations are applied, with the angle of rotation ranging from 0° to 360° and the range change varying from

Fig. 1. Original image.

Fig. 2. Segmentation of a leaf using the K-means algorithm.

Fig. 5. LBP system (do Amaral et al., 2013).

$$\mathrm{kurtosis} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N}\left(P_{ij}-\mu\right)^{4}}{MN\,\sigma^{4}} \tag{4}$$

where M and N are the numbers of columns and rows of the image, and $P_{ij}$ is the value of the pixel in column i and row j, in each plane of the RGB system. Each of Eqs. (1)–(4) is thus evaluated once per plane of the RGB system, yielding three values per equation and a vector composed of 12 values that characterizes the image based on its color.

Fig. 3. Detection of contours of the leaf shown in Fig. 2 using the K-means algorithm.
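As an illustration of how the 12-value color vector can be assembled from Eqs. (1)–(4), the following minimal Python sketch computes the mean, standard deviation, skewness and kurtosis of each RGB plane. It is not the authors' MATLAB code: the function names, the flat list-of-pixels input, the ordering of the 12 values and the handling of a constant channel are all assumptions made for the example.

```python
import math


def channel_moments(values):
    """Mean, standard deviation, skewness and kurtosis of one color plane.

    `values` is a flat list holding the pixel values of the damaged leaf area
    for a single channel, so len(values) plays the role of M*N in Eqs. (1)-(4).
    """
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((p - mu) ** 2 for p in values) / n)
    if sigma == 0:
        # Assumption: a constant channel has no skewness or kurtosis to report.
        return [mu, 0.0, 0.0, 0.0]
    skewness = sum((p - mu) ** 3 for p in values) / (n * sigma ** 3)
    kurtosis = sum((p - mu) ** 4 for p in values) / (n * sigma ** 4)
    return [mu, sigma, skewness, kurtosis]


def color_moments(pixels):
    """Build the 12-value feature vector from a list of (R, G, B) pixels.

    The order (four moments for R, then G, then B) is an assumption.
    """
    features = []
    for channel in range(3):
        features.extend(channel_moments([px[channel] for px in pixels]))
    return features


if __name__ == "__main__":
    sample = [(120, 80, 40), (130, 90, 35), (110, 70, 50), (140, 85, 45)]
    print(color_moments(sample))
```

In practice the pixel list would contain only the pixels kept by the disease segmentation step, since the moments are calculated on the damaged parts of the leaf.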

0.5 to 1.5. The operations of rotation and change of range as well as their values are selected arbitrarily.

2.3. Texture feature extraction

Texture feature extraction is performed on the damaged leaf areas in grayscale using local binary patterns. LBP is a texture descriptor popularized by the work of Ojala et al. (2002), in which texture information is extracted by analyzing neighboring pixels. As shown in Fig. 5, LBP is based on the comparison of neighboring pixels within 3x3 windows of the original image, where a binary value is established for each pixel according to the central pixel of the window, i.e., that value is 1 if the intensity of the analyzed pixel is greater than that of the central pixel and 0 otherwise, as indicated in the lower right matrix of Fig. 5. Then, the binary values of the neighbors are concatenated into a binary code, establishing a label for the central pixel being considered and resulting in the lower left matrix of Fig. 5. After the above procedure has been performed for all pixels, the result is a matrix with values ranging from 0 to 255. The labels of all points in this matrix are then used to generate a histogram that is used for texture comparison.
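The labeling procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper: the order in which the eight neighbors are concatenated (clockwise from the top-left corner) and the decision to skip border pixels are assumptions, and the strict "greater than" comparison follows the wording of the text.

```python
# Neighbor offsets (row, column), clockwise starting at the top-left corner.
# The bit order is an assumption; any fixed order yields a valid LBP label.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Return the 8-bit LBP label (0-255) of the pixel at row r, column c."""
    center = img[r][c]
    code = 0
    for dr, dc in OFFSETS:
        # A neighbor contributes 1 if it is brighter than the central pixel.
        bit = 1 if img[r + dr][c + dc] > center else 0
        code = (code << 1) | bit
    return code


def lbp_histogram(img):
    """Return the 256-bin histogram of LBP labels of a grayscale image.

    `img` is a list of rows of gray levels; border pixels are skipped because
    they lack a full 3x3 neighborhood (an assumption of this sketch).
    """
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


if __name__ == "__main__":
    window = [
        [5, 9, 1],
        [4, 6, 7],
        [2, 3, 8],
    ]
    print(lbp_code(window, 1, 1))
```

For the example window, the neighbors brighter than the center (9, 7 and 8) set the second, fourth and fifth bits, so the label is binary 01011000.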

2.2. Color feature extraction

Color feature extraction is based on the color moments equations, which are commonly used to identify objects under the assumption that color follows a probability distribution in the image. Accordingly, the mean (µ), standard deviation (σ), skewness and kurtosis are calculated over the damaged parts of the leaf for the R, G and B channels, as given by Eqs. (1)–(4), respectively (Stricker and Orengo, 1995).

2.4. Feature extraction of local characteristics

The bag of visual words (BoVW) technique, used for the extraction of local features, is applied only to the image of the damaged area of the leaves, which is already converted to grayscale and reduced by 1/4 in order to reduce the processing time. In the BoVW technique proposed by Sivic and Zisserman (2003), each feature becomes a visual word, and the set of such visual words constitutes a dictionary or bag. Based on the comparisons between the local features of an unknown object and the elements of the dictionary, a histogram is built to identify the similarities (Pedrosa, 2015).

The BoVW dictionary is generated based on the detection and description of points of interest by applying the SURF algorithm (Bay et al., 2008) to each image of the training set that has the same label. SURF is applied to grayscale images in two main steps that entail the detection and description of points of interest. This method can be used regardless of the scale or rotation to which the images are subjected. Detection of points of interest consists of locating regions that can

Fig. 4. Segmentation of disease areas of a leaf.
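The thresholds used to isolate the damaged areas shown in Fig. 4 (saturation below 0.05 treated as gray background, hue between 60° and 330° treated as healthy) can be sketched with Python's standard `colorsys` module. This is an illustrative sketch only: the function names, the 8-bit RGB input and the inclusive treatment of the hue boundaries are assumptions, and `colorsys` returns the components in the order hue, lightness, saturation, each scaled to [0, 1].

```python
import colorsys

S_MIN = 0.05                 # pixels with S < 0.05 are treated as gray background
HEALTHY_HUE = (60.0, 330.0)  # hue range (degrees) treated as healthy tissue


def is_damaged(rgb):
    """Return True if an 8-bit (R, G, B) pixel survives both thresholds."""
    r, g, b = (v / 255.0 for v in rgb)
    hue, _lightness, saturation = colorsys.rgb_to_hls(r, g, b)
    if saturation < S_MIN:
        return False  # remaining gray background pixel
    hue_deg = hue * 360.0
    # Assumption: both ends of the healthy hue interval are inclusive.
    return not (HEALTHY_HUE[0] <= hue_deg <= HEALTHY_HUE[1])


def damaged_mask(image):
    """Return a boolean mask (list of rows) marking the damaged pixels."""
    return [[is_damaged(px) for px in row] for row in image]


if __name__ == "__main__":
    row = [(139, 69, 19), (0, 200, 0), (128, 128, 128)]  # brown, green, gray
    print(damaged_mask([row]))
```

In this example the brown pixel is kept as damaged tissue, while the green and gray pixels are discarded.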

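$$\mu = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} P_{ij} \tag{1}$$

$$\sigma = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(P_{ij}-\mu\right)^{2}} \tag{2}$$

$$\mathrm{skewness} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N}\left(P_{ij}-\mu\right)^{3}}{MN\,\sigma^{3}} \tag{3}$$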


Fig. 8. Creation of the BoVW dictionary.

Fig. 6. Non-maximum suppression in a 3x3x3 neighborhood (Evans, 2009).

characterize the image of the object, such as curves, edges and color changes. The determinant of the Hessian matrix is calculated for each point in the image at different scales. After the determinants have been calculated, each point is compared with its 8 neighbors on the same scale and with its upper- and lower-scale neighbors (non-maximum suppression in a 3x3x3 neighborhood), as shown in Fig. 6. Points that have greater variation in their determinant value relative to that of their neighbors are considered to be points of interest.

Each of these points is then described by a fixed-size descriptor with 64 values. This description is created using the Haar wavelet transform, whose responses are represented by vectors in the horizontal and vertical directions; the sum of all responses results in a vector that determines the orientation of the point of interest. A square with side length 20s, where s is the scale at which the point was detected, is positioned over the point of interest to describe it according to the location and orientation previously calculated in the detection process. This square is divided into 16 squares of equal size, for which the Haar wavelet transform is calculated, and the dx and dy values corresponding to the horizontal and vertical responses, respectively, as well as the sums of the absolute values |dx| and |dy| are determined, as shown in Fig. 7. Each division of the square is represented by a vector of size 4, resulting in a vector with 64 values representing the final description of each point of interest.

Then, the point descriptions are grouped by the K-means method into k groups based on their similarities. The centroids obtained by K-means are the visual words in the BoVW dictionary for the respective label. This procedure is repeated for each grouping of points, creating a bag for each label (disease), as shown in Fig. 8.
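The dictionary construction described above, in which K-means centroids of the descriptors of each label become that label's visual words, can be sketched as follows. This is a minimal pure-Python illustration of Lloyd's K-means iteration, not the MATLAB routine used in the paper: the function names, the deterministic initialization from the first k descriptors and the fixed iteration count are assumptions, and the toy descriptors are 2-dimensional rather than the 64 values produced by SURF.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Group descriptors into k clusters and return the k centroids."""
    # Assumption: initialize the centroids with the first k descriptors.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_dictionaries(descriptors_by_label, k):
    """Create one bag of k visual words per label (disease)."""
    return {label: kmeans(descs, k) for label, descs in descriptors_by_label.items()}


if __name__ == "__main__":
    toy = {"rust": [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]}
    print(build_dictionaries(toy, k=2))
```

With real data, each list in `descriptors_by_label` would hold the SURF descriptors of all training images sharing a label, and the value of k would need to be chosen experimentally.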
After the dictionaries have been created, a histogram corresponding to the damaged area in each image is built based on the number of occurrences of words present in the dictionary. As shown in Fig. 9, the system detects and describes points of interest in the test image, groups those points based on closeness to visual words in the dictionary, and counts the points grouped for each visual word, resulting in a histogram that constitutes the final representation of the image according to the number of words in the dictionary.
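The histogram step described above amounts to assigning every descriptor to its closest visual word and counting the assignments. The sketch below shows this under simplifying assumptions: the function names are hypothetical, the dictionary is passed as a single list of visual words (the text does not detail how the per-label bags are combined), closeness is measured with the Euclidean distance, and the toy descriptors are 2-dimensional instead of 64-dimensional.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between a descriptor and a visual word."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, dictionary):
    """Return the index of the visual word closest to the descriptor."""
    return min(range(len(dictionary)),
               key=lambda k: squared_distance(descriptor, dictionary[k]))


def bovw_histogram(descriptors, dictionary):
    """Count how many descriptors fall on each visual word."""
    hist = [0] * len(dictionary)
    for d in descriptors:
        hist[nearest_word(d, dictionary)] += 1
    return hist


if __name__ == "__main__":
    words = [[0.0, 0.0], [10.0, 10.0]]            # toy dictionary with two words
    points = [[1, 0], [9, 9], [0, 2], [11, 10]]   # toy descriptors of one image
    print(bovw_histogram(points, words))
```

The resulting histogram has one bin per visual word, so its length is fixed by the dictionary size regardless of how many points of interest an image contains.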

Fig. 9. BoVW histogram output.

2.5. Classification of diseases

Classification is performed using the support vector machine technique. SVM is based on the statistical learning theory developed by Vapnik (1995), building on studies presented in Vapnik and Chervonenkis (1971). SVM is a supervised learning method: the algorithm is trained on a set of data whose inputs and outputs are predefined, and it generates a classifier capable of labeling new, unknown inputs. SVM performs binary separation of data into two classes automatically. If the data are linearly separable, the separation function, called the hyperplane, is a straight line. However, when they are not linearly separable, the data are mapped into a higher-dimensional space using the kernel functions described by Hofmann et al. (2008). The separation in this new sample space according to the SVM proposed by Vapnik and Chervonenkis (1971) is shown in Fig. 10. In this work, the kernel function $K(x_i; x_j)$ is a polynomial specified by Eq. (5),

$$K(x_i; x_j) = \left(1 + x_i^{T} x_j\right)^{d} \tag{5}$$

where $x_i$ and $x_j$ are the input data with indices i and j, and d is the order of the polynomial. As identification of soybean diseases is a multiclass task, the “one-against-one” method is used. In that method, classification is performed for pairs, i.e., for each class in relation to every other class. Each pairwise classification yields one vote for a disease, and the disease with the most votes is selected as the leaf disease (Albuquerque, 2012). Therefore, to use SVM after disease feature extraction, the system concatenates the output vector of the color moments method with the histograms obtained by LBP and BoVW for each image. These vectors are used as inputs in the training phase and then during

Fig. 7. Haar wavelet transform calculation in the internal divisions of a point of interest (Evans, 2009).

Fig. 10. Sample space being redistributed by the kernel function, including delimitation by the separating hyperplane (Huson, 2007).
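To make the polynomial kernel of Eq. (5) and the one-against-one voting scheme concrete, a minimal Python sketch is given here. It does not train an SVM: `pairwise_winner` is a hypothetical stand-in for the trained binary classifiers, the function names are assumptions, and the polynomial order used in the example (d = 2) is chosen only for illustration, since the text does not state the value of d.

```python
from collections import Counter
from itertools import combinations


def polynomial_kernel(xi, xj, d):
    """Eq. (5): K(xi; xj) = (1 + xi^T xj)^d for two feature vectors."""
    return (1 + sum(a * b for a, b in zip(xi, xj))) ** d


def one_against_one(classes, pairwise_winner, x):
    """Classify x by majority vote over all pairs of classes.

    `pairwise_winner(a, b, x)` must return either a or b; in a real system it
    would wrap the binary SVM trained to separate class a from class b.
    """
    votes = Counter(pairwise_winner(a, b, x) for a, b in combinations(classes, 2))
    winner = votes.most_common(1)[0][0]
    return winner, votes


if __name__ == "__main__":
    print(polynomial_kernel([1, 2], [3, 4], d=2))

    # Toy rule standing in for trained classifiers: the class number closest
    # to the scalar x wins the pair (purely illustrative).
    def toy_rule(a, b, x):
        return a if abs(a - x) <= abs(b - x) else b

    diseases = list(range(1, 9))  # disease numbers N = 1..8 from Table 1
    print(one_against_one(diseases, toy_rule, 2))
```

With 8 diseases, the one-against-one method evaluates 8 × 7 / 2 = 28 class pairs, so a single class can receive at most 7 votes.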


Table 2
Accuracy of the proposed methodology of disease analysis.

| N | Disease | Test 1 | Test 2 | Test 3 | Test 4 |
|---|---|---|---|---|---|
| 1 | Bacterial blight | 65.6% | 54.1% | 68.0% | 64.8% |
| 2 | Soybean rust | 89.7% | 89.1% | 90.4% | 91.7% |
| 3 | Copper phytotoxicity | 62.1% | 70.7% | 70.7% | 62.1% |
| 4 | Soybean mosaic | 58.5% | 60.4% | 58.5% | 62.3% |
| 5 | Target spot | 84.6% | 85.9% | 79.9% | 80.5% |
| 6 | Downy mildew | 80.5% | 70.1% | 77.9% | 75.3% |
| 7 | Powdery mildew | 85.0% | 79.1% | 84.0% | 84.5% |
| 8 | Septoria brown spot | 24.0% | 34.0% | 38.0% | 34.0% |
|   | Total | 75.8% | 73.3% | 76.4% | 75.6% |

classifier with a polynomial kernel function was trained on 70% of images available in the Digipathos dataset that were selected randomly. The remaining 30% of images were used in accuracy tests of the proposed system. The accuracy, defined as the average accuracy rate for each disease and the complete system, was determined according to Eq. (6):

$$\mathrm{Accuracy}(x) = \frac{Tp_x}{Tp_x + Tn_x} \tag{6}$$

where x is a disease or the complete system, and $Tp_x$ and $Tn_x$ denote the number of successes and errors, respectively, in the identification of x. Since the images used in training and testing are selected randomly, repeating the image selection process several times makes it possible to assess the system’s behavior and efficiency for different sets of images.

Table 2 shows the accuracy of the proposed methodology for each disease analyzed separately and for the complete system after 4 training and test stages were completed. The total accuracy for each test group is determined by the ratio of the number of disease images identified correctly to the total number of images tested. The confusion matrix (showing percentages) of the system response for each disease, obtained using a separation of training and test images similar to that of Test 1 in Table 2, is shown in Table 3.

To evaluate the impact of each technique on disease identification, the training and testing of the SVM was performed on the same dataset for each technique separately (Qin, 2016). The respective results can be represented by confusion matrices (Bravo et al., 2003; Ahmed et al., 2012; Barbedo, 2016), shown in Table 4 for the color moments method, Table 5 for the BoVW method and Table 6 for the LBP method. The total accuracy rates obtained with the individual extractors were 68.0% for color moments, 67.7% for BoVW and 66.9% for LBP.

Table 7 presents a comparative analysis of the results of the proposed system and those presented by Barbedo et al. (2016), both obtained using the images available in the Digipathos dataset. The values of the upper and lower limits of the proposed system’s response were calculated by using the confidence intervals of the Student’s t-distribution and an estimated 5% error in the values obtained in the tests presented in Table 2 (Pereira et al., 2011).
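The interval calculation can be illustrated with the four test results of Table 2. The sketch below assumes that the "estimated 5% error" corresponds to a two-sided 95% confidence level and that each interval is the sample mean plus or minus t times the standard error, with 3 degrees of freedom for 4 tests. This interpretation, the function name and the hard-coded critical value are assumptions, although the resulting limits are close to those listed in Table 7.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 3 degrees of freedom
# (4 tests minus 1); the 95% level is an assumed reading of "5% error".
T_CRIT_95_DF3 = 3.182


def t_confidence_interval(samples, t_crit):
    """Return (lower, upper) limits: mean -/+ t_crit * s / sqrt(n)."""
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean - half_width, mean + half_width


# Accuracy (%) in Tests 1-4, taken from Table 2.
TOTAL_ACCURACY = [75.8, 73.3, 76.4, 75.6]
SOYBEAN_RUST_ACCURACY = [89.7, 89.1, 90.4, 91.7]

if __name__ == "__main__":
    for name, values in [("Total", TOTAL_ACCURACY),
                         ("Soybean rust", SOYBEAN_RUST_ACCURACY)]:
        low, high = t_confidence_interval(values, T_CRIT_95_DF3)
        print(f"{name}: {low:.1f}-{high:.1f}%")
```

With these inputs the sketch gives roughly 73.1–77.4% for the complete system and 88.4–92.0% for soybean rust; the small difference from the 77.5% upper limit in Table 7 may stem from rounding of the published accuracies.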
The system’s response was characterized by the highest success rate for the second disease, soybean rust, estimated to be 88.4–92.0%. As in Barbedo et al. (2016), the lowest success rate was obtained in the identification of the eighth disease, Septoria brown spot, for which that study’s success rate was 15.0% compared to the 23.0–42.0% obtained by the proposed method. These low success rates are probably related to similarities between the eighth disease, Septoria brown spot, and both the fourth disease, soybean mosaic, and the fifth disease, target spot, in their initial stages, as shown in Figs. 12–14, respectively. In contrast to the images of other diseases, approximately 50% of the images of Septoria brown spot available in the Digipathos dataset exhibit features that are not yet well defined because the disease is at an early stage of its evolution.

Fig. 11. Block diagram of the proposed automatic identification system.

disease identification by SVM. The training procedure is shown as a block diagram in Fig. 11 and comprises the steps of segmentation, feature extraction and SVM training. Training is performed using 70% of the images of each disease, selected randomly. The tests used to verify the system response are performed using the remaining images of each disease, which represent 30% of the total. The test procedure is substantially the same as the training procedure shown in Fig. 11; however, during the test, the system does not create the BoVW dictionary and instead only extracts the response histograms of the images. Additionally, its responses are used for disease identification instead of SVM training.

3. Results and discussion

The proposed approach is designed to identify 8 soybean diseases: bacterial blight, soybean rust, copper phytotoxicity, soybean mosaic, target spot, downy mildew, powdery mildew and Septoria brown spot. After data augmentation, a total of 2832 images was used, of which 1984 were reserved for training and 848 were used in testing. The methodology used for segmentation of affected leaves proved to be efficient, resulting in the separation of the damaged area of each leaf, which was suitable for feature extraction, as shown in Fig. 4. After the images of the damaged areas of the leaves were obtained, the technique of data augmentation was used. The color moments method was used to extract color features, LBP was used for texture features, and the BoVW method was used to extract local features. The SVM


Table 3 System confusion matrix.

Legend: 1. Bacterial blight; 2. Soybean rust; 3. Copper phytotoxicity; 4. Soybean mosaic; 5. Target spot; 6. Downy mildew; 7. Powdery mildew; 8. Septoria brown spot.

The results in Table 7 show that the proposed identification system obtains better results than the study of Barbedo et al. (2016), except in the identification of the third disease, copper phytotoxicity, for which the results can be considered equivalent since the result of Barbedo et al. (2016) is within the calculated confidence interval. Apart from the comparative analysis presented in Table 7, it is also possible to compare the performance of the proposed method with Camargo and Smith (2009) and Phadikar et al. (2013) according to the analyses performed by Barbedo et al. (2016) using the images from the Digipathos dataset. According to Barbedo et al. (2016), the methods proposed by Camargo and Smith (2009) and Phadikar et al. (2013) obtained accuracy values of 58% and 56%, respectively, in soybean disease classification. The comparison of accuracy with these studies, which have similar objectives and use the same dataset, indicates that the proposed system yields the best response.

4. Conclusions

In this paper, a new methodology was presented for the identification of 8 soybean diseases, namely, bacterial blight, soybean rust, copper phytotoxicity, soybean mosaic, target spot, downy mildew, powdery mildew and Septoria brown spot. The proposed methodology consists of the extraction of color features by using the color moments technique, the extraction of texture features by local binary patterns (LBP) and the extraction of local features through the speeded up robust features (SURF) and bag of visual words (BoVW) algorithms. Classification is performed using the support vector machine (SVM). The proposed methodology identifies a single disease per leaf. If there are multiple diseases, the system will classify only one of them, namely, the one with the greatest prominence in the leaf image. In addition, if a disease is observed that is not among the 8 diseases investigated in this study, the system’s output will indicate the disease used for training that is most similar to the new disease being considered.

Table 4 Color moments confusion matrix.

Legend: 1. Bacterial blight; 2. Soybean rust; 3. Copper phytotoxicity; 4. Soybean mosaic; 5. Target spot; 6. Downy mildew; 7. Powdery mildew; 8. Septoria brown spot.


Table 5 BoVW confusion matrix.

Legend: 1. Bacterial blight; 2. Soybean rust; 3. Copper phytotoxicity; 4. Soybean mosaic; 5. Target spot; 6. Downy mildew; 7. Powdery mildew; 8. Septoria brown spot.

The methods of feature extraction applied individually proved to be efficient in the characterization of soybean diseases, with a significant increase in the system’s accuracy observed when such methods were combined with the supervised learning method of SVM. The system’s accuracy was approximately 73.1–77.5% (Student’s t confidence interval with an estimated error of 5%), with the highest success rate observed in the identification of soybean rust (88.4–92.0%) and the lowest success rate obtained in the identification of Septoria brown spot (23.0–42.0%). Compared to the study of Barbedo et al. (2016), the proposed system of disease identification in soybean plantations obtained better results for 7 out of the 8 analyzed diseases, while the third disease type, copper phytotoxicity, was the only disease for which the system’s results were similar to those of the cited study. Both approaches were applied to the same database, provided by the Brazilian Agricultural Research Agency (Embrapa) and available at https://www.digipathos-rep.cnptia.embrapa.br/. Further research may extend the method proposed in this article to the identification of different diseases and plants.

Table 7
Comparison between the proposed system and Barbedo et al. (2016).

| N | Disease | Proposed | Barbedo (2016) |
|---|---|---|---|
| 1 | Bacterial blight | 53.3–72.9% | 48.2% |
| 2 | Soybean rust | 88.4–92.0% | 69.2% |
| 3 | Copper phytotoxicity | 58.5–74.3% | 73.9% |
| 4 | Soybean mosaic | 57.0–62.8% | 54.5% |
| 5 | Target spot | 78.0–87.5% | 45.2% |
| 6 | Downy mildew | 68.9–83.0% | 45.2% |
| 7 | Powdery mildew | 78.8–87.5% | 64.5% |
| 8 | Septoria brown spot | 23.0–42.0% | 15.0% |
|   | Total | 73.1–77.5% | 58.0% |

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table 6 LBP confusion matrix.

Legend: 1. Bacterial blight; 2. Soybean rust; 3. Copper phytotoxicity; 4. Soybean mosaic; 5. Target spot; 6. Downy mildew; 7. Powdery mildew; 8. Septoria brown spot.



Fig. 12. Image of a soybean leaf with the Septoria brown spot disease.

Fig. 13. Image of a soybean leaf with the soybean mosaic disease.

Fig. 14. Image of a soybean leaf with the target spot disease.

Acknowledgments

This study was financially supported in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Finance Code 001. The authors would like to thank Jayme Garcia Arnal Barbedo and Embrapa for providing information about the image dataset and the dataset itself.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.compag.2019.105060.


