Computers and Electronics in Agriculture 106 (2014) 102–110
Automatic threshold method and optimal wavelength selection for insect-damaged vegetable soybean detection using hyperspectral images
Yanan Ma, Min Huang ⇑, Bao Yang, Qibing Zhu
Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi 214122, China
Article info
Article history: Received 21 December 2013; received in revised form 24 May 2014; accepted 27 May 2014
Keywords: Insect; Hyperspectral image; Automatic threshold segmentation; Wavelength selection; Support vector data description
Abstract
Insects in vegetable soybean undermine the quality and safety of soybean products, so a nondestructive technique for detecting insect-damaged vegetable soybean is needed. An efficient detection method based on hyperspectral images was proposed, in which the region of interest (ROI) was selected through automatic threshold segmentation and the optimal wavelengths were selected using a fuzzy-rough set model. For the 362 bean samples, three image features (i.e., entropy, energy, and mean) of the ROI were extracted as classification features over the spectral region of 400–1000 nm, which contained 94 wavelengths. Three or fewer optimal wavelengths were then selected using a fuzzy-rough set model based on the thermal charge algorithm (FRSTCA). Support vector data description (SVDD) was used to develop classification models for the insect-damaged soybean. For the prediction samples, 100.0% of the normal samples were correctly classified when the ROI was extracted automatically by threshold segmentation. The classification accuracy for the insect-damaged samples was 91.7%, and an overall classification accuracy of 98.8% was achieved with two wavelengths selected by the FRSTCA. © 2014 Elsevier B.V. All rights reserved.
⇑ Corresponding author. Address: School of Internet of Things, Jiangnan University, 1800 Lihu Avenue, Wuxi, Jiangsu Province 214122, China. Tel.: +86 510 85910635, mobile: 86 15861596626. E-mail address: [email protected] (M. Huang).
http://dx.doi.org/10.1016/j.compag.2014.05.014
0168-1699/© 2014 Elsevier B.V. All rights reserved.
1. Introduction
The vegetable soybean, also known as green soybean, is tasty and nutritious (Hou et al., 2011) and is therefore popular with consumers. During growth, the vegetable soybean can be attacked by insects, and insect damage (e.g., by the pod borer) is difficult to control. Moreover, with the spread of organic practices and growing concern for environmental protection, fewer chemical pesticides are used, which increases the likelihood of finding insects in vegetables and fruits. Insects in vegetable soybean products may harm consumers, so countries around the world have set strict limits on the number of quarantine insects in imported legumes. For example, American import and export trade standards require no more than one pod borer per 26 pounds of legume agricultural products, with a pod borer length of less than 0.7 cm. These stringent requirements on the occurrence of insects in vegetable soybean put great pressure on the soybean industry and impede the export of vegetable
soybean. The main insect pests of soybean include pod borer, whitefly, aphids, black cutworm, cutworm, and 13 other kinds of insects (Liu et al., 2011), among which the pod borer is a worldwide pest of vegetable soybean. The life cycle of the pod borer consists of egg, larva, and moth. The larva is a serious source of damage to the vegetable soybean because it hatches inside the pod and feeds on the bean, which results in shriveled or empty pods and also damages the petiole and stem (Okeyoowuor et al., 1991). Although internally damaged soybeans cannot be easily distinguished from normal ones by external appearance, identifying them during processing benefits both consumers and industry. Excluding damaged soybeans reduces the possibility of infested soybeans in the final products, which is forbidden by food safety regulations. An efficient method for detecting insects in vegetable soybean is therefore needed to ensure the quality and safety of soybean products. A number of techniques have been studied to detect internal pests in agricultural products, such as the sound method (Hagstrum et al., 1990), the microwave radar method (Mankin, 2004), conductivity technology (Pearson and Brabec, 2002), near-infrared spectroscopy technology (Dowell et al., 2002, 2010), X-ray technology (Melvin et al., 2003), and machine vision technology (Zayas and Flinn, 1998). However, these methods are
either destructive or complex and have difficulty in detecting larvae or dead insects. Compared with these methods, the hyperspectral imaging technique provides a noninvasive and accurate inspection system for food and agricultural products because it gives more information about the sample, including its internal structure characteristics, chemical composition, and morphology. Applications of hyperspectral imaging technology in fruit and agricultural product detection include the measurement of the optical properties of fruits and vegetables (Qin and Lu, 2008), the classification of internally damaged fruits or vegetables (Nakariyakul and Casasent, 2011; Lü and Tang, 2012; Lorente et al., 2012), the detection of the internal quality of fruits (Huang and Lu, 2010), and insect detection (Singh et al., 2009; Bhuvaneswari et al., 2011; Zhao et al., 2012). Huang et al. (2013) adopted hyperspectral transmittance images to detect insect damage in vegetable soybeans and concluded that hyperspectral imaging technology is feasible for detecting insects inside vegetable soybean. Given that the insect feeds on the bean, the position of the bean should be included in the region of interest (ROI) in order to study changes in the chemical composition and tissue structure of the soybean and to detect the insect. When detecting insects in vegetable soybean, the typical approach for selecting the ROI relies on manual selection in software (Huang et al., 2013). However, manual ROI selection requires a large amount of time and a complex format conversion, and its low speed and human disturbance factors could undermine the efficiency and accuracy of the method. In addition, Huang et al.
(2013) used the full set of wavelengths to develop a classification model for insect-damaged soybean detection, which is not suitable for real-time assessment because of the time-consuming computation in image processing. As a follow-up study, automatic threshold segmentation based on the iteration method and an optimal wavelength selection method were investigated to provide a new approach for real-time, online detection with hyperspectral images. The overall approach to insect-damaged vegetable soybean detection is as follows:
(a) Vegetable soybean hyperspectral images within the spectral range of 400–1000 nm are acquired.
(b) Automatic threshold segmentation based on the iteration method is adopted to extract the ROI of the soybean.
(c) The fuzzy-rough set model based on the thermal charge algorithm (FRSTCA) is applied to select the optimal wavelengths.
(d) SVDD is used to develop the classification model for insect-damaged vegetable soybean detection.
Fig. 1. Vegetable soybean beans representing two quality grades: (a) normal and (b) insect-damaged.
2. Materials and methods
2.1. Vegetable soybean samples
In this experiment, vegetable soybean samples were acquired from the garden of Haitong Food Company in Cixi, Zhejiang Province, in 2013. They were then sorted, washed, and blanched. Before each experiment, the samples were kept at room temperature (24 °C) for approximately four hours to ensure that they were completely thawed. Each sample was assigned a number. Samples representing the two quality grades are shown in Fig. 1. A normal sample is defined as one with an intact surface, intact beans, and no insects inside; it is set to level 1. An insect-damaged sample is defined as one with no holes on the outside but with insects or insect excrement inside; it is set to level −1. The levels 1 and −1 were used as labels in classification.
2.2. Hyperspectral transmittance image acquisition
The hyperspectral transmittance imaging system is shown schematically in Fig. 2. An in-house, line-scan hyperspectral imaging system was used to acquire hyperspectral transmittance images from vegetable soybean pods. This system consisted of a hyperspectral imaging unit, a sample handling unit, and a DC-regulated light source. The hyperspectral imaging unit was made up of a
Fig. 2. Schematic of the hyperspectral transmittance imaging system.
high-performance back-illuminated CCD camera and its control unit (pixelfly QE IC*285AL, Cooke, USA), an imaging spectrograph (1003A-10140 Hyperspec™ VNIR C-Series, Headwall Photonics Inc., USA), a zoom lens (10004A-21226 Lens, F/1.4 FL 23 mm, Standard Barrel, C-Mount, USA), and a computer to command the camera and acquire the images. The imaging spectrograph, with a 25 µm slit, covers an effective range of 400–1000 nm with a 1.29 nm/pixel spectral resolution and a 0.15 mm/pixel spatial resolution. The spectral interval of the hyperspectral transmittance imaging system is 0.64 nm. The sample handling unit was a horizontal motorized stage, 2 mm thick and measuring 100 mm × 100 mm, made from 92% transmission soda-lime glass (ROCOES Electro-Optics Co., Ltd., Taiwan). The transmittance light source system was made up of a 150 W DC light source (halogen lamp, 3250 K, Techniquip, USA) and a single optic fiber coupled with a 3″ lightline (9135-HT) and a 3″ diffuser to deliver 90 mm × 20 mm of light evenly to the sample. To obtain a whole, undistorted hyperspectral transmittance image, the longitudinal scan length was set to 30 mm and the exposure time to 180 ms. With the spectral compression ratio (binning) set to 10, the resulting hyperspectral transmission images had a 6.4 nm spectral interval, and the horizontal step size for each hyperspectral image was 60 µm. Finally, an image block of 1392 × 500 × 94 was created. During the experiment, considerable noise was observed because of the uneven distribution of light intensity and the dark current. Hyperspectral images of the glass and of darkness (obtained by blocking the entrance of the camera) were therefore also acquired for every 10 samples. These images were used to calculate the relative images of the samples. After the acquisition of the hyperspectral images, insects in the samples were detected through hand-peeling tests.
3. Data analysis
3.1. Hyperspectral image calibration
Each image was acquired in the spectral range of 400–1000 nm with a total of 94 wavelengths. Glass and dark transmittance images were captured to correct the acquired image (TA) of a vegetable soybean sample. The dark reference image (TD) was obtained by turning off the light source and completely covering the camera lens with its opaque cap, whereas the glass transmittance image (TG) was acquired for the soda-lime glass. The relative transmission (TR) was then calculated using the following equation:
$$T_R = \frac{T_A - T_D}{T_G - T_D} \tag{1}$$
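As an illustration of Eq. (1), the following minimal Python sketch applies the correction pixel by pixel to one wavelength band. The function name `relative_transmission`, the list-of-rows image layout, and the `eps` guard against a zero denominator are assumptions made for this example; they are not part of the authors' Matlab code.

```python
def relative_transmission(sample, dark, glass, eps=1e-12):
    """Apply Eq. (1), TR = (TA - TD) / (TG - TD), pixel by pixel.

    Each argument is a 2-D image for one wavelength, given as a list of rows.
    """
    result = []
    for row_a, row_d, row_g in zip(sample, dark, glass):
        out_row = []
        for t_a, t_d, t_g in zip(row_a, row_d, row_g):
            denom = t_g - t_d
            # Assumption: pixels where glass and dark readings coincide are
            # set to 0.0 instead of dividing by zero.
            out_row.append((t_a - t_d) / denom if abs(denom) > eps else 0.0)
        result.append(out_row)
    return result


# Hypothetical 1 x 2 image: both pixels transmit half of the glass reference.
calibrated = relative_transmission([[60.0, 110.0]], [[10.0, 10.0]], [[110.0, 210.0]])
print(calibrated)
```

In practice the same correction would be repeated for each of the 94 bands, using the glass and dark references recorded closest in time to the sample.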
All of the following analyses were carried out on the relative transmission images TR.
3.2. ROI selection
Automatic threshold segmentation based on the iteration method can effectively reduce the error rate in distinguishing target pixels from the background and keeps adjacent pixels away from noise-sensitive regions, which gives better noise immunity and a better segmentation result. Because of these advantages, automatic threshold segmentation based on the iteration method has been applied in many fields (Mendoza et al., 2012; Miao et al., 2013). In this study, the method was investigated to select the ROI and improve the efficiency and accuracy of insect-damaged soybean detection.
A vegetable soybean image can be divided into three parts: the background, the pod together with the beans, and the beans at the positions of interest. Automatic thresholding first separates the soybean objects from the background and then extracts the ROI of the beans. The algorithm is described in detail as follows:
(1) Each hyperspectral transmittance image had 94 wavelengths. The transmission image at 750 nm was adopted for this experiment because it had the best contrast between the soybean and the background and could be segmented easily by setting a simple threshold value. A 3 × 3 median filter was applied to this image to remove noise: the gray value of each pixel is replaced by the median gray value of the pixels in its neighborhood window, which brings the surrounding pixel values close to the real value and eliminates isolated noise. Contrast enhancement with the IMADJUST function of Matlab was then applied to increase the difference between the background and the soybean.
(2) The iterative method was adopted to obtain the optimal threshold for converting the intensity image into a binary image with the IM2BW function of the Matlab DIP toolbox. The procedure is as follows: (i) a starting threshold value, such as the midpoint of the maximum and minimum gray values, is set; (ii) the image is segmented into background and object regions using the current threshold value; (iii) the sample mean of the gray values of the object-region pixels and the sample mean of the gray values of the background pixels are computed, and a new threshold value is computed as the average of these two sample means; and (iv) steps (ii) and (iii) are repeated with the new threshold until the threshold value becomes constant.
(3) For the binary images obtained in (2), contrast enhancement was required once again. Closing, opening, and dilation operations were adopted to smooth the pod contour and remove tiny slits.
First, disc-shaped structuring elements with radii of 20 and 10 were created to perform morphological closing and opening of the binary image, respectively, and then the IMDILATE function of Matlab was used to perform dilation with a disc-shaped structuring element of radius 10. After that, the regions of the binary image were labeled using the BWLABEL function of Matlab. To improve the region segmentation, the number of elements in each region was calculated using the AREA function. The largest region, that is, the one with the maximum number of elements, was chosen as the background using the FIND function of Matlab, and the coordinates of the background were found. The gray values at these coordinates were then set to 255, and the remaining gray values were left unchanged. At this point the whole soybean had been extracted. The image was then binarized again using the automatic threshold segmentation method to separate the pods from the beans. A disc-shaped structuring element with a radius of 40 was then created to perform a morphological closing operation, which removed tiny slits and smoothed the binary image. Finally, the ROI of the soybean was obtained.
(4) The coordinates of the ROI determined in (3) were projected to the other wavelengths, and the ROI of the hyperspectral transmittance image at all wavelengths was thereby obtained.
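The iterative threshold search in step (2) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `iterative_threshold`, the flat list of gray values, and the `max_iter` safeguard are hypothetical, and the authors' own implementation was written in Matlab.

```python
def iterative_threshold(pixels, tol=0.0, max_iter=1000):
    """Iterative (mean-of-means) threshold selection on a flat list of gray values."""
    # Starting threshold: midpoint of the maximum and minimum gray values.
    threshold = (min(pixels) + max(pixels)) / 2.0
    for _ in range(max_iter):  # max_iter is a safety limit assumed for this sketch
        # Split pixels into the two regions with the current threshold.
        # Which side is called "object" does not change the resulting threshold.
        upper = [p for p in pixels if p > threshold]
        lower = [p for p in pixels if p <= threshold]
        if not upper or not lower:
            break  # degenerate image: one region is empty
        # New threshold: average of the two region means.
        new_threshold = (sum(upper) / len(upper) + sum(lower) / len(lower)) / 2.0
        if abs(new_threshold - threshold) <= tol:
            threshold = new_threshold
            break  # the threshold has become constant
        threshold = new_threshold
    return threshold


# Hypothetical gray values: a dark background and a brighter soybean region.
gray = [10, 12, 11, 200, 205, 210]
t = iterative_threshold(gray)
binary = [1 if p > t else 0 for p in gray]
print(t, binary)
```

Because the threshold is recomputed from each sample's own gray values, no fixed value has to be tuned per image, which is the property the ROI procedure relies on.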
At last, the images of 94 wavelengths for 362 beans were extracted.
3.3. Image feature extraction
Differences in the spectra derive from the physical structure and chemical composition of the soybean, as reflected in their transmission spectral characteristics. Therefore, three image features (i.e., entropy, energy, and mean) are introduced.
3.3.1. Image feature of entropy
For the extracted ROI image of beans, assume $f(k, i, j)$ is the density function of pixel $(i, j)$ at the $k$th wavelength, where $k = 1, 2, \ldots, K$, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$; $K = 94$ is the total number of wavelengths, and $M$ and $N$ are the horizontal and vertical pixel numbers of the CCD camera, respectively. The entropy feature at the $k$th wavelength is extracted as follows.
(1) The sum of the transmittance intensity at the $k$th wavelength is calculated as:
$$\mathrm{Sum}(k) = \sum_{i=1}^{M} \sum_{j=1}^{N} f(k, i, j) \tag{2}$$
(2) The probability distribution of each pixel at the $k$th wavelength is as follows:
$$p(k, i, j) = f(k, i, j) / \mathrm{Sum}(k) \tag{3}$$
(3) The entropy value at the $k$th wavelength is defined as:
$$H(k) = -\sum_{i=1}^{M} \sum_{j=1}^{N} p(k, i, j) \log_2 p(k, i, j) \tag{4}$$
3.3.2. Image feature of energy
The energy value at the $k$th wavelength is computed as follows:
$$E(k) = \sum_{i=1}^{M} \sum_{j=1}^{N} f^2(k, i, j) \tag{5}$$
3.3.3. Image feature of mean
The mean at the $k$th wavelength is calculated as:
$$\mathrm{Mean}(k) = \sum_{i=1}^{M} \sum_{j=1}^{N} f(k, i, j) / (M \times N) \tag{6}$$
3.4. Wavelength selection utilizing FRSTCA
Wavelength selection is an important issue because high dimensionality brings great difficulty to classification systems. To reduce the computational complexity and improve the system efficiency for real-time application, the present study proposes an optimal wavelength selection method that uses the FRSTCA (Shen and Jensen, 2004; Hu et al., 2006) to reduce numerical and categorical features by assigning different thresholds to different kinds of attributes. Rough set (RS) theory can be adopted as a tool to discover data dependencies and decrease the number of attributes contained in a data set using the data alone, with no additional information (Pawlak, 1991). However, it depends on the data set, and some important information may be lost as a result of the quantization of the underlying numerical features. In this paper, the FRSTCA was introduced to deal with the problem of information loss from the original data for the hyperspectral images of vegetable soybeans.
To improve the scientificity and rationality of the evaluation method for varieties in the regional test, fuzzy mathematics was incorporated to establish an improved fuzzy comprehensive assessment method based on the thermal charge coefficient. Data reduction was based on the attribute importance measure matrix, whose coefficients are reflected by the thermal charge. The FRSTCA can be formulated as follows (Hu et al., 2008):
(1) Let $U = [u_1, \ldots, u_i, \ldots, u_t, \ldots, u_N]^{T} \in \mathbb{R}^{N \times S}$ be the feature attribute set composed of $S$ wavelengths of $N$ samples. During the wavelength selection in this paper, the feature attribute set was standardized to the interval [0, 1] before computing the neighborhood, to reduce the effect of inconsistent attributes on the result. Let $D = [d_1, \ldots, d_i, \ldots, d_t, \ldots, d_N]^{T} \in \mathbb{R}^{N \times 1}$ be the decision attribute set of the $N$ samples. $u_i = [u_{i1}, \ldots, u_{ij}, \ldots, u_{iS}]$ is the feature vector of the $i$th sample, and $d_i$ is the decision attribute value of the $i$th sample; $d_i = 1$ represents an insect-damaged sample, and $d_i = 0$ represents a normal sample. $B = 1_{N \times N}$ is the initial matrix, all of whose elements are equal to 1.
(2) The feature attribute relation matrix $r^{j}_{\delta}$ of the $j$th wavelength and the decision attribute relation matrix $r^{D}_{\delta}$ are defined as:
$$r^{j}_{\delta} = (r^{j}_{it})_{N \times N}, \quad j = 1, \ldots, S \tag{7}$$
$$r^{D}_{\delta} = (r^{D}_{it})_{N \times N} \tag{8}$$
$$r^{j}_{it} = \begin{cases} 0, & |u_{ij} - u_{tj}| > \delta \\ 1 - |u_{ij} - u_{tj}| / \delta, & \text{else} \end{cases} \tag{9}$$
$$r^{D}_{it} = \begin{cases} 0, & |d_i - d_t| > \delta \\ 1 - |d_i - d_t| / \delta, & \text{else} \end{cases} \tag{10}$$
where $\delta$ is the initial positive value called the radius of the neighborhood. $r^{j}_{it}$ and $r^{D}_{it}$ reflect the neighborhood relationship information of samples $u_i$ and $u_t$ under the $j$th wavelength and the decision attribute $D$, respectively.
(3) The fuzzy thermal charge of the feature attribute relation matrix $r^{j}_{\delta}$ and the fuzzy thermal charge of the decision attribute relation matrix $r^{D}_{\delta}$ are defined as:
$$E(r^{j}_{\delta}) = -\sum_{i=1}^{N} \frac{1}{N} \log_2 \left( \sum_{t=1}^{N} r^{j}_{it} / N \right) \tag{11}$$
$$E(r^{D}_{\delta}) = -\sum_{i=1}^{N} \frac{1}{N} \log_2 \left( \sum_{t=1}^{N} r^{D}_{it} / N \right) \tag{12}$$
(4) The importance of each wavelength is computed as follows:
$$I_{\delta}(r^{j}_{\delta}) = E\left[\min(r^{j}_{\delta}, B)\right] - E\left[\min\left(\min(r^{j}_{\delta}, r^{D}_{\delta}), B\right)\right] \tag{13}$$
where $\min(\cdot, \cdot)$ takes the element-wise minimum of two matrices and $I_{\delta}(r^{j}_{\delta})$ is the importance of the feature attribute of the $j$th wavelength. The larger the value of $I_{\delta}(r^{j}_{\delta})$, the more important the corresponding wavelength is for the decision attribute. The wavelength $j$ with the largest $I_{\delta}(r^{j}_{\delta})$ is selected and recorded as $j_1$. Let $B = r^{j_1}_{\delta}$, and repeat step (4) to select the remaining wavelengths.
(5) Assume that $h$ wavelengths have been selected, namely $j_1, j_2, \ldots, j_{h-1}, j_h$. When the condition $|I_{\delta}(r^{j_h}_{\delta}) - I_{\delta}(r^{j_{h-1}}_{\delta})| < 0.001$ is satisfied, selection is stopped, and the final optimal wavelengths selected are $j_1, j_2, \ldots, j_{h-1}, j_h$.
From the above discussion, we can see that the radius of the neighborhood $\delta$ (Hu et al., 2008) plays an important role in neighborhood rough sets. The selected wavelengths varied with the value of $\delta$, and resulted in different classification
performance. When δ was steadily increased from a very small value (nearly zero), the number of selected wavelengths first increased and then decreased. A preliminary study suggested that good classification performance can be obtained if δ takes values in [0.05, 0.5]. To obtain the best classification performance, δ was varied over [0.05, 0.5] in increments of 0.01 to obtain different sets of selected wavelengths. Then, a partial least squares discriminant analysis (PLSDA) model for insect-damaged soybean classification was developed using the features from each set of selected wavelengths as model inputs. Finally, the wavelengths were chosen according to the classification error of leave-one-out cross-validation.
3.5. SVDD algorithm
Because the samples had been screened by the company, insect-damaged samples were difficult to obtain. In this study, only 12 insect-damaged samples were found, and the rest were all intact. To handle this imbalance in sample numbers, SVDD, which was proposed by Tax and Duin (1999) and is based on and connected with support vector machines (SVMs), was used. SVDD is capable of providing accurate descriptions of a data set via the use of kernels. The principle of the SVDD algorithm (Williams, 2003; Tax and Duin, 2004; Sakla et al., 2011) is as follows: SVDD describes a data class by fitting a hypersphere with center a and radius R around all or most of the samples. Assume a set of calibration samples X = {xh, h = 1, 2, …}. The SVDD algorithm aims to minimize the volume of the hypersphere by minimizing R². A constant G is introduced to control the tradeoff between the volume of the hypersphere and the number of target objects rejected. The present work adopted the well-known Gaussian radial basis function (RBF) as the kernel function. The RBF kernel is given by the following:
$$K(x_h, y) = \exp\left(-\|x_h - y\|^2 / s^2\right) \tag{14}$$
where s is a free parameter adjusted to control the tightness of the boundary. To test whether a new object y lies within the description, the squared distance from the center of the sphere to y must be less than R²; y is then deemed to belong to the class. As shown in the formula, using SVDD with the RBF kernel requires the selection of the free parameters G and s. G and s are typically optimized through leave-one-out cross-validation to select the best values for a given target and scene (Huang et al., 2013). A set of s values between 0.01 and 10 in increments of 0.01 was constructed, and the value of G was varied between 0.1 and 1 in increments of 0.1. Using the Kennard–Stone algorithm, 280 samples were selected from the 350 normal samples for the calibration set. The remaining 70 normal samples and the 12 insect-damaged samples comprised the prediction set (Krawczyk and Woźniak, 2014; Tiwari et al., 2013).
The above description provides the details and steps of the automatic threshold method and optimal wavelength selection for detecting insect damage in vegetable soybean based on hyperspectral images. The specific processes and methods are shown in Fig. 3. All of the program codes, including the automatic threshold method, the FRSTCA, and the SVDD method, were written and run on the Matlab platform (R2009b, Mathworks Inc., Natick, MA, USA).
4. Results and discussion
4.1. ROI selection based on automatic threshold method
Figs. 4 and 5 show samples in which the soybean and the background have similar gray levels, processed with different segmentation methods.
Among them, Figs. 4(a) and 5(a) are relative transmission images. The system generates a darker background compared with the soybean because the scan is push-broom and a line source provides the light. To avoid the intensity saturation that would occur if the line light irradiated the CCD directly, the light source is tilted at a certain angle. When the object has not yet moved under the CCD camera, the camera receives no light and the captured image is black. When the object moves under the CCD camera, the light strikes the surface of the object and interacts with its chemical composition and physical properties, producing absorption, reflection, or transmission. The camera then receives this light and records the image, so the background appears darker than the soybean. Compared with the manual selection of the ROI (Huang et al., 2013) and the fixed-threshold segmentation method, the proposed automatic ROI selection method yielded better results for the following reasons. First, the automatic method accelerated the selection procedure, making on-line detection of soybean insects possible. Second, the automatically selected ROI is more complete because it contains more texture information of the soybean, and the method adjusts to different bean sizes to choose the appropriate ROI. Finally, the proposed method is based on iterative threshold segmentation, which chooses the best threshold according to the gray values of each sample. It extracts the background first and then distinguishes the bean from the rest of the soybean. When the gray value of the bean is similar to that of the background and the soybean (Figs. 4 and 5), a fixed-threshold segmentation method cannot handle the process well, whereas the automatic selection method based on iterative threshold segmentation obtains a satisfactory result.
4.2. Characteristic parameters of entropy, energy and mean
Fig.
6 shows the profiles of the three parameter values (i.e., entropy, energy, and mean) extracted from the relative transmission images of 12 randomly selected normal samples and 12 insect-damaged samples. The characteristic curves fluctuate greatly below 500 nm because of the lower quantum efficiency of the CCD, which results in a lower signal-to-noise ratio. At about 675 nm, the curves fluctuate sharply because this is the chlorophyll absorption point of the beans. Above 700 nm, in the near-infrared region, the characteristic values differ significantly because of the stronger light reflection and absorption by the internal chemical composition of the samples. A large difference is observed between the normal and insect-damaged samples because the textural features reflect the distribution and coarseness of texture in the gray image. For the insect-damaged beans, changes in internal chemical composition and texture produce irregularity, whereas the chemical composition and texture of the normal samples are relatively uniform.
4.3. Optimal wavelength selection using the FRSTCA
Fig. 7 shows the scatterplot distribution of all sample parameters at the optimal wavelengths selected by the FRSTCA. Based on the FRSTCA described in Section 3.4, the selected optimal wavelengths were j1 = 705 nm and j2 = 943 nm for the entropy feature (δ = 0.05), and 692, 975, and 743 nm for both the energy and mean features (δ = 0.4). The characteristic parameters of the normal samples are concentrated, whereas those of the insect-damaged samples are more scattered. This may be because the texture distribution and texture coarseness of the normal samples are relatively uniform. Given the location of the insect, the different sizes of insects, and the different distribution of insect
Fig. 3. Flow diagram of hyperspectral images for insect-damaged vegetable soybean detection.
Fig. 4. Similar gray levels of the normal samples of soybean and the background: (a) relative transmission image at a wavelength of 750 nm; (b) certain threshold image segmentation; and (c) image segmentation of automatic threshold method proposed in this paper.
Fig. 5. Image segmentation of the insect-damaged sample: (a) relative transmission image at a wavelength of 750 nm; (b) certain threshold image segmentation; and (c) image segmentation of the automatic threshold method proposed in this paper.
excrement in the soybeans, the texture distribution of the insect-damaged samples was relatively disordered, with different degrees of coarseness, which leads to a scattered distribution. The normal and insect-damaged samples were clearly gathered in different areas. SVDD computes a sphere-shaped decision boundary with minimal volume around a set of calibration objects. Test objects are accepted if they fall within the boundary, whereas
the insect-damaged samples are rejected. The FRSTCA is shown to be a powerful wavelength selection method. It can be adopted as a tool to discover data dependencies and decrease the number of attributes contained in a data set. Using the data alone, it determines the optimal wavelengths for constructing a hyperspectral transmittance system that detects the presence of internal insect infestation in vegetable soybean.
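To make the wavelength selection procedure of Section 3.4 more concrete, the following Python sketch computes the relation matrices, the thermal charge, and the importance measure, and then selects wavelengths greedily. It is a sketch under several assumptions rather than the authors' Matlab code: all function names are hypothetical, the features are assumed to be already scaled to [0, 1], the matrix B is accumulated as the element-wise minimum of the relation matrices selected so far (the text only states the update for the first selected wavelength), and a cap `max_selected` reflects the "three or fewer" wavelengths reported.

```python
import math


def relation_matrix(values, delta):
    """Fuzzy neighborhood relation between all sample pairs for one attribute."""
    n = len(values)
    return [[0.0 if abs(values[i] - values[t]) > delta
             else 1.0 - abs(values[i] - values[t]) / delta
             for t in range(n)] for i in range(n)]


def elementwise_min(a, b):
    """Element-wise minimum of two equally sized matrices."""
    return [[min(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def thermal_charge(r):
    """E(r) = -sum_i (1/N) * log2(sum_t r_it / N)."""
    n = len(r)
    return -sum(math.log2(sum(row) / n) / n for row in r)


def importance(r_j, r_d, b):
    """I = E[min(r_j, B)] - E[min(min(r_j, r_D), B)]."""
    first = thermal_charge(elementwise_min(r_j, b))
    second = thermal_charge(elementwise_min(elementwise_min(r_j, r_d), b))
    return first - second


def select_wavelengths(features, labels, delta, tol=1e-3, max_selected=3):
    """Greedy selection; features is an N x S list of rows scaled to [0, 1]."""
    n, s = len(features), len(features[0])
    r_d = relation_matrix(labels, delta)
    r = [relation_matrix([row[j] for row in features], delta) for j in range(s)]
    b = [[1.0] * n for _ in range(n)]          # initial matrix of ones
    selected, previous = [], None
    while len(selected) < min(max_selected, s):
        remaining = [j for j in range(s) if j not in selected]
        scores = {j: importance(r[j], r_d, b) for j in remaining}
        best = max(remaining, key=lambda j: scores[j])
        selected.append(best)
        # Stop when the importance changes by less than the tolerance.
        if previous is not None and abs(scores[best] - previous) < tol:
            break
        previous = scores[best]
        b = elementwise_min(b, r[best])        # assumption: B accumulates selections
    return selected


# Hypothetical data: column 0 follows the labels exactly, column 1 does not.
features = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 1, 1]
print(select_wavelengths(features, labels, delta=0.5))
```

With this definition the importance is never positive, and a value closer to zero means that the wavelength leaves less uncertainty about the decision attribute, which is why the largest value is chosen at each step.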
[Figs. 6 and 7, panels (a)–(c). Fig. 6 axes: wavelength/nm (400–1000) versus entropy, energy (×10^8), and mean. Fig. 7 axes: 705 nm versus 943 nm in (a); 692 nm, 975 nm, and 743 nm in (b) and (c). Legend in all panels: normal, insect-damaged.]
Fig. 6. Profile of parameter values of 12 random normal and 12 insect-damaged samples: (a) profile of entropy; (b) profile of energy; and (c) profile of mean.
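The three features plotted in Fig. 6 follow the definitions of Section 3.3. As a minimal sketch, they can be computed for one wavelength band of an ROI as shown below. The function name `image_features` and the list-of-rows layout are assumptions for illustration; the sketch also assumes that pixels with zero intensity contribute nothing to the entropy (the usual 0·log 0 = 0 convention) and that the mean is taken over the pixels passed in.

```python
import math


def image_features(roi):
    """Return (entropy, energy, mean) for one wavelength band of an ROI.

    roi is a 2-D list of non-negative pixel intensities f(k, i, j) at a fixed k.
    """
    pixels = [value for row in roi for value in row]
    total = sum(pixels)                      # Sum(k)
    if total <= 0:
        raise ValueError("ROI must contain at least one positive intensity")
    entropy = 0.0
    for value in pixels:
        p = value / total                    # p(k, i, j)
        if p > 0:                            # assumption: 0 * log2(0) is treated as 0
            entropy -= p * math.log2(p)      # H(k)
    energy = sum(value * value for value in pixels)   # E(k)
    mean = total / len(pixels)               # Mean(k)
    return entropy, energy, mean


# Hypothetical 2 x 2 ROI with uniform intensity.
print(image_features([[1, 1], [1, 1]]))
```

A perfectly uniform ROI gives the maximum entropy for its pixel count, so departures from uniformity, such as those caused by insect damage, lower the entropy at that wavelength.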
4.4. Classification results analysis
Table 1 (Huang et al., 2013) and Table 2 show the classification results for the prediction set obtained with the automatic segmentation method and full wavelengths, using the four statistical characteristics and the three characteristics proposed in this study, respectively. Classification results for the calibration set were 100.0% (not shown). From Table 1, the model achieved 98.6% accuracy for the normal samples, 91.7% accuracy for the insect-damaged samples, and 97.6% overall classification accuracy for the prediction set. Compared with the previous manual method of Huang et al., the automatic extraction method also achieved very high accuracy while overcoming the drawbacks of time consumption and complex format conversion. From Table 2, the overall classification accuracies of the three features proposed in this study differ little from one another and are nearly the same as that of the four statistical characteristics, while the number of features is reduced to 25% of that of the four statistical characteristics. Among them, the model with the entropy feature achieved the best results: 100.0% accuracy for the normal samples, 91.7% accuracy for the
0.4
0
0.2
0.4
0.6
0.8
1
692nm
Fig. 7. Scatterplot distribution for all sample parameters under optimal wavelengths selected by a FRSTCA: (a) scatterplot distribution of entropy; (b) scatterplot distribution of energy; and (c) scatterplot distribution of mean.
insect-damaged samples, and 98.8% overall classification accuracy. The classification results utilizing three or less optimal wavelengths selected by FRSTCA are compared with the other wavelengths in Table 3. For the entropy feature, only two wavelengths are selected, with the overall classification accuracy reaching 98.8%. For the energy feature, overall classification accuracy reached 97.6%, and the insect-damaged samples’ classification accuracy reached 100.0%. Both accuracy results are higher than the accuracy of the full wavelengths, which has an important significance for food security and economic aspects in real life. In conclusion, these optimal wavelengths are small, highly stable, and more reasonable. These results suggest that a FRSTCA can decrease the redundancy and correlation between wavelengths effectively. Insect-damaged vegetable soybean detection also becomes feasible. The profile of the parameter values, scatterplot distribution, and classification results indicate that the feature of entropy has good results. Given the presence of noise and a state of uncertainty, information had been lost. Entropy is the most suitable scale to measure the degree of uncertainty and status richness (i.e., chaos and complexity). The smaller the probability of a message, the greater the amount of information obtained and a more useful diagnosis and evaluation of information is extracted.
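The information-theoretic statement above can be illustrated with a short sketch. This section does not give the formula used for the entropy image feature, so the code assumes the standard Shannon definitions; the helper names `self_information` and `shannon_entropy` and the sample intensity lists are hypothetical and only serve the illustration.

```python
import math


def self_information(p):
    """Information (in bits) carried by an event of probability p."""
    return -math.log2(p)


def shannon_entropy(values):
    """Shannon entropy (in bits) of non-negative values normalised to sum to 1.

    Assumes at least one value is positive.
    """
    total = sum(values)
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


# A rarer message carries more information than a common one.
print(self_information(0.5), self_information(0.125))

# Hypothetical ROI intensities: evenly spread versus concentrated in one pixel.
print(shannon_entropy([10, 10, 10, 10]))
print(shannon_entropy([37, 1, 1, 1]))
```

Under these assumptions, the evenly spread case reaches the maximum entropy for four values (2 bits), and the concentrated case scores lower. This is the sense in which entropy measures the uncertainty or richness of states in a region.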
Table 1
Classification results for the prediction set of samples utilizing four statistical characteristics under all wavelengths (Huang et al., 2013).

| Image feature | Hand-peeling classification | Model classification: Normal | Model classification: Insect-damaged | Classification accuracy (%) |
|---|---|---|---|---|
| Max + Min + Mean + Std | Normal (70) | 69 | 1 | 98.6 |
| Max + Min + Mean + Std | Insect-damaged (12) | 1 | 11 | 91.7 |
| Max + Min + Mean + Std | Total (82) | 70 | 12 | 97.6 |
Table 2
Classification results for the prediction set of samples utilizing three characteristics under all wavelengths.

| Image feature | Hand-peeling classification | Model classification: Normal | Model classification: Insect-damaged | Classification accuracy (%) |
|---|---|---|---|---|
| Entropy | Normal (70) | 70 | 0 | 100.0 |
| Entropy | Insect-damaged (12) | 1 | 11 | 91.7 |
| Entropy | Total (82) | 71 | 11 | 98.8 |
| Energy | Normal (70) | 70 | 0 | 100.0 |
| Energy | Insect-damaged (12) | 3 | 9 | 75.0 |
| Energy | Total (82) | 73 | 9 | 96.3 |
| Mean | Normal (70) | 70 | 0 | 100.0 |
| Mean | Insect-damaged (12) | 3 | 9 | 75.0 |
| Mean | Total (82) | 73 | 9 | 96.3 |
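Table 3
Classification results for the prediction set of samples utilizing optimal wavelengths selected by FRSTCA.

| Image feature | Optimal wavelengths (nm) | Hand-peeling classification | Model classification: Normal | Model classification: Insect-damaged | Classification accuracy (%) |
|---|---|---|---|---|---|
| Entropy | 705, 943 | Normal (70) | 70 | 0 | 100.0 |
| Entropy | 705, 943 | Insect-damaged (12) | 1 | 11 | 91.7 |
| Entropy | 705, 943 | Total (82) | 71 | 11 | 98.8 |
| Energy | 692, 975, 743 | Normal (70) | 68 | 2 | 97.1 |
| Energy | 692, 975, 743 | Insect-damaged (12) | 0 | 12 | 100.0 |
| Energy | 692, 975, 743 | Total (82) | 68 | 14 | 97.6 |
| Mean | 692, 975, 743 | Normal (70) | 70 | 0 | 100.0 |
| Mean | 692, 975, 743 | Insect-damaged (12) | 3 | 9 | 75.0 |
| Mean | 692, 975, 743 | Total (82) | 73 | 9 | 96.3 |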
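As an arithmetic check, the accuracies reported in Table 3 follow directly from the confusion counts. The sketch below recomputes them; the function name `accuracies`, its argument order, and the dictionary `table3_counts` are illustrative choices and not part of the paper.

```python
def accuracies(nn, nd, dn, dd):
    """Per-class and overall accuracy (%) from a 2x2 confusion matrix.

    nn: normal classified as normal      nd: normal classified as damaged
    dn: damaged classified as normal     dd: damaged classified as damaged
    """
    normal_acc = 100.0 * nn / (nn + nd)
    damaged_acc = 100.0 * dd / (dn + dd)
    overall_acc = 100.0 * (nn + dd) / (nn + nd + dn + dd)
    return tuple(round(a, 1) for a in (normal_acc, damaged_acc, overall_acc))


# Counts read from Table 3 (prediction set: 70 normal, 12 insect-damaged).
table3_counts = {
    "entropy": (70, 0, 1, 11),
    "energy": (68, 2, 0, 12),
    "mean": (70, 0, 3, 9),
}

for feature, counts in table3_counts.items():
    print(feature, accuracies(*counts))
```

For the energy feature, for example, 68 of 70 normal samples and all 12 insect-damaged samples are classified correctly, so the overall accuracy is 80/82, which rounds to the reported 97.6%.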
The life cycle of the pod borer consists of egg, larva, and moth stages, and different stages of infestation have different classification accuracies (Kaliramesh et al., 2013). Owing to limitations in sample collection, there were not enough insect-damaged samples for a detailed analysis. In future work, more insect-damaged soybeans will be collected to determine which damage levels the proposed method can detect or classify with the best accuracy.
5. Conclusion

An ROI selection approach based on iterative threshold segmentation was proposed in this paper. Three statistical image features (i.e., entropy, energy, and mean) of the ROI in hyperspectral images were extracted. SVDD was adopted to develop the classification models for insect-damaged soybeans. The results show that the ROI method based on iterative threshold segmentation yielded good accuracy for detecting insect-damaged vegetable soybeans. These results are of great significance for safety monitoring and for the economic benefits of an on-line system based on multispectral transmittance imaging. FRSTCA was employed to choose the optimal wavelengths as a precursor to insect-damaged vegetable soybean detection. Three or fewer wavelengths were selected, while the overall classification results were the same as or higher than those obtained with all wavelengths. In particular, the accuracy for insect-damaged samples with the energy feature was better than with all wavelengths. These results suggest that the wavelengths selected by the FRSTCA procedure can effectively decrease the redundancy and correlation between wavelengths. The approach is therefore suitable for insect-damaged vegetable soybean detection.

Acknowledgments

The authors gratefully acknowledge the financial support from the National Natural Science Foundation of China (Grant Nos. 61271384 and 61275155), the Natural Science Foundation of Jiangsu Province (China, BK2011148), the Postdoctoral Science Foundation of China (Grant No. 2011M500851), the 111 Project (B12018), and PAPD of Jiangsu Higher Education, sponsored by the Qing Lan Project.

References

Bhuvaneswari, K., Fields, P.G., White, N.D.G., Sarkar, A.K., Singh, C.B., Jayas, D.S., 2011. Image analysis for detecting insect fragments in semolina. J. Stored Prod. Res. 47 (1), 20–24.
Dowell, F.E., Maghirang, E.B., Jayaraman, V., 2010. Measuring grain and insect characteristics using NIR laser array technology. Appl. Eng. Agric. 26 (1), 165–169.
Dowell, F.E., Pearson, T.C., Maghirang, E.B., Xie, F., Wicklow, D.T., 2002. Reflectance and transmittance spectroscopy applied to detecting fumonisin in single corn kernels infected with Fusarium verticillioides. Cereal Chem. 79 (2), 222–226.
Hagstrum, D.W., Vick, K.W., Webb, J.C., 1990. Acoustical monitoring of Rhyzopertha dominica (Coleoptera: Bostrichidae) populations in stored wheat. J. Econ. Entomol. 83 (2), 625–628.
Hou, J., Wang, C., Hong, X., Zhao, C., Xue, C., Guo, N., Gai, J., Xing, H., 2011. Association analysis of vegetable soybean quality traits with SSR markers. Plant Breed. 130 (4), 444–449.
Hu, Q., Yu, D., Liu, J., Wu, C., 2008. Neighborhood rough set based heterogeneous feature subset selection. Inform. Sci. 178 (18), 3577–3594.
Hu, Q., Yu, D., Xie, Z., Liu, J., 2006. Fuzzy probabilistic approximation spaces and their information measures. IEEE Trans. Fuzzy Syst. 14 (2), 191–201.
Huang, M., Wan, X., Zhang, M., Zhu, Q., 2013. Detection of insect-damaged vegetable soybeans using hyperspectral transmittance image. J. Food Eng. 116 (1), 45–49.
Huang, M., Lu, R., 2010. Apple mealiness detection using hyperspectral scattering technique. Postharvest Biol. Technol. 58 (3), 168–175.
Kaliramesh, S., Chelladurai, V., Jayas, D.S., Alagusundaram, K., White, N.D.G., Fields, P.G., 2013. Detection of infestation by Callosobruchus maculatus in mung bean using near-infrared hyperspectral imaging. J. Stored Prod. Res. 52, 107–111.
Krawczyk, B., Woźniak, M., 2014. Diversity measures for one-class classifier ensembles. Neurocomputing 126, 36–44.
Liu, X., Yao, H., Zhang, Q., Dong, G., 2011. Status of pesticide and risk analysis of exporting vegetable soybeans in Zhejiang province. Soybean Sci. 30 (2), 298–302.
Lorente, D., Aleixos, N., Gómez-Sanchis, J., Cubero, S., García-Navarrete, O.L., Blasco, J., 2012. Recent advances and applications of hyperspectral imaging for fruit and vegetable quality assessment. Food Bioprocess Technol. 5 (4), 1121–1142.
Lü, Q., Tang, M., 2012. Detection of hidden bruise on kiwi fruit using hyperspectral imaging and parallelepiped classification. Procedia Environ. Sci. 12 (Part B), 1172–1179.
Mankin, R.W., 2004. Microwave radar detection of stored-product insects. J. Econ. Entomol. 97 (3), 1168–1173.
Melvin, S., Karunakaran, C., Jayas, D.S., White, N.D.G., 2003. Design and development of a grain kernel singulation device. Can. Biosyst. Eng. 45, 1–3.
Mendoza, C.S., Acha, B., Serrano, C., Gómez-Cía, T., 2012. Fast parameter-free region growing segmentation with application to surgical planning. Mach. Vision Appl. 23 (1), 165–177.
Miao, Y., Wang, Z., Liu, Q., 2013. Application of Zernike-moment-based watershed segmentation on fruit features extraction. Trans. Chin. Soc. Agric. Eng. 29 (1), 158–163.
Nakariyakul, S., Casasent, D.P., 2011. Classification of internally damaged almond nuts using hyperspectral imagery. J. Food Eng. 103 (1), 62–67.
Okeyo-Owuor, J.B., Oloo, G.W., Agwaro, P.O., 1991. Bionomics of Tetrastichus sesamiae [Hymenoptera, Eulophidae], a pupal endo-parasitoid of Maruca testulalis [Lepidoptera, Pyralidae]. Entomophaga 36 (3), 417–423.
Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishing.
Pearson, T.C., Brabec, D.L., 2002. Automated detection of hidden internal insect infestations in wheat kernels using electrical conductance. ASAE meeting, Paper No. 023073, Chicago, USA, pp. 1–10.
Qin, J., Lu, R., 2008. Measurement of the optical properties of fruits and vegetables using spatially resolved hyperspectral diffuse reflectance imaging technique. Postharvest Biol. Technol. 49 (3), 355–365.
Sakla, W., Chan, A., Ji, J., Sakla, A., 2011. An SVDD-based algorithm for target detection in hyperspectral imagery. IEEE Geosci. Remote Sens. Lett. 8 (2), 384–388.
Shen, Q., Jensen, R., 2004. Selecting information features with fuzzy-rough sets and its application for complex systems monitoring. Pattern Recogn. 37 (7), 1351–1363.
Singh, C.B., Jayas, D.S., Paliwal, J., White, N.D.G., 2009. Detection of insect-damaged wheat kernels using near-infrared hyperspectral imaging. J. Stored Prod. Res. 45 (3), 151–158.
Tax, D.M.J., Duin, R.P.W., 1999. Support vector domain description. Pattern Recogn. Lett. 11 (20), 1191–1199.
Tax, D.M.J., Duin, R.P.W., 2004. Support vector data description. Mach. Learn. 54 (1), 45–66.
Tiwari, G., Slaughter, D.C., Cantwell, M., 2013. Nondestructive maturity determination in green tomatoes using a handheld visible and near infrared instrument. Postharvest Biol. Technol. 86, 221–229.
Williams, C.K.I., 2003. Learning with kernels: support vector machines, regularization, optimization and beyond. J. Am. Stat. Assoc. 98 (462), 489.
Zayas, I.Y., Flinn, P.W., 1998. Detection of insects in bulk wheat samples with machine vision. J. Electron. Packag. 41 (3), 883–888.
Zhao, Y., He, Y., Xu, X., 2012. A novel algorithm for damage recognition on pest-infested oilseed rape leaves. Comput. Electron. Agric. 89, 41–50.