Hyperspectral imaging technology combined with deep forest model to identify frost-damaged rice seeds




Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 229 (2020) 117973

Contents lists available at ScienceDirect

Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy journal homepage: www.elsevier.com/locate/saa

Liu Zhang a,b, Heng Sun a,b, Zhenhong Rao c, Haiyan Ji a,b,⁎

a Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, China Agricultural University, Beijing 100083, China
b Key Laboratory of Agricultural Information Acquisition Technology, Ministry of Agriculture, China Agricultural University, Beijing 100083, China
c College of Science, China Agricultural University, Beijing 100083, China

Article info

Article history: Received 21 October 2019; received in revised form 11 December 2019; accepted 19 December 2019; available online 23 December 2019

Keywords: Hyperspectral imaging technology; Rice seed; Frost damage; Multivariate scatter correction; Deep forest

Abstract

In recent years, deep learning models have been widely used in the field of hyperspectral imaging. However, training a deep learning model requires not only a large number of samples but also the setting of many hyper-parameters, which is time consuming and laborious for researchers. This study used hyperspectral imaging technology combined with the deep forest (DF) model, a deep learning model suitable for small-scale sample data sets, to classify rice seeds with different degrees of frost damage. Three spectral preprocessing methods (Savitzky-Golay first derivative (SG1), standard normal variate (SNV), and multivariate scatter correction (MSC)) were used to process the original spectral data, and three feature extraction algorithms (principal component analysis (PCA), successive projections algorithm (SPA), and neighborhood component analysis (NCA)) were used to extract the characteristic wavelengths. Moreover, the DF model and three traditional machine learning models (decision tree (DT), k-nearest neighbor (KNN), and support vector machine (SVM)) were built on sample sets of different sizes. Multivariate data analysis showed that MSC was the most effective pretreatment method and that the characteristic wavelengths extracted by the NCA algorithm were the most useful. In addition, the DF model performed better than the three traditional classifier models and still performed well on small-scale sample sets. It was therefore chosen as the best classification model. The results of this study show that the DF model maintains good classification performance on small-scale sample sets and has good application prospects in hyperspectral imaging. © 2018 Elsevier B.V. All rights reserved.

⁎ Corresponding author at: Key Laboratory of Modern Precision Agriculture System Integration Research, Ministry of Education, China Agricultural University, Beijing 100083, China. E-mail address: [email protected] (H. Ji).

https://doi.org/10.1016/j.saa.2019.117973 1386-1425/© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Seeds are the foundation of agriculture, and screening high-vigor seeds with appropriate methods plays a crucial role in agricultural production [1]. In actual production, seeds are often exposed to various threats, such as heat damage, frost damage, fungal infection, mildew, mechanical damage, and insect damage. These threats reduce seed vigor, lower germination and emergence rates, and have a large impact on agriculture [2,3]. Rice is widely cultivated around the world and is one of the main food crops. About half of the world's population relies on rice as a staple food [4]. In China, rice can generally be divided into indica rice and japonica rice according to the growth cycle. Indica rice is widely planted in tropical and subtropical regions and can be harvested several times a year. Japonica rice is planted in temperate and cold zones and generally ripens only once a year. In Northeastern China, mainly late-ripening japonica rice is planted, and it is harvested from mid-October to mid-November. Temperatures in Northeastern China are low during this period, and the seeds have a high moisture content before harvest, so the rice is often damaged by frost. In addition, the high internal moisture content of newly harvested rice impairs subsequent storage. The rice is therefore usually dried to reduce its moisture content. In northeastern China (e.g., Heilongjiang Province), grain is generally dried in winter, when the very low temperatures easily cause frost damage to seeds with high moisture content. Because of their reduced vigor, these damaged seeds have a low germination rate and slow emergence when they are later sown in the field, which leads to yield losses and agricultural disasters [5,6]. Therefore, suitable methods for rapidly and non-destructively detecting damaged seeds (especially slightly damaged seeds) are crucial for agricultural production. Traditionally, seed vigor has been assessed with the germination test, the tetrazolium test, biochemical tests, and similar methods. Although these methods are accurate, they are destructive, complicated to operate, costly, and time consuming, which hinders their widespread adoption [7]. In recent years, with the development of non-destructive testing technology, many researchers have successfully used near-infrared (NIR) spectroscopy and machine vision to detect the type, age, hardness, degree of damage, and other characteristics of seeds [8–10]. Although NIR spectroscopy and machine vision have many advantages, NIR spectroscopy can only obtain the spectral information of a sample, not its image information [11,12]. Machine vision usually evaluates image features such as the color, shape, size, and surface texture of a sample, and it is not suitable for analyzing the sample's chemical composition [13,14]. At present, hyperspectral imaging of single seeds is attracting increasing attention. Some researchers have used hyperspectral imaging to detect seed vigor, degree of aging, frost damage, and other characteristics, with good results. Hyperspectral imaging acquires a hyperspectral data cube of each sample, from which the spectral information and the image information can be analyzed separately. It therefore overcomes the limitations of conventional spectroscopy and machine vision [15,16]. With the development of chemometrics, machine learning, and deep learning, there have been major breakthroughs in hyperspectral data mining. Hyperspectral imaging has become a powerful, fast, and non-destructive testing technology that is widely used in food, agriculture, and other fields. Zhang et al. [17] combined hyperspectral imaging with different machine learning algorithms to build discriminant models and identify frost-damaged corn seeds. Nie et al. [18] combined hyperspectral imaging with traditional machine learning models and deep learning models to classify different types of hybrid seeds.
The results showed that the deep convolutional neural network (DCNN) model performed best, with an accuracy of 95%, and it was selected as the best model. Many researchers have confirmed that, in some cases, models built with deep learning algorithms outperform traditional machine learning models in classification accuracy, robustness, and other respects [19–21]. However, compared with traditional classifiers, deep models require a large number of training samples and many hyper-parameter settings [22]. Hyperspectral imaging can acquire data cubes for a large number of samples. In practice, however, experimenters spend a great deal of time collecting many samples and labeling them [23]. Therefore, a model is urgently needed that can be applied to small-scale data while still achieving high accuracy and robustness. Zhou and Ji proposed a new deep learning framework, the so-called deep forest (DF), which was regarded as one of the important machine learning events of 2017 [24]. The DF combines several ensemble-based methods, including random forests (RFs) and stacking, into a structure similar to a multi-layer neural network. The difference is that each layer of the DF contains RFs instead of neurons. The advantages of DF are discussed in detail in [25,26]. In particular, the DF model is suitable not only for large-scale but also for small-scale sample data. Because it has few hyper-parameters, DF is easy to train. It does not use backpropagation, and with only small-scale training data it outperforms many well-known methods, including deep neural networks [24]. Therefore, this study used hyperspectral imaging technology combined with the DF model to classify rice seeds with different degrees of frost damage.
The specific objectives are as follows: (1) to verify the frost damage of rice seeds using the standard seed germination test prescribed by the International Seed Testing Association (ISTA); (2) to select the best spectral pretreatment method and feature extraction algorithm by multivariate data analysis; (3) to compare the DF model with traditional machine learning models and analyze their respective performance on small-scale sample data sets; and (4) to visualize the classification results of the best calibration model, giving a more intuitive view of the predictions.

2. Materials and methods

2.1. Sample and sample preparation

The rice seeds selected for this experiment were purchased online (https://m.tb.cn/h.ejhdiQi). The variety, “Yanfeng”, was harvested in 2018 in Panjin City, Liaoning Province, China, and had an initial moisture content of 13%–14% (dry seeds). From this batch, 1800 rice seeds were randomly selected for subsequent experiments. This experiment mainly studies the frost damage suffered by freshly harvested rice seeds (moisture content of about 30%) in a low-temperature environment. Apart from moisture content, there are no major differences between dry and freshly harvested rice seeds. Therefore, the seeds were artificially conditioned to a moisture content of 30% according to the method of Lohumi et al. [27]. The 1800 seeds were randomly divided into 6 groups of 300 seeds each. One group served as a control group and was not frozen. The remaining 5 groups were stored at different freezing temperatures for different durations. The freezing conditions are shown in Table 1. After the freezing treatment, the seeds were placed in a dry, ventilated environment at a room temperature of 25 °C for 1 week. This returned them to room temperature and eliminated interference from the frozen seeds reabsorbing water at room temperature.
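The amount of water needed to bring the seeds from their initial 13%–14% moisture to 30% can be illustrated with a simple mass balance. The sketch below makes several assumptions. Moisture contents are taken to be on a wet basis, and all added water is assumed to be absorbed. The function names and the 100 g seed mass in the example are hypothetical. The calculation is not necessarily the exact procedure of Lohumi et al. [27].

```python
def water_to_add(seed_mass_g: float, initial_mc: float, target_mc: float) -> float:
    """Return the mass of water (g) to add so that seeds reach target_mc.

    Moisture contents are percentages on a wet basis (assumption). The dry
    matter is unchanged by wetting, so
        seed_mass * (100 - initial_mc) = (seed_mass + water) * (100 - target_mc).
    """
    if not 0.0 <= initial_mc < target_mc < 100.0:
        raise ValueError("expected 0 <= initial_mc < target_mc < 100")
    return seed_mass_g * (target_mc - initial_mc) / (100.0 - target_mc)


def moisture_after(seed_mass_g: float, initial_mc: float, water_g: float) -> float:
    """Wet-basis moisture content (%) after adding water_g grams of water."""
    water_in_seed = seed_mass_g * initial_mc / 100.0
    return 100.0 * (water_in_seed + water_g) / (seed_mass_g + water_g)


# Hypothetical example: 100 g of seeds at 14% moisture conditioned to 30%
w = water_to_add(100.0, 14.0, 30.0)
print(f"add {w:.2f} g water -> {moisture_after(100.0, 14.0, w):.1f}% moisture")
```

The second function simply inverts the balance, which makes it easy to check that the computed water mass really yields the target moisture content.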

Table 1 The freezing conditions (freezing temperature and freezing duration) of rice seeds.

Temperature (°C)      Duration (h)
Room temperature      /
−10                   4
−15                   8
−20                   12
−25                   16
−25                   20

2.2. Hyperspectral image acquisition and correction

In this experiment, the “GaiaSorter” hyperspectral imaging system produced by Zolix Co., Ltd. (Beijing, China) was used. Its core components are a uniform light source, a spectral camera, a computer, and the associated control software. The camera is Zolix's “image-λ” series hyperspectral camera, with a spectral range of approximately 866.4–1701.0 nm. The samples are placed on a software-controlled motorized platform, and images are acquired with the push-broom method [5]. As the platform moves, the system obtains a hyperspectral data cube that contains both the spectral and the image information of the samples. Before data collection, the system was switched on and warmed up for 30 min to eliminate baseline drift and other system errors. The system's SpecView software (SpecView Ltd., Uckfield, UK) was then used for focusing and other adjustments. The exposure time was set to 0.09 s and the speed of the motorized platform to 0.65 cm/s. To make it easier to segment the rice seeds from the background, the seeds were placed on a black plate with very low reflectivity. To obtain accurate images, the original hyperspectral images were corrected to eliminate the influence of dark current and other noise [28]. The specific steps are as follows. In the same acquisition environment, the camera first captures an image of the white reference board, I_W. The lamp is then turned off, the camera lens is covered with its cap, and the black image I_B is acquired. The correction formula is:

$$I_C = \frac{I_O - I_B}{I_W - I_B} \qquad (1)$$
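Formula (1) is applied to each pixel and each band. In the paper this correction was done in SpecView. The NumPy sketch below is a hedged illustration of the same step. It assumes the white and dark references were averaged over scan lines into a (samples, bands) array so that they broadcast over the push-broom cube. The array shapes, the function name, and the small epsilon guard are assumptions.

```python
import numpy as np


def calibrate_reflectance(raw, white, dark, eps=1e-6):
    """Black/white reference correction: I_C = (I_O - I_B) / (I_W - I_B).

    raw:   image cube I_O, shape (lines, samples, bands)
    white: white reference I_W, broadcastable to raw, e.g. (samples, bands)
    dark:  dark reference I_B, broadcastable to raw, e.g. (samples, bands)
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    # Clamp the denominator so that dead or saturated pixels do not divide by zero
    denom = np.maximum(white - dark, eps)
    return (raw - dark) / denom


# Hypothetical toy cube: 2 scan lines x 3 spatial samples x 4 bands
raw = np.full((2, 3, 4), 60.0)
white = np.full((3, 4), 110.0)
dark = np.full((3, 4), 10.0)
cube = calibrate_reflectance(raw, white, dark)
print(cube.shape, cube.mean())
```

Broadcasting the (samples, bands) references over every scan line reflects the push-broom geometry: each line is recorded by the same row of detector pixels.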

In Formula (1), I_C is the corrected hyperspectral image, I_O is the original hyperspectral image, I_B is the all-black calibration image, and I_W is the all-white calibration image. The images were corrected with SpecView.

2.3. Standard germination test

After the hyperspectral images were acquired, 50 rice seeds were randomly selected from each group and tested for germination according to the rules of the International Seed Testing Association (ISTA) [29]. The seeds were soaked in distilled water for 12 h. The germination test was then carried out in a standard germination box at a room temperature of 25 °C, with moist germination paper covering the seeds for shading. Germination force (GF) and germination rate (GR) are among the main indicators of seed quality. They are calculated with Formulas (2) and (3). In general, seeds with high GR and high GF have high vigor, whereas seeds with high GR but low GF are also likely to have low vigor. Therefore, the GF, GR, and average shoot length of the rice seeds were calculated to reflect their degree of frost damage. The specific steps are as follows:

(1) After 4 days, the GF of each group is counted. A seed is considered normally germinated if its root length is at least the grain length and its shoot length is at least half the grain length.

(2) After 9 days, the shoot lengths of all seeds in each group are measured. A seed with a shoot length of at least 1 cm is considered normally germinated, and the germination rate is calculated.

(3) Finally, the average shoot length of each group is calculated.

$$GF = \frac{M_1}{M} \times 100\% \qquad (2)$$

$$GR = \frac{M_2}{M} \times 100\% \qquad (3)$$

where GF is the germination force, GR is the germination rate, M_1 is the number of seeds that germinated normally within the specified number of germination days, M_2 is the total number of normally germinated seeds, and M is the number of seeds tested.

2.4. Data analysis

2.4.1. Spectral data extraction and preprocessing

The hyperspectral system collects hyperspectral data cubes, so the original images must be processed in several steps to extract the spectral data. As shown in Fig. 1:

(1) The Sobel operator is applied to the corrected hyperspectral image to extract the seed edges, followed by erosion and dilation.

(2) A binary mask is generated and used to segment the original image and remove the background.

(3) The entire area of each rice seed in the image is extracted as a region of interest (ROI), and the average reflectance of all pixels in the ROI is taken as the spectrum of that seed.

Image processing was performed in Matlab R2018b (The MathWorks, Natick, MA, USA), and the spectral values of all pixels in the ROIs were calculated in ENVI 5.3 (ITT Visual Information Solutions, Boulder, UT, USA). Noise in the raw spectral data interferes with subsequent data analysis, so the raw spectra need to be preprocessed with appropriate methods. In this experiment, the raw spectral data were preprocessed with SG1, SNV, and MSC, respectively.
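The three preprocessing methods can be sketched with NumPy and SciPy as follows. Several details here are assumptions rather than information from the paper. The Savitzky-Golay window length (11) and polynomial order (2) are illustrative choices. MSC uses the mean spectrum as its reference. Spectra are stored one per row.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg1(spectra, window=11, polyorder=2):
    """Savitzky-Golay first derivative along the wavelength axis (one spectrum per row)."""
    return savgol_filter(np.asarray(spectra, dtype=float), window_length=window,
                         polyorder=polyorder, deriv=1, axis=1)


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum by its own mean and SD."""
    x = np.asarray(spectra, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, ddof=1, keepdims=True)


def msc(spectra, reference=None):
    """Multiplicative (multivariate) scatter correction against a reference spectrum.

    Each spectrum is fitted as x_i ~ a_i + b_i * ref and corrected to (x_i - a_i) / b_i.
    The default reference is the mean spectrum of the set.
    """
    x = np.asarray(spectra, dtype=float)
    ref = x.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(x)
    for i, row in enumerate(x):
        b, a = np.polyfit(ref, row, deg=1)  # slope, intercept
        out[i] = (row - a) / b
    return out


# Hypothetical example: 5 spectra (210 bands) differing only by additive/multiplicative scatter
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 3, 210)) + 2.0
spectra = rng.uniform(-0.2, 0.2, (5, 1)) + rng.uniform(0.8, 1.2, (5, 1)) * base
corrected = msc(spectra)
spread = corrected.max(axis=0) - corrected.min(axis=0)
print(spread.max())  # spread between spectra after MSC (close to zero here)
```

In this toy example the spectra differ only by an offset and a scale factor. MSC therefore collapses them onto the same curve, which illustrates why it removes light-scattering effects.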


2.4.2. Visualization of high-dimensional spectral data

The original spectral data are high-dimensional and contain redundant information, so it is difficult to observe the differences between samples directly. It is therefore useful to reduce the dimensionality of the spectral data with an appropriate method and visualize the differences between samples. In this study, the high-dimensional spectral data were processed with t-distributed stochastic neighbor embedding (t-SNE), and the differences between groups of samples were visualized. t-SNE is derived from stochastic neighbor embedding (SNE). It maps high-dimensional data to a low-dimensional space while trying to keep the distributions of pairwise similarities in the two spaces as close as possible. SNE models the similarities in both spaces with Gaussian distributions, whereas t-SNE uses a Student t-distribution in the low-dimensional space. The heavier tails of the t-distribution allow moderately distant points to be placed farther apart in the embedding, which alleviates the crowding problem [30]. In addition, t-SNE is a nonlinear dimensionality reduction algorithm that preserves the local structure of the original data. It is well suited to reducing high-dimensional data to 2D or 3D for visualization.

2.4.3. Characteristic wavelength extraction

The original hyperspectral data contain redundant and collinear information, which reduces the speed and robustness of the model. Extracting characteristic wavelengths with appropriate methods is therefore crucial for spectral analysis. These characteristic wavelengths also provide a reference for the future development of multispectral online detection systems. In this study, PCA, SPA, and NCA were used to extract the characteristic wavelengths.
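Of the three methods, SPA is the most procedural, so a minimal sketch of its core projection step is given here. Starting from one wavelength, every remaining wavelength column is projected onto the subspace orthogonal to the last selected column. The column with the largest remaining norm, i.e. the one least collinear with those already chosen, is added next. The full SPA described below also repeats this for different starting wavelengths and chain lengths, and it keeps the subset with the lowest RMSEV of an MLR model. That outer selection loop is omitted. The function name and data are hypothetical.

```python
import numpy as np


def spa_projections(X, start, n_select):
    """Projection step of SPA: return n_select wavelength (column) indices.

    Starting from column `start`, all columns are repeatedly projected onto the
    subspace orthogonal to the most recently selected column, and the column
    with the largest remaining norm (least collinear with the chosen ones) is added.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [int(start)]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]].copy()
        # Remove the component along xk from every column
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -np.inf  # never select the same wavelength twice
        selected.append(int(np.argmax(norms)))
    return selected


# Hypothetical data: 40 samples x 8 wavelengths, column 1 is collinear with column 0
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 8))
X[:, 1] = 2.0 * X[:, 0]
chain = spa_projections(X, start=0, n_select=4)
print(chain)
```

Because column 1 is an exact multiple of column 0, its projection vanishes after the first step, so SPA never selects it.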
PCA converts the many variables of high-dimensional data into a few composite variables (principal components), each of which reflects the main information of the original data, and thereby reduces dimensionality. It is widely used for dimensionality reduction of spectral data to remove redundant information [31]. SPA is a forward variable selection method that removes collinear and redundant information to the greatest extent [32]. First, SPA uses successive projections to select the variables with minimal collinearity and redundancy, taking the variable with the largest projection vector at each step. Second, it determines the characteristic variables from the minimum root mean square error of validation (RMSEV) of a multiple linear regression (MLR) calibration on the validation set. Finally, the selected characteristic variables are ranked according to the magnitude of their correlation. NCA is a metric learning algorithm. Its core idea is to learn a distance metric that defines a suitable feature space, in which features are then extracted to reduce dimensionality. NCA is based on KNN with the Mahalanobis distance as the distance measure. It learns a transformation matrix by iteratively optimizing the KNN classification accuracy and finally obtains the transformation matrix that reduces the dimensionality of the original data [33].

2.5. Discriminant model

2.5.1. Decision tree

DT is a common classification method. In decision analysis, a decision tree is built from the probabilities of different outcomes, for example to determine the probability that the expected net present value is greater than or equal to zero. In machine learning, a DT represents the mapping between object attributes and object values [34].
Each node in the tree represents an object, each branch represents a possible attribute value, and each leaf node represents the value of the object described by the path from the root node to that leaf. Because a decision tree is a graphical method based on probability analysis, it is very intuitive. In this study, cross-validation was used to determine the most appropriate minimum leaf size (minleaf).

Fig. 1. Main flow chart for extracting spectral data.

Fig. 2. Structure of the deep forest (DF).

2.5.2. K-nearest neighbor

The KNN algorithm is both theoretically mature and one of the simplest algorithms in machine learning. Its core idea is that if most of the k nearest neighbors of a sample in the feature space belong to a certain category, the sample also belongs to that category [35]. In this study, an automatic parameter optimization procedure was used to determine the k value of the model.

2.5.3. Support vector machine

SVM is based on structural risk minimization. It finds a separating hyperplane between the classes that maximizes the margin and uses it to classify the data. Commonly used SVM kernels include the linear, polynomial, and radial basis function (RBF) kernels, and different kernels suit different problems [36]. This study used the RBF kernel, which can handle both linear and nonlinear data and whose parameters are easy to select. A grid search was used to determine the penalty coefficient (c) and the kernel parameter (g).
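The tuning procedures for the three traditional classifiers can be sketched with scikit-learn's GridSearchCV. This is an illustrative stand-in under several assumptions, not the software used in the study. scikit-learn's min_samples_leaf is assumed to play the role of minleaf, and C and gamma are assumed to correspond to c and g. The grids, the 5-fold cross-validation, and the synthetic 6-class data are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def tuned_baselines(cv=5):
    """Grid searches for the three traditional classifiers (grids are illustrative)."""
    return {
        # min_samples_leaf plays the role of the minimum leaf size (minleaf)
        "DT": GridSearchCV(DecisionTreeClassifier(random_state=0),
                           {"min_samples_leaf": [1, 2, 5, 10, 20]}, cv=cv),
        # number of neighbours k
        "KNN": GridSearchCV(KNeighborsClassifier(),
                            {"n_neighbors": list(range(1, 16, 2))}, cv=cv),
        # RBF-SVM: penalty coefficient (c) and kernel parameter (g)
        "SVM": GridSearchCV(SVC(kernel="rbf"),
                            {"C": [2.0 ** k for k in range(-5, 11, 2)],
                             "gamma": [2.0 ** k for k in range(-11, 4, 2)]}, cv=cv),
    }


# Hypothetical stand-in for the seed spectra: 300 samples, 6 classes, 3:1 split
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_cal, X_prd, y_cal, y_prd = train_test_split(X, y, test_size=0.25,
                                              stratify=y, random_state=0)
scores = {}
for name, search in tuned_baselines().items():
    search.fit(X_cal, y_cal)  # cross-validated grid search on the calibration set
    scores[name] = search.score(X_prd, y_prd)  # accuracy on the prediction set
    print(name, search.best_params_, round(scores[name], 3))
```

The power-of-two grid for C and gamma is a common convention for RBF-SVM searches. A finer grid around the best point could follow if needed.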

Table 2 Germination force (GF), germination rate (GR) and average shoot length of rice seeds under different freezing conditions.

Freezing condition    GF (%)    GR (%)    Average shoot length (cm)
Untreated             96        96        4.92
−10 °C/4 h            82        90        4.24
−15 °C/8 h            66        82        3.02
−20 °C/12 h           44        64        2.43
−25 °C/16 h           10        24        1.01
−25 °C/20 h           0         8         0.45

2.5.4. Deep forest

DF is a deep learning algorithm, a new tree-based model comparable to deep neural networks, that uses a cascade-forest structure for representation learning. Unlike neural network models, it can be applied to small-scale sample data, and its parameters are easy to select. DF consists of two main modules, a multi-grained scanning module and a cascade forest module [25]. Its structure is shown in Fig. 2, which uses a three-class example:

(1) Three sliding windows of different sizes are used. In the multi-grained scanning module, each forest outputs a three-dimensional class vector. The results for the scanning windows of different sizes are concatenated into a 3618-dimensional vector, which is then input into the cascade forest module.

(2) In the cascade forest, each level consists of several random forests. Each level learns from its input feature vector and passes the processed result to the next level. To improve the generalization ability of the model, several different types of random forests are used in each level. Fig. 2 shows two types of forests, completely random tree forests (blue) and random forests (black), with two forests of each type. Each completely random tree forest contains 500 trees. At each node, one feature is randomly selected as the splitting condition, and child nodes are generated in this way until each leaf node contains only instances of the same class. Each random forest also contains 500 trees. At each node, √d features are randomly selected as candidates (d is the number of input features), and the one with the best Gini value is used to split the node. For example, the four forests (two completely random tree forests and two random forests) output a 12-dimensional class vector. This vector is concatenated with the input 3618-dimensional vector to obtain a 3630-dimensional vector, which is input to the next level. At the last level, the 12-dimensional output is averaged over the four forests, and the class with the maximum average value is taken as the prediction.

In addition, in the cascade structure, performance is tested at the end of each level before the next level is generated. When a new level is added, the performance of the entire cascade is estimated on the validation set. If there is no significant performance gain, the training process is terminated. The complexity of the DF model is therefore determined by this termination criterion. As a result, unlike deep neural network models, DF can be used even with small data sets, because its structure does not depend on large amounts of data [24]. The DF model used in this study was obtained from the official website of the Institute of Machine Learning and Data Mining, Nanjing University, and was run in Spyder 3.3.2 (Anaconda, Austin, TX, USA).

3. Results and discussion

3.1. Analysis of germination test results

Table 2 shows the germination force (GF), germination rate (GR), and average shoot length of rice seeds under different freezing conditions. Two points can be seen. (1) As the freezing conditions intensified, the GF, GR, and average shoot length of the rice seeds all decreased. (2) The GF and GR of the untreated rice seeds are equal, whereas the GF of the frost-damaged rice seeds is lower than their GR. For crop seeds, a high GR combined with a strong GF indicates fast, uniform emergence and vigorous seedlings. A high GR with a weak GF indicates uneven emergence and weak seedlings. Specifically, the rice seeds frozen at −10 °C/4 h have a very good GR (90%), but their GF is only 82%, so the GF is inconsistent with the GR. In addition, the average shoot length of these seeds (4.24 cm) was lower than that of untreated seeds (4.92 cm), indicating that they were slightly frost-damaged. These frost-damaged seeds will not produce enough seedlings when they are sown in the field, which will affect the harvest. Therefore, rapid and non-destructive identification of rice seeds with different degrees of frost damage is essential for agricultural production.

3.2. Original spectral analysis

The original spectral range was 866.4–1701.0 nm. Owing to the optical instrument and other factors, there was significant noise at both ends of the original spectral curves. The noisy wavelengths were therefore discarded, and the range 949.0–1638.0 nm, containing a total of 210 wavelengths, was retained for analysis. Fig. 3 shows the average spectral curves and standard deviations of rice seeds under different freezing conditions. The average spectral curves of the six groups follow very similar trends and are difficult to distinguish. However, in some specific wavelength ranges, the spectral characteristics of these rice seeds differ significantly. For example, in the range of 1000.0–1300.0 nm, the spectral reflectance decreases in the order −25 °C/20 h > −25 °C/16 h > −20 °C/12 h > −15 °C/8 h > −10 °C/4 h > Untreated, and the difference is most obvious near 1300 nm. The 1000–1100 nm region mainly corresponds to the third overtone of N–H stretching, and the 1100–1300 nm region mainly corresponds to the second overtone of C–H stretching [37]. Lower freezing temperatures and longer freezing times cause mechanical damage to the seed cells and destroy the starch structure. They also alter the structure of the aleurone layer and the embryo, which blocks the physiological pathway by which gibberellin enters the aleurone layer. Consequently, the starch hydrolyzed in the aleurone layer cannot enter the endosperm, which reduces seed vigor [38]. Therefore, as the freezing conditions intensify, the cellular and other structures of the seeds are more severely destroyed, and the spectral reflectance in the 1000–1300 nm range increases accordingly.

Fig. 3. Average spectral curves with their standard deviation of rice seeds under different freezing conditions.

Fig. 4. Spectral curves of different preprocessing methods: (a) raw spectral curves; (b) spectral curves after SG1 processing; (c) spectral curves after SNV processing; (d) spectral curves after MSC processing. t-SNE visualization of spectral data processed by different preprocessing methods: (e) raw spectral data; (f) spectral data after SG1 processing; (g) spectral data after SNV processing; (h) spectral data after MSC processing.
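The band trimming and the group-wise mean ± standard deviation curves plotted in Fig. 3 can be sketched along these lines. The 256-band evenly spaced grid is a hypothetical stand-in for the camera's actual band centres, which are not listed in the paper. The random spectra and group labels are also hypothetical.

```python
import numpy as np


def trim_and_summarize(wavelengths, spectra, labels, lo=949.0, hi=1638.0):
    """Keep bands in [lo, hi] nm and return per-group mean and SD spectra."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    keep = (wavelengths >= lo) & (wavelengths <= hi)  # drop the noisy band edges
    trimmed = spectra[:, keep]
    summary = {
        g: (trimmed[labels == g].mean(axis=0), trimmed[labels == g].std(axis=0, ddof=1))
        for g in np.unique(labels)
    }
    return wavelengths[keep], trimmed, summary


# Hypothetical band grid and random spectra (6 groups x 10 seeds)
wl = np.linspace(866.4, 1701.0, 256)
rng = np.random.default_rng(2)
spectra = rng.uniform(0.2, 0.6, size=(60, wl.size))
labels = np.repeat(np.arange(6), 10)
wl_kept, trimmed, summary = trim_and_summarize(wl, spectra, labels)
print(wl_kept[0], wl_kept[-1], trimmed.shape)
```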

3.3. Visualization analysis of high-dimensional spectral data

In this study, t-SNE was used to visualize the raw spectral data and the spectral data processed by the three pretreatment methods (SG1, SNV, MSC). These high-dimensional data were reduced to a two-dimensional plane for analysis and comparison. Because t-SNE involves strong randomness, the default t-SNE parameter settings of Matlab R2018b were used (distance metric: Euclidean distance; effective number of local neighbors of each point, ‘Perplexity’ = 30; learning rate of the optimization process, ‘LearnRate’ = 500; Barnes-Hut trade-off parameter, ‘Theta’ = 0.5).
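For readers working in Python, a roughly equivalent embedding can be sketched with scikit-learn's TSNE. This is an assumed substitute for the Matlab tsne function used in the study. The perplexity, learning rate, Barnes-Hut angle (Theta), and Euclidean metric mirror the settings listed above. Other defaults, such as initialization and the number of iterations, differ between the two libraries, so the embeddings will not be identical. The input data are hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_2d(spectra, seed=0):
    """Embed spectra (one per row) in 2-D with settings mirroring the listed Matlab ones."""
    tsne = TSNE(
        n_components=2,
        perplexity=30.0,      # 'Perplexity' = 30
        learning_rate=500.0,  # 'LearnRate' = 500
        angle=0.5,            # Barnes-Hut trade-off, 'Theta' = 0.5
        metric="euclidean",
        random_state=seed,    # fix the seed because t-SNE is stochastic
    )
    return tsne.fit_transform(np.asarray(spectra, dtype=float))


# Hypothetical input: 90 preprocessed spectra with 210 bands
rng = np.random.default_rng(3)
demo = rng.normal(size=(90, 210))
embedding = tsne_2d(demo)
print(embedding.shape)
```

Fixing random_state makes a given run reproducible. Because t-SNE is stochastic, comparisons between preprocessing methods are fairer when the same seed and settings are used for each.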

Fig. 4 shows the spectral curves of the different pretreatment methods and the corresponding t-SNE visualization results. Fig. 4e–h shows the results of visualizing raw spectral data and spectral data processed by three preprocessing methods (SG1, SNV, MSC) using t-SNE. It can be seen from Fig. 4e that the raw spectral data of rice seeds under different freezing conditions are mixed and overlapped, and it is difficult to observe the characteristics of the data after the dimension reduction. Similarly, it can be seen from Fig. 4f and g that the spectral data processed by SG1 and SNV still overlap a lot, and there appears to be no improvement compared to the raw spectral data. Fig. 4h shows that the spectral data after MSC pretreatment shows a very good clustering effect, and it can be clearly seen that the 6 groups of rice seeds are well classified. In conclusion, the performance of spectral data processed by MSC is obviously better than that of raw spectral data and spectral data

8

L. Zhang et al. / Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 229 (2020) 117973

processed by SG1 and SNV. The poorer separation of these data may be due to differences among the samples themselves and to other factors, such as light scattering in the optical instrument, which introduces noise. MSC can largely eliminate the influence of light scattering, spectral baseline drift, and similar phenomena [39]. Therefore, MSC performed better than the other two spectral pretreatment methods in this experiment. Although the MSC-processed spectral data already show excellent class separation in the t-SNE visualization, this study further combined them with machine learning algorithms for modeling analysis, to provide more reference for the future development of multispectral online detection systems.

3.4. Modeling analysis based on full wavelengths

Fig. 5. Results of modeling analysis based on full wavelengths.

Before modeling, all samples were randomly divided into a calibration set and a prediction set at a 3:1 ratio. To select the best combination of preprocessing method and model, the raw spectral data and the spectral data preprocessed by SG1, SNV, and MSC were each input into the DT, KNN, SVM, and DF models. Fig. 5 shows the modeling results based on full wavelengths. The spectral data after MSC processing give the highest modeling accuracy: the calibration and prediction accuracies of the MSC-DT, MSC-KNN, MSC-SVM, and MSC-DF models all exceed 90%, clearly better than those obtained with the other preprocessing methods. This agrees with the conclusion of the t-SNE visualization (i.e., the spectral data after MSC preprocessing perform best among the preprocessing methods). Because modeling on the full wavelengths reduces the computational speed and robustness of the model, feature extraction is needed to remove redundant information and retain the most useful wavelengths. Therefore, the spectral data processed by MSC were selected for characteristic wavelength extraction and further modeling analysis.
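The full-wavelength comparison can be sketched as follows, assuming scikit-learn. The DT, KNN, and SVM settings are library defaults chosen for illustration, and the stratified 3:1 split is an assumption (the text states only a random split). `SimpleCascadeForest` is a hypothetical, heavily simplified cascade forest in the spirit of deep forest (one random forest and one completely-random forest per level, out-of-fold class vectors appended to the input features, growth stopped when a level stops improving, no multi-grained scanning); it stands in for the DF model and is not the implementation used in the study.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


class SimpleCascadeForest:
    """Hypothetical, simplified cascade forest (no multi-grained scanning)."""

    def __init__(self, n_estimators=50, max_levels=5, cv=3, random_state=0):
        self.n_estimators = n_estimators
        self.max_levels = max_levels
        self.cv = cv
        self.random_state = random_state

    def _new_level(self, level):
        seed = self.random_state + level
        return [
            RandomForestClassifier(n_estimators=self.n_estimators, random_state=seed),
            # ExtraTrees with max_features=1 approximates a completely-random forest.
            ExtraTreesClassifier(n_estimators=self.n_estimators, max_features=1, random_state=seed),
        ]

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.levels_ = []
        features, best_acc = X, -np.inf
        for level in range(self.max_levels):
            forests = self._new_level(level)
            # Out-of-fold class vectors keep the next level from seeing overfit probabilities.
            probas = [cross_val_predict(f, features, y, cv=self.cv, method="predict_proba")
                      for f in forests]
            acc = np.mean(self.classes_[np.mean(probas, axis=0).argmax(axis=1)] == y)
            if acc <= best_acc:  # stop growing when a level no longer helps
                break
            best_acc = acc
            for f in forests:
                f.fit(features, y)
            self.levels_.append(forests)
            features = np.hstack([X] + probas)  # original features + class vectors
        return self

    def predict_proba(self, X):
        features = X
        for forests in self.levels_:
            probas = [f.predict_proba(features) for f in forests]
            features = np.hstack([X] + probas)
        return np.mean(probas, axis=0)  # average class vectors of the last level

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]


def compare_models(X, y, seed=0):
    """3:1 calibration/prediction split, then (calibration, prediction) accuracy per model."""
    X_cal, X_pre, y_cal, y_pre = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    models = {
        "DT": DecisionTreeClassifier(random_state=seed),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(),
        "DF": SimpleCascadeForest(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_cal, y_cal)
        results[name] = (np.mean(model.predict(X_cal) == y_cal),
                         np.mean(model.predict(X_pre) == y_pre))
    return results


# Synthetic, well-separated data: 6 classes x 40 samples x 20 features.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(6), 40)
X = rng.normal(size=(240, 20)) + 1.5 * y[:, None]
results = compare_models(X, y)
```

In practice the same `compare_models` call would be repeated once per preprocessing variant (raw, SG1, SNV, MSC) to fill a comparison like Fig. 5.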

Fig. 6. The characteristic wavelengths extracted using the first three principal component loading curves.


Fig. 7. Characteristic wavelengths extracted by SPA.

3.5. Characteristic wavelength selection based on PCA, SPA and NCA

The purpose of characteristic wavelength selection is to reduce the dimensionality of the original high-dimensional spectral data while retaining as much useful information as possible and removing redundant information. In this study, PCA, SPA, and NCA were used to extract characteristic wavelengths from the MSC-processed spectral data (210 variables). Since the cumulative contribution rate of the first three principal components reached 99.52% (PC1 = 86.22%, PC2 = 12.17%, PC3 = 1.13%), the loadings of the first three principal components were used for characteristic wavelength extraction. Fig. 6 shows the characteristic wavelengths extracted from the first three principal component loading curves. When analyzing a loading curve, its peaks and valleys are generally regarded as characteristic wavelengths. Accordingly, 10 wavelengths were extracted from the loading curves as important wavelengths for distinguishing rice seeds with different degrees of frost damage: 1003.7, 1108.7, 1115.4, 1192.5, 1199.2, 1295.4, 1302.0, 1357.8, 1462.0, and 1471.7 nm. Fig. 7 shows the characteristic wavelengths selected by SPA. Eight characteristic wavelengths were selected, which, in order of their correlations, were 1139.0, 1088.5, 1000.3, 1195.9, 1282.2, 1612.6, 1367.6, and 1467.0 nm. Their correlations also indicate their importance in distinguishing rice seeds with different degrees of frost damage.
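A minimal NumPy sketch of the two wavelength-selection steps described above follows; all function names are hypothetical. `pca_loadings` returns the loading curves and contribution rates, and `loading_extrema` flags interior peaks and valleys of a loading curve; deciding which extrema to keep (10 in this study) remains a judgment call. `spa` implements only the projection chain of the successive projections algorithm from a given starting wavelength; the full algorithm also chooses the starting wavelength and the number of wavelengths from the validation error of a regression model, which is omitted here.

```python
import numpy as np


def pca_loadings(X, n_components=3):
    """Loading curves and contribution rates of the first principal components."""
    Xc = X - X.mean(axis=0)                        # centre each wavelength
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    contribution = s ** 2 / np.sum(s ** 2)         # explained-variance ratio per PC
    return vt[:n_components], contribution[:n_components]


def loading_extrema(loading):
    """Indices of interior peaks and valleys of one loading curve."""
    mid = loading[1:-1]
    peaks = (mid > loading[:-2]) & (mid > loading[2:])
    valleys = (mid < loading[:-2]) & (mid < loading[2:])
    return np.flatnonzero(peaks | valleys) + 1


def spa(X, n_select, start):
    """Projection chain of the successive projections algorithm (SPA)."""
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        v = Xp[:, selected[-1]].copy()
        # Project every column onto the subspace orthogonal to the last selected column.
        Xp -= np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                     # never pick a wavelength twice
        selected.append(int(np.argmax(norms)))     # largest remaining projection wins
    return selected


# Synthetic data: 100 spectra x 210 bands built from three latent components plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 210)
components = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t])
scores = rng.normal(size=(100, 3)) * np.array([5.0, 2.0, 1.0])
X = scores @ components + 0.01 * rng.normal(size=(100, 210))

loadings, contribution = pca_loadings(X)           # contribution.sum() is close to 1
candidates = loading_extrema(loadings[0])          # candidate characteristic bands on PC1
chosen = spa(X, n_select=8, start=0)               # 8 band indices in selection order
```

Because each SPA step removes the direction of the previously chosen band, a band that is nearly collinear with one already selected has a small projected norm and is unlikely to be picked, which is how SPA limits redundancy.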

The NCA algorithm is often used to select feature variables in high-dimensional data. It computes a weight for each variable during optimization, driving the weights of irrelevant or weakly relevant variables to 0 or very close to 0, and the variables with higher weights are then selected as features. Fig. 8 shows that only 6 of the 210 wavelengths have weights significantly above 0, while the other wavelengths have weights of 0 or very close to 0. Thus, most wavelengths contribute very little to distinguishing seeds with different degrees of frost damage, and only the wavelengths with weights significantly above 0 were selected as characteristic wavelengths. In descending order of weight, the six selected wavelengths are 1030.9, 1529.6, 1334.9, 1152.4, 1047.9, and 1413.3 nm. As discussed below, these wavelengths can be related to the chemical composition of rice seeds. Table 3 lists the characteristic wavelengths extracted by the three feature extraction algorithms. The characteristic wavelengths extracted by PCA and SPA are very close, mainly distributed around 1000, 1100, 1200, 1300, 1350, and 1450 nm. The 1000.3 and 1003.7 nm bands mainly correspond to the third overtone of N–H stretching, which is related to protein. The 1088.5, 1108.7, 1115.4, and 1139.0 nm bands mainly correspond to the second overtone of C–H stretching, and the 1192.5, 1195.9, 1199.2, 1282.2, 1295.4, 1302.0, 1357.8, and 1367.6 nm bands mainly correspond to the second overtone of C–H combination stretching, which is related to starch. The 1462.0 and 1471.7 nm bands mainly correspond to the first overtone of O–H stretching, which is related to starch and lipids. The 1612.6 nm band corresponds to the second overtone of N–H stretching [37].
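The NCA weighting described in this paragraph follows the neighborhood component feature selection of Yang et al. [33]. The sketch below is a plain gradient-ascent version in NumPy, assuming a weighted L1 distance, an exponential kernel of width `sigma`, and a fixed step size; the regularization strength `lam`, `sigma`, the step size, and the iteration count are illustrative assumptions rather than the study's settings (MATLAB's `fscnca`, for instance, uses its own solvers and defaults). It builds an n × n × d array of absolute differences, so it is meant for small data sets only.

```python
import numpy as np


def nca_feature_weights(X, y, lam=0.01, sigma=1.0, lr=0.5, n_iter=300):
    """Gradient-ascent sketch of neighborhood component feature selection.

    Maximizes (1/n) * sum_i p_i - lam * sum_l w_l**2, where p_i is the probability
    that sample i picks a same-class reference point under the weighted distance
    D_w(i, j) = sum_l w_l**2 * |x_il - x_jl| and the kernel exp(-D_w / sigma).
    """
    n, d = X.shape
    w = np.ones(d)
    absdiff = np.abs(X[:, None, :] - X[None, :, :])   # |x_il - x_jl|, shape (n, n, d)
    same = (y[:, None] == y[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    for _ in range(n_iter):
        logits = -(absdiff @ (w ** 2)) / sigma        # -D_w(i, j) / sigma
        np.fill_diagonal(logits, -np.inf)             # a point never picks itself
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # p_ij
        p_i = (p * same).sum(axis=1)                  # P(sample i is classified correctly)
        term1 = np.einsum("i,ij,ijl->l", p_i, p, absdiff)
        term2 = np.einsum("ij,ijl->l", p * same, absdiff)
        grad = 2.0 * ((term1 - term2) / (n * sigma) - lam) * w
        w += lr * grad                                # ascend the regularized objective
    return w


# Usage: 3 classes x 20 samples; only feature 0 carries class information.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 6))
X[:, 0] = y + 0.1 * rng.normal(size=60)
weights = nca_feature_weights(X, y)
ranking = np.argsort(-np.abs(weights))  # features ordered from high to low weight
```

On spectra, the same ranking step would order the wavelengths by weight, after which only those with weights clearly above 0 are kept, as done for Fig. 8.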
NCA extracted the fewest characteristic wavelengths. The 1030.9 and 1047.9 nm bands mainly correspond to the third overtone of N–H stretching, which is related to proteins. The 1152.4 and 1334.9 nm bands mainly correspond to the second overtone of C–H combination stretching, which is related to starch. The 1413.3 and 1529.6 nm bands mainly correspond to the first overtone of O–H stretching, which is related to starch and lipids [37]. To confirm which method extracts the most effective characteristic wavelengths, further modeling analysis based on the characteristic wavelengths is needed to select the most suitable feature selection algorithm.

3.6. Modeling analysis based on characteristic wavelengths

To assess the effectiveness of the different models, the total sample set (i.e., 6 categories of rice seeds with 300 kernels each, 1800 kernels in total) was divided into several sample sets containing different numbers of seeds, namely 10, 20, 30, 40, 50, 100, 150, 200, 250, and 300 kernels per category. The accuracy of each model was obtained by 5-fold cross-validation. Fig. 9a–d shows the results of the DT, KNN, SVM, and DF models based on each feature extraction algorithm for the different sample-set sizes. Overall, the models based on PCA performed worse than those based on NCA and SPA. A closer comparison shows that the characteristic wavelengths extracted by NCA give significantly better results than those extracted by SPA when the sample sets are small, but as the sample sets grow, the results of NCA and SPA become close or even equal. In addition, the number of characteristic wavelengths

Fig. 8. Weight values for each wavelength obtained using the NCA.

Table 3
Characteristic wavelengths extracted by PCA, SPA, and NCA.
Method | No. | Characteristic wavelengths (nm)
PCA | 10 | 1003.7, 1108.7, 1115.4, 1192.5, 1199.2, 1295.4, 1302.0, 1357.8, 1462.0, 1471.7
SPA | 8 | 1000.3, 1088.5, 1139.0, 1195.9, 1282.2, 1367.6, 1462.0, 1612.6
NCA | 6 | 1030.9, 1047.9, 1152.4, 1334.9, 1413.3, 1529.6



Fig. 9. Modeling results based on each feature extraction algorithm under different numbers of sample sets. (a) DT model; (b) KNN model; (c) SVM model; (d) DF model.

extracted by NCA is smaller than that extracted by SPA, which helps improve the computational speed of the model. Therefore, NCA was selected as the best feature extraction algorithm. Next, the performance of each model was evaluated further. Fig. 10 shows the results of the DT, KNN, SVM, and DF models based on NCA for the different sample-set sizes. It can be seen that: (1) as the sample sets grow, the accuracy of each model increases, but once each category of rice seed reaches 200 samples, the accuracy rises only very slowly and stabilizes. Among the models, DT performs worst and DF achieves higher accuracy than the other

three models. (2) With 10 samples per category, the accuracy of the DF model is close to 80%, while that of the other three models is 30–60%. With 20 samples per category, the accuracy of the DF model is very close to 90%, while that of the other three models is below 72%. (3) With 50 samples per category, the accuracy of the KNN and SVM models is slightly above 90%, and as the sample sets grow further, the final accuracies of the DF, KNN, and SVM models become very close. These results show that the DF model still maintained good classification performance when the number of samples was small



and was significantly more accurate than the three traditional machine learning classification models used in this paper. In addition, since the DF model outperformed the other three classification models for every sample-set size, it was finally selected as the best classification model. Some researchers have shown that traditional deep learning models require a large number of training samples [21–23], so traditional deep learning models were not used for comparison, but the conclusion is still clear. Machine learning algorithms undoubtedly play a crucial role in hyperspectral data mining, and deep learning is a hot topic in artificial intelligence. Some researchers have applied deep learning models to hyperspectral data mining and shown them to be superior to traditional machine learning models in some cases [20,21]. However, traditional deep models require a large number of samples for training and calibration, and they have many hyper-parameters to set, which makes them prone to over-fitting [40]. Although the batch detection capability of hyperspectral imaging makes large-scale data collection possible, collecting and labeling a large number of samples is still time-consuming and laborious. Therefore, a classification model that performs well on small sample data is urgently needed. DF, a new deep learning model, achieves good results on small sample data, requires few parameter settings, and is easy to train, which makes it a good choice.

Fig. 10. The results of modeling the DT, KNN, SVM, and DF models based on the NCA under different numbers of sample sets.

3.7. Visualization of rice seeds with different frost damage degrees

Table 4
Results of DF model for the separation of healthy rice seeds from rice seeds with different degrees of frost damage (Note: “Sen.” means “sensitivity” and “Spe.” means “specificity”).
Freezing condition | Calibration Sen. (%) | Calibration Spe. (%) | Prediction Sen. (%) | Prediction Spe. (%) | Cross-validation Sen. (%) | Cross-validation Spe. (%)
−10 °C/4 h | 100 | 100 | 98.51 | 100 | 99.61 | 100
−15 °C/8 h | 100 | 100 | 100 | 100 | 99.53 | 100
−20 °C/12 h | 100 | 100 | 100 | 100 | 100 | 100
−25 °C/16 h | 100 | 100 | 100 | 100 | 100 | 100
−25 °C/20 h | 100 | 100 | 100 | 100 | 100 | 100

Hyperspectral imaging acquires the spectral and spatial information of rice seeds simultaneously, which makes it possible to display the classification results for rice seeds with different frost damage degrees as visualization maps. Many researchers have adopted pixel-wise and object-wise methods for visual classification of seed categories. The pixel-wise method predicts the category of every pixel in a kernel from that single pixel's spectrum, whereas the object-wise method uses the average spectrum of all pixels in a kernel as the spectrum of that kernel. Baek et al. [7] argued that seed viability is a property of the whole seed, so each pixel in a seed image cannot be taken to represent the viability state of the seed. Therefore, this experiment used the object-wise method for visualization. The general steps of image visualization were as follows: (1) the samples were separated from the background, and the area of each rice seed was defined as an ROI; (2) spectral data were extracted from each ROI, and the average spectrum of each ROI was calculated and preprocessed; (3) a calibration model was developed on the characteristic wavelengths with the optimal number of samples in the calibration set; (4) the category of each ROI was predicted from its average spectrum using the established calibration model; (5) a classification map was formed by rendering each category of rice seed in a specific colour. In this study, 1500 seeds (250 kernels per category) were selected from all samples to calibrate and test the model, and the remaining 300 seeds (50 kernels per category) were used for visualization with the object-wise method based on the MSC-NCA-DF model.
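Steps (1) to (5) can be sketched in Python as follows. The helper `object_wise_class_map` and the `predict` callable are hypothetical: `predict` stands in for MSC preprocessing, NCA wavelength selection, and the calibrated DF model, and seed segmentation is reduced to a simple threshold followed by connected-component labelling with `scipy.ndimage.label`. A real pipeline would need a more careful segmentation step.

```python
import numpy as np
from scipy import ndimage


def object_wise_class_map(cube, mask, predict):
    """Object-wise classification map (hypothetical helper).

    cube:    (H, W, B) hyperspectral image.
    mask:    (H, W) boolean foreground mask separating seeds from background.
    predict: callable taking an (n_seeds, B) array of mean spectra and returning one
             class label per seed (stand-in for preprocessing + calibrated model).
    Returns an (H, W) integer map: predicted class on seed pixels, -1 on background.
    """
    rois, n_seeds = ndimage.label(mask)                  # step 1: one ROI per connected seed
    ids = np.arange(1, n_seeds + 1)
    mean_spectra = np.array([cube[rois == i].mean(axis=0) for i in ids])  # step 2
    seed_classes = np.asarray(predict(mean_spectra))     # steps 3-4: one label per ROI
    class_map = np.full(mask.shape, -1, dtype=int)
    for i, cls in zip(ids, seed_classes):
        class_map[rois == i] = cls                       # step 5: paint each seed by class
    return class_map


def toy_model(spectra):
    """Hypothetical classifier: class 1 if mean reflectance exceeds 2, else class 0."""
    return (spectra.mean(axis=1) > 2.0).astype(int)


# Synthetic 20 x 20 image with 5 bands and two "seeds".
cube = np.zeros((20, 20, 5))
cube[2:6, 2:6, :] = 1.0        # seed A, low reflectance
cube[10:15, 10:15, :] = 3.0    # seed B, high reflectance
mask = cube.mean(axis=2) > 0.5  # hypothetical threshold segmentation
class_map = object_wise_class_map(cube, mask, toy_model)
```

Because every pixel of a seed receives the label predicted from that seed's mean spectrum, the map can be coloured directly with any categorical colormap, one colour per freezing condition.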
When calibrating the model, the 1500 rice seeds were randomly divided into a calibration set and a prediction set at a 3:1 ratio, and the model was validated with 5-fold cross-validation. Sensitivity and specificity were introduced to assess the model; their definitions and expressions are detailed in [41].
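Sensitivity and specificity can be computed from model predictions as sketched below, using the usual definitions Sen = TP/(TP + FN) and Spe = TN/(TN + FP). Because the exact conventions of [41] are not reproduced in this section, two choices here are assumptions: the frost-damaged class is treated as the positive class, and each comparison is restricted to healthy seeds plus seeds of one freezing condition, mirroring one row of Table 4.

```python
import numpy as np


def sensitivity_specificity(y_true, y_pred, positive):
    """Sensitivity and specificity in percent, treating `positive` as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos_true = y_true == positive
    pos_pred = y_pred == positive
    tp = np.sum(pos_true & pos_pred)    # positives predicted as positive
    fn = np.sum(pos_true & ~pos_pred)   # positives missed
    tn = np.sum(~pos_true & ~pos_pred)  # negatives not predicted as positive
    fp = np.sum(~pos_true & pos_pred)   # negatives wrongly predicted as positive
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


def healthy_vs_condition(y_true, y_pred, healthy, condition):
    """One Table 4-style row: keep only healthy seeds and seeds of one freezing condition."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    keep = np.isin(y_true, [healthy, condition])
    return sensitivity_specificity(y_true[keep], y_pred[keep], positive=condition)


sen, spe = sensitivity_specificity([0, 0, 0, 0, 1, 1, 1, 1],
                                   [0, 0, 0, 1, 1, 1, 1, 0], positive=1)
# sen == 75.0, spe == 75.0
```

Under this convention, a healthy seed that is misclassified as a different freezing condition (not the one being tested) still counts as a true negative, so the per-row figures should be read alongside the overall confusion matrix.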

Fig. 11. Visualization of classification results of rice seeds with different degrees of frost damage.



Table 4 shows the results of the DF model for separating healthy rice seeds from rice seeds with different degrees of frost damage. The DF model achieves high sensitivity and specificity in every case, indicating that it can easily distinguish healthy rice seeds from frost-damaged ones. Fig. 11 shows the visual classification result. Only two of the 300 seeds were misclassified, for an overall classification accuracy of 99.33% (298/300), and every healthy and frost-damaged seed was correctly identified as such. The visualization map provides an intuitive and accurate estimate of rice seeds with different degrees of frost damage, which supports the rapid detection and timely removal of low-vigor rice seeds and thus helps ensure the purity of high-vigor seeds in production. In addition, the visualization results show that, compared with traditional methods, hyperspectral imaging provides a simple and intuitive approach to the rapid and nondestructive detection of rice seeds with different degrees of frost damage, requiring less time and manpower.

4. Conclusion

In this study, hyperspectral imaging combined with the DF model was used to identify rice seeds with different degrees of frost damage, and excellent recognition results were obtained. Three spectral preprocessing methods (SG1, SNV, and MSC), three feature extraction algorithms (PCA, SPA, and NCA), three traditional machine learning classification models (DT, KNN, and SVM), and a deep learning model (DF) were used. Multivariate data analysis showed that the MSC-NCA-DF model performed best, and the DF model still classified well on small-scale sample sets, so it was selected as the best classification model.
Finally, the classification results of the calibrated DF model were visualized, which shows rice seeds with different degrees of frost damage more intuitively and is helpful for practical production. In addition, this study can provide a reference for the future development of online detection systems.

Author contribution statement
Liu Zhang: Conceptualization, Methodology, Software, Writing - Original Draft. Heng Sun: Data Curation, Investigation. Zhenhong Rao: Resources, Validation. Haiyan Ji: Conceptualization, Supervision, Funding acquisition, Writing - Review & Editing.

Declaration of competing interest
The authors declare no conflicts of interest.

Acknowledgment
This research is supported by the National Key Research and Development Program (Project No.: 2016YFD0200602).

References
[1] M. Huang, J. Tang, B. Yang, Q. Zhu, Classification of maize seeds of different years based on hyperspectral imaging and model updating, Comput. Electron. Agric. 122 (2016) 139–145.
[2] P. Armstrong, E. Maghirang, M. Ozulu, Determining damage levels in wheat caused by Sunn pest (Eurygaster integriceps) using visible and near-infrared spectroscopy, J. Cereal Sci. 86 (2019) 102–107.
[3] L.M. Kandpal, S. Lohumi, M.S. Kim, J. Kang, B. Cho, Near-infrared hyperspectral imaging system coupled with multivariate methods to predict viability and vigor in muskmelon seeds, Sensors Actuators B Chem. 229 (2016) 534–544.
[4] Z. Qiu, J. Chen, Y. Zhao, S. Zhu, Y. He, C. Zhang, Variety identification of single rice seed using hyperspectral imaging combined with convolutional neural network, Appl. Sci. 8 (2) (2018) 212.

[5] G. ElMasry, N. Mandour, S. Al-Rejaie, E. Belin, D. Rousseau, Recent applications of multispectral imaging in seed phenotyping and quality monitoring-An overview, Sensors-Basel 19 (5) (2019) 1090.
[6] J.G.A. Barbedo, E.M. Guarienti, C.S. Tibola, Detection of sprout damage in wheat kernels using NIR hyperspectral imaging, Biosyst. Eng. 175 (2018) 124–132.
[7] I. Baek, D. Kusumaningrum, L. Kandpal, S. Lohumi, C. Mo, M. Kim, B. Cho, Rapid measurement of soybean seed viability using kernel-based multispectral image analysis, Sensors-Basel 19 (2) (2019) 271.
[8] S. Jia, L. Yang, D. An, Z. Liu, Y. Yan, S. Li, X. Zhang, D. Zhu, J. Gu, Feasibility of analyzing frost-damaged and non-viable maize kernels based on near infrared spectroscopy and chemometrics, J. Cereal Sci. 69 (2016) 145–150.
[9] P. Shatadal, J. Tan, Identifying damaged soybeans by color image analysis, Appl. Eng. Agric. 19 (1) (2003) 65–69.
[10] L. Esteve Agelet, D.D. Ellis, S. Duvick, A.S. Goggi, C.R. Hurburgh, C.A. Gardner, Feasibility of near infrared spectroscopy for analyzing corn kernel damage and viability of soybean and corn kernels, J. Cereal Sci. 55 (2) (2012) 160–165.
[11] P. Vermeulen, M. Suman, J.A. Fernández Pierna, V. Baeten, Discrimination between durum and common wheat kernels using near infrared hyperspectral imaging, J. Cereal Sci. 84 (2018) 74–82.
[12] C. Wakholi, L.M. Kandpal, H. Lee, H. Bae, E. Park, M.S. Kim, C. Mo, W. Lee, B. Cho, Rapid assessment of corn seed viability using short wave infrared line-scan hyperspectral imaging and chemometrics, Sensors Actuators B Chem. 255 (2018) 498–507.
[13] K. Sendin, P.J. Williams, M. Manley, Near infrared hyperspectral imaging in quality and safety evaluation of cereals, Crit. Rev. Food Sci. Nutr. 58 (4) (2018) 575–590.
[14] X. Zhao, W. Wang, X. Ni, X. Chu, Y. Li, C. Sun, Evaluation of near-infrared hyperspectral imaging for detection of peanut and walnut powders in whole wheat flour, Appl. Sci. 8 (7) (2018) 1076.
[15] N. Caporaso, M.B. Whitworth, I.D. Fisk, Near-infrared spectroscopy and hyperspectral imaging for non-destructive quality assessment of cereal grains, Appl. Spectrosc. Rev. 53 (8) (2018) 667–687.
[16] E.M. Achata, E.S. Inguglia, C.A. Esquerre, B.K. Tiwari, C.P. O'Donnell, Evaluation of Vis-NIR hyperspectral imaging as a process analytical tool to classify brined pork samples and predict brining salt concentration, J. Food Eng. 246 (2019) 134–140.
[17] J. Zhang, L. Dai, F. Cheng, Classification of frozen corn seeds using hyperspectral Vis/NIR reflectance imaging, Molecules 24 (1) (2019) 149.
[18] P. Nie, J. Zhang, X. Feng, C. Yu, Y. He, Classification of hybrid seeds using near-infrared hyperspectral imaging technology combined with deep learning, Sensors Actuators B Chem. 296 (2019) 126630.
[19] Y. Fan, C. Zhang, Z. Liu, Z. Qiu, Y. He, Cost-sensitive stacked sparse auto-encoder models to detect striped stem borer infestation on rice based on hyperspectral imaging, Knowl.-Based Syst. 168 (2019) 49–58.
[20] L. Feng, S. Zhu, L. Zhou, Y. Zhao, Y. Bao, C. Zhang, Y. He, Detection of subtle bruises on winter jujube using hyperspectral imaging with pixel-wise deep learning method, IEEE Access 7 (2019) 64494–64505.
[21] Y. Zhao, S. Zhu, C. Zhang, X. Feng, L. Feng, Y. He, Application of hyperspectral imaging and chemometrics for variety classification of maize seeds, RSC Adv. 8 (3) (2018) 1337–1345.
[22] S. Zhu, L. Zhou, C. Zhang, Y. Bao, B. W, H. Chu, Y. Yu, Y. He, L. Feng, Identification of soybean varieties using hyperspectral imaging coupled with convolutional neural network, Sensors-Basel 19 (19) (2019) 4065.
[23] X. Cao, L. Wen, Y. Ge, J. Zhao, L. Jiao, Rotation-based deep forest for hyperspectral imagery classification, IEEE Geosci. Remote Sens. Lett. 16 (7) (2019) 1105–1109.
[24] L.V. Utkin, A.A. Meldo, A.V. Konstantinov, Deep forest as a framework for a new class of machine learning models, Natl. Sci. Rev. 6 (2) (2019) 186–187.
[25] Z.-H. Zhou, J. Feng, Deep forest: towards an alternative to deep neural networks, Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI'17), Melbourne, Australia, 2017, pp. 3553–3559.
[26] Z.-H. Zhou, J. Feng, Deep forest, Natl. Sci. Rev. (2018) https://doi.org/10.1093/nsr/nwy108.
[27] S. Lohumi, C. Mo, J. Kang, S. Hong, B. Cho, Nondestructive evaluation for the viability of watermelon (Citrullus lanatus) seeds using Fourier transform near infrared spectroscopy, J. Biosyst. Eng. 38 (4) (2013) 312–317.
[28] Y. Zhao, C. Zhang, S. Zhu, P. Gao, L. Feng, Y. He, Non-destructive and rapid variety discrimination and visualization of single grape seed using near-infrared hyperspectral imaging technique and multivariate analysis, Molecules 23 (6) (2018) 1352.
[29] International Seed Testing Association, International Rules for Seed Testing 2018, International Seed Testing Association, Bassersdorf, Switzerland, 2018.
[30] A. Miao, J. Zhuang, Y. Tang, Y. He, X. Chu, S. Luo, Hyperspectral image-based variety classification of waxy maize seeds by the t-SNE model and procrustes analysis, Sensors-Basel 18 (12) (2018) 4391.
[31] L. Zhang, H. Ji, Identification of wheat grain in different states based on hyperspectral imaging technology, Spectrosc. Lett. 52 (6) (2019) 356–366.
[32] B. Chu, K. Yu, Y. Zhao, Y. He, Development of noninvasive classification methods for different roasting degrees of coffee beans using hyperspectral imaging, Sensors-Basel 18 (4) (2018) 1259.
[33] W. Yang, K. Wang, W. Zuo, Neighborhood component feature selection for high-dimensional data, J. Comput. 7 (2012) 161–168.
[34] S. Mahesh, D.S. Jayas, J. Paliwal, N.D.G. White, Hyperspectral imaging to classify and monitor quality of agricultural materials, J. Stored Prod. Res. 61 (2015) 17–26.
[35] Y. Bao, C. Mi, N. Wu, F. Liu, Y. He, Rapid classification of wheat grain varieties using hyperspectral imaging and chemometrics, Appl. Sci. 9 (19) (2019) 4119.
[36] L. Feng, S. Zhu, C. Zhang, Y. Bao, P. Gao, Y. He, Variety identification of raisins using near-infrared hyperspectral imaging, Molecules 23 (11) (2018) 2907.

[37] Y. Yan, B. Chen, D. Zhu, Near Infrared Spectroscopy-principles, Technologies and Application, China Light Industry Press, Beijing, China, 2013, pp. 21–27.
[38] Q. Zheng, H. Wang, H. Hong, J. Zhang, M. Nan, W. Wang, Z. Zhuang, Effects of freezing injury on the germination characteristics and inner ultrastructure of maize hybridize seeds, J. Gansu Agric. Univ. 45 (5) (2010) 35–39.
[39] Y. Yu, H. Yu, L. Guo, J. Li, Y. Chu, Y. Tang, S. Tang, F. Wang, Accuracy and stability improvement in detecting Wuchang rice adulteration by piece-wise multiplicative scatter correction in the hyperspectral imaging system, Anal. Methods 10 (26) (2018) 3224–3231.


[40] N. Wu, Y. Zhang, R. Na, C. Mi, S. Zhu, Y. He, C. Zhang, Variety identification of oat seeds using hyperspectral imaging: investigating the representation ability of deep convolutional neural network, RSC Adv. 9 (22) (2019) 12635–12644.
[41] K. Sendin, M. Manley, V. Baeten, J.A. Fernández Pierna, P.J. Williams, Near infrared hyperspectral imaging for white maize classification according to grading regulations, Food Anal. Methods 12 (7) (2019) 1612–1624.