Determination of insect infestation on stored rice by near infrared (NIR) spectroscopy



Microchemical Journal 145 (2019) 252–258

Contents lists available at ScienceDirect

Microchemical Journal journal homepage: www.elsevier.com/locate/microc

Determination of insect infestation on stored rice by near infrared (NIR) spectroscopy


Alessandra Biancolillo⁎, Patrizia Firmani, Remo Bucci, Andrea Magrì, Federico Marini

Department of Chemistry, University of Rome “La Sapienza”, Rome, Italy

ARTICLE INFO

Keywords: Chemometrics; Classification; Partial least squares discriminant analysis (PLS-DA); Soft independent modeling of class analogies (SIMCA); Near infrared (NIR) spectroscopy; Infested rice

ABSTRACT

Among grains, rice is one of the most widely consumed cereals in the world; it is a staple food in much of Asia and Africa, and it is also widely consumed in America and Europe. One of the main issues in storing rice is protecting it from animal attacks; in particular, it is prone to insect infestation. Despite all the attempts made to avoid it (developing new physical barriers, traps and repellents), food pests often manage to break into granaries and parcels, contaminating stored commodities. As a consequence, producers and/or retailers must continuously check for possible infestations. Different methods have been developed to detect insects in stored commodities and, although some of them have been shown to perform well, they have the substantial limitation of being destructive. This leads to a loss of product (and, consequently, of profit), affecting farmers, retailers and, finally, consumers. For this reason, the aim of the present work is to develop a methodology for the identification of insect infestation in stored rice by NIR spectroscopy coupled with discriminant and modeling classification methods. In particular, among all the pests possibly present in granaries, the focus has been on the detection of the Indian-meal moth (Plodia interpunctella), because it is considered one of the most common infesting insects. Samples of rice, both infested and edible, coming from different farmers located in six countries (Cambodia, India, Italy, Pakistan, Suriname and Thailand) have been analyzed by NIR spectroscopy. Subsequently, two classification methods, Partial Least Squares Discriminant Analysis (PLS-DA) and Soft Independent Modeling of Class Analogy (SIMCA), have been applied in order to distinguish between infested and edible samples. PLS-DA correctly classified 95.6% of the edible and 97.5% of the contaminated samples (on the external validation set), whereas the SIMCA model, built only for the category of non-contaminated individuals, proved highly specific (about 97%) but poorly sensitive on the test specimens. This latter approach (SIMCA) provided better predictions (in particular, in terms of sensitivity) when separate individual models were built by subdividing the samples according to their country of origin.

1. Introduction

Cereals are among the first plants to have been domesticated by human populations; evidence indicates that some of them were already harvested during the Pre-Pottery Neolithic era [1]. They are the main source of livelihood worldwide, since they are used for both human consumption and animal breeding [2]; consequently, preventing their waste means avoiding huge economic losses. Among grains, rice is one of the most consumed cereals; it is a staple food for a large part of the Asian and African populations, and it is also widely consumed in the rest of the world. One advantage of this cereal is that, if properly stored, it can be kept in granaries for a long time without losing its characteristics and nutritional properties. The major issue stored rice may be affected by is pest infestation. In fact, microorganisms and insects (e.g., Plodia interpunctella) can penetrate physical barriers, making the product inedible. Pest infestation introduces undesirable foreign bodies into food commodities, such as excrement, carcasses, cast skins, webbing and other secretions [3]; consequently, it is important not only to prevent this phenomenon, but also to detect possible contamination. How to prevent insects from penetrating barriers and breaking into granaries has been widely investigated and, over the years, several different devices have been developed: some, like pitfall, probe or light traps, physically trap the insects, preventing them from reaching commodities [4,5]; others exploit biochemistry in order to attract and then ensnare the infesting animals. For instance, the

Corresponding author at: Department of Chemistry, University of Rome La Sapienza, P.le Aldo Moro 5, I-00185 Rome, Italy. E-mail address: [email protected] (A. Biancolillo).

https://doi.org/10.1016/j.microc.2018.10.049 Received 8 August 2018; Received in revised form 22 October 2018; Accepted 22 October 2018 Available online 23 October 2018 0026-265X/ © 2018 Elsevier B.V. All rights reserved.


A. Biancolillo et al.

apparatus proposed in [6] has an adhesive surface imbued with insect pheromones: the animal is lured towards the device and then gets stuck on the adhesive layer. As mentioned above, despite the application of different precautions, it is sometimes not possible to avoid infestations. At that point, the availability of methodologies which allow detecting the presence of foreign bodies in commodities (and quantifying their amounts) becomes fundamental. Some physical properties can be used as indicators of the presence of insects: for instance, it has been demonstrated that an increase in temperature [7] or a variation in electrical conductance [8] can be measured and used to detect infestation. Another physical alternative has been proposed in [9], where acoustic detection has been applied for the same purpose. Nevertheless, although they offer several benefits, these approaches are not fully reliable; consequently, additional alternatives have been investigated and proposed.

Further solutions to this problem have been provided by chemical techniques. Some of the proposed methodologies are based on the quantification of insect by-products: this can be done by detecting carbon dioxide [10] or by quantifying uric acid by High Performance Liquid Chromatography (HPLC) [11]. Additional alternatives involve biochemical/chemical approaches; for instance, Schatzh et al. quantified the myosin present in wheat kernel extracts by enzyme-linked immunosorbent assays [12] and demonstrated that it can be related to the presence of the wheat weevil (Sitophilus granarius). Another example is the study published by Hambers et al., who managed to detect a grain weevil by Nuclear Magnetic Resonance Spectroscopy with high accuracy, but only after the insects had entered the third larval instar [13]. Although these analytical approaches led to good results, they have a relevant drawback from the market point of view: they are destructive. This is clearly an issue considering the commercial value of cereals.

For this and all the above-mentioned reasons, the present study was set up to develop a non-destructive (and, possibly, rapid and relatively inexpensive) approach to detect the presence of the Indian-meal moth (Plodia interpunctella) in stored rice. To this end, it was deemed reasonable to test whether this aim could be achieved by Near Infrared (NIR) spectroscopy. Several papers on the application of this technique to infested foodstuffs are present in the literature. For example, NIR spectroscopy has been satisfactorily applied to chestnuts [14], olives [15], wheat flour [16], and jujubes [17,18] in order to detect the presence of insects. NIR spectroscopy has also been applied to rice, mainly to quantify the presence of weevils (in order to dose fumigants); for instance, in [19], Thai Hommali rice was analyzed by NIR, and Singular Value Decomposition and Partial Least Squares (PLS) [20,21] were then used for quantification.

As mentioned above, the present work focuses on a particular insect, Plodia interpunctella, commonly known as the Indian-meal moth [22]. The choice of this specific insect is dictated by the fact that it is considered one of the major pests in stored cereals [23]. In fact, it is present on all continents except Antarctica [22], and it can infest grains, several different nuts, fruits and some sweet food products [24]. Since the problem addressed in the present study is not the quantification, but the detection of the Indian-meal moth in stored rice, NIR spectroscopy has been coupled with chemometric classification techniques. In particular, the possibility of using both discriminant and class-modeling approaches, based on partial least squares discriminant analysis (PLS-DA) [25,26] and soft independent modeling of class analogy (SIMCA) [27,28], respectively, to distinguish between infested and non-infested (edible) rice samples has been explored. Classification methods are widely applied to check food quality [29] and, in particular, the proposed approaches (i.e., NIR coupled with PLS-DA or SIMCA) have been chosen because they have been satisfactorily applied to the quality assessment of plant products [30].

Table 1 – Description of the data set (number of parcels, measured grains and origin of rice samples). The “Total” rows give the total number of parcels and measured grains for infested and edible samples.

Infested
Origin       # Parcels    # Measured grains
Cambodia     2            80
India        10           406
Italy        2            80
Pakistan     7            280
Suriname     2            80
Total        23           926

Suitable for human consumption (edible)
Origin       # Parcels    # Measured grains
Cambodia     1            40
India        6            239
Pakistan     5            200
Thailand     3            120
Total        15           599

2. Material and methods

2.1. Data set

Thirty-eight parcels of rice (either brown, semi-milled, or wholly milled) provided by different farmers and coming from six different countries (Cambodia, India, Italy, Pakistan, Suriname and Thailand) have been analyzed by NIR spectroscopy. Of these, 23 were infested by storage pests, while 15 were suitable for human consumption. In particular, in the case of Cambodia, India and Pakistan, both edible and infested parcels spanned the same variability in terms of geographical origin (sub-regions) and varieties: in some cases, edible and infested samples even came from different batches of the same lot. Individual grains have been scanned for a total of 1525 spectra (the operating procedure is described in Section 2.2); more details on the sample distribution can be found in Table 1.

2.2. Instrumental analysis – near infrared spectroscopy (NIR)

NIR measurements were acquired using a Nicolet 6700 FT-NIR instrument (Thermo Scientific Inc., Madison, WI) equipped with a halogen tungsten lamp and an InGaAs detector. Spectra have been collected in reflectance mode directly on the whole rice grains, by means of an integrating sphere (Thermo Scientific Inc., Madison, WI). Each spectrum has been acquired by placing a single rice grain on the optical window of the sphere and accumulating 82 scans in the range between 10,000 and 4000 cm−1 (nominal resolution 4 cm−1). Finally, the raw NIR spectra were exported using the OMNIC software (Thermo Scientific Inc., Madison, WI) and processed in Matlab (v.9.3, R2017b; The Mathworks, Natick, MA) using in-house written functions.

2.3. Partial least squares-discriminant analysis (PLS-DA)

Discriminant classification methods focus on the differences among samples belonging to different categories and operate by building decision surfaces which provide univocal assignments. Among all the discriminant classification methods present in the literature, Partial Least Squares-Discriminant Analysis (PLS-DA) is one of the most widely applied, in particular for handling data blocks with many correlated variables (e.g., spectroscopic data). PLS-DA can be seen as an extension of the PLS algorithm, where the classification problem is transformed into a regression problem [25,26]. This is achieved by means of a dummy Y, a binary matrix which, in the case of a two-class problem such as the one addressed in the present study, becomes an N × 1 vector (N being the total number of samples) encoding the class information. In particular, for a two-class problem, the dummy vector y will contain 1s in all the rows corresponding to samples belonging to Class 1 and 0s in the remaining rows (corresponding to Class 2). Briefly, a regular PLS model is calculated, fitting the dummy y to the data matrix X:


$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e} \qquad (1)$$

where b and e are the vectors of regression coefficients and residuals, respectively. The peculiarity of the PLS algorithm is that the general linear regression model in Eq. (1) is calculated by projecting the samples onto a reduced space of scores T having maximum covariance with the response, and regressing y onto these scores. Since PLS is a regression method, the predicted responses (ŷ), unlike their target values (encoded in the dummy y), will not be binary but real-valued, so a rule is needed to classify the samples. This is usually accomplished by setting a threshold on the predicted responses, which in the most naïve approach is 0.5, but can be set to different values based on probabilistic considerations. Accordingly, if the predicted response ŷ for a sample falls above the threshold, that individual will be assigned to Class 1, while if it falls below, the sample will be assigned to Class 2.

2.4. Soft independent modeling of class analogies (SIMCA)

Unlike discriminant classification methods, class-modeling approaches focus on the analogies among samples belonging to the same category rather than on the dissimilarities among objects from different classes [31]. In class-modeling, each category is individually modeled, so that one could in principle decide to model only the class(es) of interest. For each category, the classification problem is formulated in terms which closely resemble outlier detection: depending on a method-specific criterion, it is verified whether a sample is fitted well by the class model (“accepted”) or not (“rejected”). As a consequence, if more than one category is modeled, each sample can be accepted by only one class (and be univocally assigned to it), by more than one class (confused sample), or it can be rejected by all the categories.
In order to take into account all these circumstances, classification results from class-modeling approaches are usually reported in terms of sensitivity and specificity, which are defined per category. In particular, given a modeled class, sensitivity corresponds to the percentage of samples truly coming from that particular category which are correctly accepted by the model, whereas specificity is the percentage of samples belonging to other groups which are correctly rejected by the model. In the class-modeling context, Soft Independent Modeling of Class Analogies (SIMCA) [27,28,32,33] is one of the most widely used methods. It is based on the idea that the main systematic traits of each class can be summarized by a principal component model, so that acceptance/rejection of samples relies on the tools normally used in PCA for outlier detection. In this regard, different criteria have been proposed in the literature; in the present work, samples have been classified on the basis of their distance (d) from the class-model, calculated taking into account the Mahalanobis distance of samples from the center of the score space (T²) and their orthogonal distance from their bilinear projection (estimated as the sum of squares of residuals, Q) [34]. Mathematically, for the i-th sample, this can be expressed as:

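$$d_i = \left(T^2_{i,\mathrm{red}}\right)^2 + \left(Q_{i,\mathrm{red}}\right)^2 = \left(\frac{T_i^2}{T^2_{0.95}}\right)^2 + \left(\frac{Q_i}{Q_{0.95}}\right)^2 \qquad (2)$$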
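The acceptance rule built on this combined distance can be sketched in a few lines. This is a minimal illustration, not the authors' in-house Matlab code: it assumes that T² and Q for a sample, and their 95th-percentile limits, have already been obtained from the PCA model of the class; the function names and the numbers in the usage lines are hypothetical. The distance is taken as the sum of the two squared reduced statistics and compared with 2, as in the text; taking the square root and comparing with √2 would give exactly the same accept/reject decision.

```python
def simca_distance(t2, q, t2_lim, q_lim):
    """Combined distance of one sample from the class model.

    t2, q         : T^2 and Q statistics of the sample (assumed precomputed)
    t2_lim, q_lim : 95th-percentile limits of T^2 and Q for the class model
    """
    t2_red = t2 / t2_lim  # reduced T^2: 1.0 means exactly at the 95% limit
    q_red = q / q_lim     # reduced Q
    return t2_red ** 2 + q_red ** 2


def simca_accepts(t2, q, t2_lim, q_lim, d_max=2.0):
    """True if the sample is accepted by the class model (d <= d_max)."""
    return simca_distance(t2, q, t2_lim, q_lim) <= d_max


# Hypothetical values, for illustration only.
print(simca_accepts(t2=1.0, q=0.5, t2_lim=2.0, q_lim=1.0))  # inside the class space
print(simca_accepts(t2=3.0, q=0.5, t2_lim=2.0, q_lim=1.0))  # outside the class space
```

A sample sitting exactly on both 95% limits has a distance of 2, which is why that value marks the boundary of the class space.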

Fig. 1. – Raw reflectance spectra collected on all the samples after transformation to pseudo-absorbance.

and 607 of “infested” rice) and a test set of 500 samples (181 of “edible” rice and 319 of “infested” rice) by the Duplex algorithm [35]. In this way, the first group of objects is used for the optimization of model parameters (data pretreatment and number of latent variables), while the second one allows evaluating the predictive ability of the classification models. In both the PLS-DA and SIMCA analyses, the most suitable preprocessing approach and the optimal model complexity are defined on the basis of the results of 7-fold cross-validation. Since spectroscopic data may be affected by undesired phenomena such as scattering or, in general, by spurious sources of variability which result in additive or multiplicative effects on the signals, it is generally necessary to preprocess the spectra prior to any chemometric modeling, in order to remove as much as possible of these unwanted contributions. To this end, different signal pretreatments have been tested on the training samples in order to choose the most effective strategy: in particular, the possibility of using the first or second derivative (calculated according to the method of Savitzky and Golay [36], with a window of 19 data points and a second- or third-order interpolating polynomial, respectively), standard normal variate (SNV) [37], or their combinations was investigated. Moreover, after the application of any of the spectral treatments described above, each spectrum was also mean centered.

3.1. PLS-DA

As anticipated, in order to build the classification models for the identification of rice contamination by food insects, both a discriminant and a class-modeling approach were followed. First, a discriminant classification model was built using partial least squares-discriminant analysis (PLS-DA). In this context, the best combination of spectral preprocessing and model complexity (i.e., number of latent variables) was chosen as the one leading to the lowest classification error in a 7-fold cross-validation procedure. The results of the model selection phase are reported in Table 2, which collects, for each tested data pretreatment, the optimal number of latent variables (LVs) and the correct classification rate in cross-validation (per category). Looking at the table, it is clear that the first derivative is the most appropriate pretreatment for the data under study. Consequently, the final PLS-DA model was built on the training data, pretreated by first derivative and mean centering, and included 9 latent variables, resulting in 98.32% correct classification for the class “Edible” and 99.18% for the class “Infested”.
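The pretreatments compared here can be illustrated with a short sketch. It is not the authors' in-house Matlab code, and several choices are assumptions of the sketch: spectra are plain Python lists on an evenly spaced axis, the derivative is expressed per data point (not per cm−1), edge points without a full 19-point window are dropped, and mean centering is done column-wise with the training-set means. For a symmetric window and a second-order polynomial, the Savitzky-Golay first-derivative weights have the closed form k / Σk², which is what `sg_first_derivative` uses.

```python
from math import sqrt


def snv(spectrum):
    """Standard normal variate: scale one spectrum by its own mean and std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]


def sg_first_derivative(spectrum, window=19):
    """Savitzky-Golay first derivative from a local second-order polynomial.

    Uses the closed-form weights k / sum(k^2) for k = -m..m. Only points with
    a full window are returned (assumption: edges are simply dropped).
    """
    m = window // 2
    norm = sum(k * k for k in range(-m, m + 1))
    return [
        sum(k * spectrum[i + k] for k in range(-m, m + 1)) / norm
        for i in range(m, len(spectrum) - m)
    ]


def column_means(rows):
    """Mean of every variable (column) over the training spectra."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_center(rows, means):
    """Subtract the (training) column means from every spectrum."""
    return [[v - m for v, m in zip(row, means)] for row in rows]


# Toy usage on made-up data: two short synthetic "spectra" of 25 points.
train = [[float(i * i) for i in range(25)], [float(3 * i) for i in range(25)]]
derived = [sg_first_derivative(s) for s in train]
centered = mean_center(derived, column_means(derived))
```

With a 19-point window each derived spectrum is 18 points shorter than the input; a production implementation would more likely pad or extrapolate the edges.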

where $T^2_{0.95}$ and $Q_{0.95}$ indicate the 95th percentiles of the $T_i^2$ and $Q_i$ distributions under their null hypotheses, respectively. All samples having d ≤ 2 are accepted by the model; otherwise they are rejected.

3. Results and discussion

As mentioned above, a total of 1525 rice grains were analyzed by NIR spectroscopy in reflectance mode. The corresponding signals were exported, transformed to a pseudo-absorbance scale and organized in a matrix of dimensions 1525 × 3112 for the subsequent chemometric data processing. The complete data set is graphically displayed in Fig. 1. In order to externally validate the predictive models, the measurements have been divided into a training set of 1025 samples (418 of “edible” rice


inspection can still provide some information about which portions of the measured spectra are, on average, more intense for the Edible samples (positive regression coefficients, red segments in the Figure) and which others are more intense for the Infested ones (negative regression coefficients, blue segments in the Figure). From the Figure, it is clear that, when the aim is to distinguish contaminated rice from edible rice, the most relevant spectral variables are those associated with the first overtone of the C–H stretching of amino acid side chains (between 5720 cm−1 and 6211 cm−1), those in the range between 4600 cm−1 and 4310 cm−1 given by combinations of amide and side-chain vibrations, and those related to the first overtone of the O–H stretching and to the C–H bonds of fatty acids between 7000 cm−1 and 7300 cm−1 [41,42].

Table 2 – PLS-DA model selection: cross-validated classification errors and optimal model complexity as a function of the different spectral pre-treatments.

Pretreatment                      LVs    % Correct classification (CV) “edible”    % Correct classification (CV) “infested”
Mean centering (M.C.)             3      82.8                                      62.6
1st derivative (+M.C.)            9      95.2                                      95.1
2nd derivative (+M.C.)            10     85.6                                      81.5
SNV (+M.C.)                       7      93.1                                      90.3
SNV + 1st derivative (+M.C.)      7      90.7                                      93.2
SNV + 2nd derivative (+M.C.)      10     85.9                                      84.7

3.2. SIMCA

When the optimal model was applied to the external test samples for the validation phase, the predicted responses reported in Fig. 2 were obtained. The Figure shows that the two categories are well discriminated also in the case of the validation samples, confirming the very good classification ability of the model. In particular, 16 test samples out of 500 (8 samples per class) were misclassified, corresponding to a correct classification rate of 95.6% for class “Edible” and 97.5% for class “Infested”. Although the aim of the present work is to develop a methodology for discriminating between infested and edible rice parcels, and its focus is therefore mainly on predictions, it is also useful to investigate and interpret the model parameters, in order to understand, from a chemical standpoint, which spectral regions are most affected by the infestation process. For this reason, after the creation of the PLS-DA model, variable importance in projection (VIP) indices [38] were calculated and inspected. These values indicate which variables influence the model the most: each spectral feature with a VIP index higher than a specific threshold is considered relevant for model building. Although some alternatives have been proposed in the literature, the cut-off value is usually set to 1 (by construction, the mean of the squared VIP values is 1); as a consequence, variables with a VIP index higher than this threshold are those which influence the classification model the most. In the present case, VIP indices were calculated for each of the 3112 NIR spectral variables and the results are graphically displayed in Fig. 3, where the mean NIR spectrum (black solid line) is shown and the variables selected as relevant are highlighted in red or in blue, depending on the sign of the associated regression coefficient. Indeed, although care should be taken in interpreting the regression coefficients, as their magnitude and sign could also be influenced by the presence of overlapping (non-orthogonal) signals [39,40], their

The aim of the present work is to propose a rapid and non-destructive methodology for detecting contamination, in particular by food pests, in stored rice. In the classification framework, this can be seen as an asymmetric classification problem, where the main interest is in modeling a specific well-defined category, the “Edible” class, and checking whether any new sample falls within this category or should otherwise be labeled as an outlier for it. Starting from this assumption, in a second stage of the present research, the possibility of identifying rice contamination by food pests using a class-modeling strategy, in particular the SIMCA algorithm, was also tested. As already stated, given the nature of the problem, only one category (i.e., the class of Edible samples) was modeled by SIMCA. Also in this case, a preliminary model selection phase was necessary in order to identify the most suitable preprocessing approach and the optimal model complexity. In particular, the optimal model was chosen as the combination of spectral preprocessing and number of principal components leading to the maximum efficiency in cross-validation. Efficiency is defined as the geometric mean of sensitivity (% of samples truly coming from the category correctly accepted by the model) and specificity (% of samples truly coming from other classes correctly rejected by the model). The results of the model selection phase are summarized in Table 3. Looking at the Table, it can be seen that, when mean centering or SNV + mean centering are used to pretreat the data, the resulting models show similar efficiency (88.95% and 88.65%, respectively), but they require a different number of principal components: 13 and 11, respectively. Therefore, in choosing which combination of preprocessing and model complexity to use for calculating the final model, the one requiring fewer principal components and resulting in the highest specificity has been considered preferable.

Fig. 2. – PLS-DA analysis: graphical representation of the predicted response for the test set samples. Legend: red circles – “Edible”, blue squares – “Infested”; the dashed line indicates the decision threshold.
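The predicted responses and decision threshold of Fig. 2 come from a PLS regression on the dummy class vector. The sketch below illustrates that idea in pure Python. It assumes the single-response NIPALS (PLS1) algorithm, which is one common way to compute such a model; the source does not state which PLS implementation was used, and the toy two-variable data and function names are made up for illustration.

```python
from math import sqrt


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plsda_fit(X, y, n_lv):
    """Fit a PLS1 model (NIPALS) of the 0/1 dummy vector y on the rows of X."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centred X
    f = [v - y_mean for v in y]                              # centred y
    weights, loadings, coefs = [], [], []
    for _ in range(n_lv):
        w = [_dot(col, f) for col in zip(*E)]   # direction of max covariance with y
        norm = sqrt(_dot(w, w))
        if norm == 0.0:                         # nothing left to model
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]         # scores of the training samples
        tt = _dot(t, t)
        p = [_dot(col, t) / tt for col in zip(*E)]  # X loadings
        q = _dot(f, t) / tt                         # regression of y on the scores
        E = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        weights.append(w)
        loadings.append(p)
        coefs.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean,
            "w": weights, "p": loadings, "q": coefs}


def plsda_predict(model, x):
    """Real-valued predicted response for one sample."""
    e = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["w"], model["p"], model["q"]):
        t = _dot(e, w)
        y_hat += q * t
        e = [v - t * pj for v, pj in zip(e, p)]  # deflate before the next LV
    return y_hat


def plsda_classify(model, x, threshold=0.5):
    """Class 1 if the predicted response is above the threshold, else Class 2."""
    return 1 if plsda_predict(model, x) > threshold else 2


# Toy data (made up): two variables, Class 1 coded as 1 and Class 2 as 0.
X_train = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
y_train = [1, 1, 0, 0]
model = plsda_fit(X_train, y_train, n_lv=1)
labels = [plsda_classify(model, x) for x in X_train]
```

Applied to the test-set counts reported in the text, 8 misclassified samples per class out of 181 edible and 319 infested test samples give per-class rates of (181 − 8)/181 ≈ 95.6% and (319 − 8)/319 ≈ 97.5%.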


Fig. 3. – PLS-DA analysis: graphical representation of significant variables according to their VIP index: regions of variables having a VIP index > 1 are highlighted by a thicker line and colored in red or blue depending on whether their regression coefficients are positive or negative, respectively. The black solid line represents the average NIR spectrum of all training samples. (The reader is referred to the online version for colors.)

data set, the best combination of spectral preprocessing and model complexity was selected as the one leading to the highest efficiency for the class of interest in 7-fold cross-validation. For each of the three countries considered, the optimal model was then built using the best combination of preprocessing and number of principal components and validated on the external test set. The results of this further SIMCA analysis are reported in Table 4: also in this case, all the preprocessing approaches already described were tested, but only the results obtained by the best models are reported. It is evident from the Table that better results are obtained when considering more homogeneous data sets, such as those corresponding to rice grains coming from the same geographical origin. Not only is the specificity very high for all three models (93.8% for India, 99.0% for Pakistan and 100.0% for Cambodia), but the sensitivity also increases significantly and, in the case of the Indian samples, becomes higher than 91%. The results of SIMCA modeling for the three subsets corresponding to the Cambodian, Indian and Pakistani samples are also graphically displayed in Fig. 4b–d, where the projection of the individuals onto the model space defined by the values of T²red and Qred is shown. With respect to the model built on the full data set (Fig. 4a), it is evident that more samples (especially from the validation set) fall within the model space delimited by the dashed line.

Table 3 – SIMCA model selection: cross-validated classification errors and optimal model complexity as a function of the different spectral pre-treatments.

Pretreatment                      PCs    % Sensitivity (CV)    % Specificity (CV)    % Efficiency (CV)
Mean centering (M.C.)             13     91.1                  86.8                  88.9
1st derivative (+M.C.)            8      86.6                  70.3                  78.4
2nd derivative (+M.C.)            8      83.0                  76.7                  79.8
SNV (+M.C.)                       11     89.2                  88.1                  88.6
SNV + 1st derivative (+M.C.)      9      88.5                  75.8                  82.1
SNV + 2nd derivative (+M.C.)      9      85.4                  72.0                  78.7
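The efficiency used for model selection is described in the text as the geometric mean of sensitivity and specificity. A minimal sketch of that definition follows, applied to the cross-validated values of the two leading candidates (mean centering and SNV + mean centering); the function name is illustrative and not taken from the original work.

```python
from math import sqrt


def efficiency(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (both in percent)."""
    return sqrt(sensitivity * specificity)


# Cross-validated sensitivity and specificity of the two leading candidates.
candidates = {
    "Mean centering (13 PCs)": (91.1, 86.8),
    "SNV + mean centering (11 PCs)": (89.2, 88.1),
}
for name, (sens, spec) in candidates.items():
    print(f"{name}: efficiency = {efficiency(sens, spec):.2f}%")
```

Because it is a geometric mean, this figure of merit penalizes an imbalance between the two rates more than a simple average would, which is why two models with nearly equal efficiency can still differ in how they trade sensitivity against specificity.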

Accordingly, the final SIMCA model was built after preprocessing the training data by SNV + mean centering and retaining 11 principal components; it resulted in 89.23% sensitivity and 88.14% specificity on the same individuals. When the model was applied to the test samples in order to be validated, it showed very high specificity (96.87%) but quite poor sensitivity (59.12%), definitely lower than the one measured on the training set. These results can also be visualized in Fig. 4a, where the projection of the training and test samples onto the model space of the “Edible” class is represented. It is evident from the Figure that almost half of the test samples from the Edible class fall outside the class space delimited by the dashed line and are, therefore, rejected by the model. One possible explanation for the observed results is that, due to the heterogeneity of the samples (which were collected from different countries), the choice of favoring the model with the highest specificity could penalize its sensitivity, especially in predicting new samples. In order to test this hypothesis, in the last part of the study, the SIMCA analysis was repeated separately on subsets of data corresponding to the different geographical origins, each time modeling the Edible class only. In particular, although Edible samples were available for four different geographical origins (Cambodia, India, Pakistan and Thailand, see Table 1) and only the samples from the category of interest are needed to build a SIMCA model, only the three countries for which contaminated samples were also present (Cambodia, India and Pakistan) were considered for this further analysis, so that both the sensitivity and the specificity of the “single geographical origin” models could be checked. Analogously to what was done for the model built on the full

4. Conclusions

The aim of the present work was to develop a non-destructive methodology for the detection of possible infestations of Indian-meal moth (Plodia interpunctella) in stored rice. For this reason, contaminated and edible rice samples (coming from six different geographical origins) have been measured by NIR spectroscopy, and classification models have then been built in order to solve a two-class (“Infested” or “Edible”) problem. In particular, two different classification methods have been applied: PLS-DA and SIMCA. The discriminant approach achieved a correct classification rate on the test set of 95.6% for the “Edible” class and 97.5% for the “Infested” class, allowing the identification of almost all the contaminated samples (8 misclassified out of 319 infested samples). On the other hand, SIMCA analysis on the edible class has highlighted that modeling all the samples together, independently of their origin, would not be the most suitable solution. In fact, when SIMCA is applied to the entire set of available spectra, both sensitivity and specificity are quite high on the training set (slightly lower than those obtained from the PLS-DA model), but on the test set the sensitivity becomes poorer (slightly higher than 59%). Conversely, when the analysis is performed on sub-sets of measurements, divided according to the origin of the rice parcels, the specificity remains very high (being 99 or 100% on the validation samples from Pakistan and


Fig. 4. SIMCA analysis: projection of the training (empty symbols) and test (full symbols) samples onto the T²red vs. Qred model space of the Edible class. (a) Model built on all the training samples; (b) model built on the Cambodian samples only; (c) model built on the Indian samples only; (d) model built on the Pakistani samples only. Legend: red circles, Edible; blue squares, Infested.

Table 4. Individual SIMCA modeling of the edible class for the different geographical origins: optimal spectral pre-treatments, model complexity and classification results.

| Origin | Pretreatment | PCs | % Sensitivity (training) | % Specificity (training) | % Sensitivity (CV) | % Specificity (CV) | % Sensitivity (test) | % Specificity (test) |
|---|---|---|---|---|---|---|---|---|
| Cambodia | SNV (+M.C.) | 3 | 95.2 | 94.0 | 90.5 | 95.4 | 73.7 | 100.0 |
| India | SNV (+M.C.) | 10 | 88.7 | 90.6 | 85.3 | 91.1 | 91.9 | 93.8 |
| Pakistan | SNV (+M.C.) | 10 | 92.1 | 95.5 | 88.7 | 95.8 | 75.5 | 99.0 |
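As an illustration of the kind of class model summarized in Fig. 4 and Table 4, the following is a minimal sketch of a one-class, SIMCA-style model in Python. It is not the authors' implementation. Several choices are assumptions made only for this example: the spectra are synthetic, the T² and Q limits are taken as empirical 95th percentiles of the training values rather than derived from statistical distributions, and a sample is accepted when its combined reduced distance does not exceed √2. All function names are hypothetical.

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row)."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

def t2_and_q(model, X):
    """Hotelling T2 (inside the PC space) and Q residuals (outside it)."""
    Xc = X - model["mean"]
    T = Xc @ model["P"]
    t2 = np.sum(T ** 2 / model["score_var"], axis=1)
    q = np.sum((Xc - T @ model["P"].T) ** 2, axis=1)
    return t2, q

def fit_class_model(X, n_pcs, quantile=0.95):
    """PCA model of a single class; limits are empirical quantiles (assumption)."""
    mean = X.mean(axis=0)                       # mean centring (M.C.)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    model = {
        "mean": mean,
        "P": Vt[:n_pcs].T,                      # loadings, variables x PCs
        "score_var": s[:n_pcs] ** 2 / (X.shape[0] - 1),
    }
    t2, q = t2_and_q(model, X)
    model["t2_lim"] = np.quantile(t2, quantile)
    model["q_lim"] = np.quantile(q, quantile)
    return model

def accepts(model, X):
    """True where the combined reduced distance is within sqrt(2) (assumption)."""
    t2, q = t2_and_q(model, X)
    d = np.hypot(t2 / model["t2_lim"], q / model["q_lim"])
    return d <= np.sqrt(2.0)

# Synthetic stand-in for NIR spectra: "infested" samples carry an extra band.
rng = np.random.default_rng(0)
wl = np.linspace(0.0, 1.0, 50)

def synthetic_spectra(n, bump):
    scale = 1.0 + 0.1 * rng.standard_normal((n, 1))     # multiplicative scatter
    peak = bump * np.exp(-((wl - 0.5) / 0.05) ** 2)     # hypothetical extra band
    noise = 0.01 * rng.standard_normal((n, wl.size))
    return scale * np.sin(2 * np.pi * wl) + peak + noise

edible_train = snv(synthetic_spectra(100, bump=0.0))
infested = snv(synthetic_spectra(40, bump=0.5))

model = fit_class_model(edible_train, n_pcs=2)
sensitivity_train = accepts(model, edible_train).mean()   # edible accepted
specificity = 1.0 - accepts(model, infested).mean()       # infested rejected
print(f"sensitivity (training): {100 * sensitivity_train:.1f}%")
print(f"specificity: {100 * specificity:.1f}%")
```

In this sketch the sensitivity is computed on the training samples only; a fair estimate, as in Table 4, would require cross-validation or held-out test samples.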

Cambodia, respectively), suggesting that this methodology would allow all the infested parcels of rice to be detected, preventing contaminated rice from reaching consumers. At the same time, a better sensitivity is obtained, resulting in a significantly lower number of false positives (and, therefore, in fewer good rice grains being discarded as contaminated). With respect to similar approaches reported in the literature for the detection of insect infestation (which, as detailed in the introduction, mostly dealt with other food matrices), the present study has the strength of relying on a large data set, in which as much as possible of the expected sample-to-sample variability (in terms of geographical or varietal origin) has been accounted for through a careful sampling strategy. Considering these sources of variability in both training and validating the models led, for SIMCA (where classes should be as homogeneous as possible), to an unsatisfactory sensitivity when all the edible samples were modeled as a single category; in PLS-DA, on the other hand, it still resulted in an almost perfect classification, indicating that the approach is able to recognize infested rice grains irrespective of their country of origin or variety. This last characteristic is very valuable in view of the possible use of the proposed method as a routine screening strategy. In conclusion, the results of the present study are very promising with a view to developing a non-destructive methodology for the identification of insect infestation in stored rice that combines NIR spectroscopy and chemometric classification approaches.
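The link between sensitivity and wrongly discarded rice can be made concrete with a short calculation. The sketch below uses only figures quoted in the text and in Table 4. The value of 59% for the single all-origins SIMCA model is an approximation of "slightly higher than 59%", and the function names are hypothetical.

```python
def correct_rate(misclassified, total):
    """Correct classification rate in percent, rounded to one decimal."""
    return round(100.0 * (total - misclassified) / total, 1)

def discarded_per_1000(sensitivity_pct):
    """Edible samples wrongly rejected per 1000 edible samples analysed."""
    return round((100.0 - sensitivity_pct) * 10)

# PLS-DA, "Infested" class on the test set: 8 of 319 samples misclassified.
infested_rate = correct_rate(8, 319)

# Test-set sensitivities of the SIMCA models of the "Edible" class.
test_sensitivity = {
    "all origins (approx.)": 59.0,   # "slightly higher than 59%" in the text
    "Cambodia": 73.7,
    "India": 91.9,
    "Pakistan": 75.5,
}
discarded = {name: discarded_per_1000(s) for name, s in test_sensitivity.items()}

print(infested_rate)
for name, n in discarded.items():
    print(f"{name}: about {n} edible samples rejected per 1000")
```

Under these figures, modeling each origin separately would lower the number of edible samples rejected per thousand from roughly 410 to between 81 and 263, depending on the origin.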

Acknowledgements

We would like to thank the Italian Customs and Monopolies Agency (Agenzia delle Dogane e dei Monopoli) for providing the rice samples.