CHAPTER 10
Data Fusion Strategies in Food Analysis
Alessandra Biancolillo*, Ricard Boqué§, Marina Cocchi¶, Federico Marini*,1
* Department of Chemistry, University of Rome “La Sapienza”, Rome, Italy; § Universitat Rovira i Virgili, Department of Analytical Chemistry and Organic Chemistry, Campus Sescelades, Tarragona, Spain; ¶ Department of Chemical and Geological Sciences, University of Modena and Reggio Emilia, Modena, Italy
1 Corresponding author
1. INTRODUCTION
Food is a complex matrix, often composed of hundreds if not thousands of constituents at concentration levels ranging from percent to traces [1]. Owing to its essential role in nutrition and sustenance, and also to its hedonic appreciation and commercial relevance, the quest for high-quality food products, and the corresponding need for accurate methods to verify that all the quality aspects are actually met, is steadily growing [2]. The quality of a product, and in particular of a foodstuff, is a multifactorial property, being related at the same time to safety aspects, such as the absence of contamination by microorganisms or toxic substances or of alterations due to prolonged or incorrect storage; to genuineness (known chemical composition and absence of adulterants); to more hedonic aspects such as a positive sensory evaluation; and, last but not least, to a traceable origin in terms of raw materials, geography, and type of processing undergone [3]. It is then evident that, as a result of the complex composition of foodstuff and of the multifactorial nature of quality described earlier, the definition
Data Fusion Methodology and Applications https://doi.org/10.1016/B978-0-444-63984-4.00010-7
Copyright © 2019 Elsevier B.V. All rights reserved.
and verification/authentication of food quality require a holistic characterization of the product, which, in turn, needs to rely on the extensive application of one or, better, several analytical techniques [4]. In this respect, the analytical paradigm seems to be progressively moving from the very accurate quantification of a few well-identified analytes or families of analytes (targeted analysis), which are thought to be relevant for the problem either a priori or on the basis of the literature, to the extensive (and often high-throughput) characterization of whole 1-D or even higher-dimensional instrumental profiles (untargeted analysis), with the idea that considering the whole instrumental signal would provide information about both known and unknown chemical constituents of the investigated matrix [5]. However, even when they are as information rich as possible, the data (block of variables) resulting from a single set of measurements may not be enough to completely characterize a complex product such as a food and to provide a basis for the verification of its quality [6]. This is not unexpected: even for us, as humans, the appreciation of a food passes through the interplay of the outcomes of five different “instruments,” i.e., our senses, which guide us to identify which food is better than others, which one may be bad or even poisonous, whether anything occurred that made it inedible, and so on. Similarly, the analytical characterization and verification of food quality may undoubtedly profit from the combination (fusion) of the results from multiple sources (instrumental, but also others, e.g., sensory). In the context of food quality assessment, data fusion (DF) (or multiblock/multiset analysis) can offer several advantages over the processing of individual data matrices.
First of all, the increase in information content brought by the additional data blocks can help reduce the impact of spurious sources of variability and/or potential interferents and, in the case of predictive modeling, achieve lower prediction errors. For this to happen, it is of course fundamental that the different sets of data carry (at least partly) complementary information. On the other hand, information about the redundancies (common features) and the unique (distinctive) sources of information among the blocks can shed new light on the relationships between the analytes within different blocks and constitute the basis for a deep foodomic characterization [7].
The aim of the present chapter is to present general perspectives on DF and to briefly discuss the potential of this strategy in the food analysis context. To provide an overview of such a wide topic as multiblock analysis, the chapter is conceptually divided into two parts: one in which the subject is approached from a theoretical standpoint (Sections 1–3) and a more practical part in which selected applications of multiblock methods to authenticate or to check the quality of foodstuff are described (Section 4). Throughout the text, general advantages and disadvantages of DF strategies are outlined, with slightly deeper attention to a few specific methods. Nevertheless, providing technical details about the algorithms and the theory behind the methodologies discussed is outside the scope of the chapter; consequently, the reader is referred to specific sections of this book or to the cited literature for further details.
2. CHEMOMETRIC STRATEGIES APPLIED IN DATA FUSION
The growing need for DF methodologies has led to the development of several multiblock approaches suitable for handling data sets of all kinds and able to cope with issues that could arise in different data-analysis contexts. Although the DF approaches present in the literature differ from each other, it is possible to recognize similarities among them and therefore to group these methodologies into categories, depending on the aspect one is interested in. For instance, the most common classification of multiblock approaches focuses on the level at which DF takes place. According to this division, they can be organized into three categories: low-, mid-, and high-level DF (a scheme representing the core differences among the three approaches is reported in Fig. 10.1). The first category, low-level DF, comprises all those methods in which the original data blocks are joined together before the extraction of information. Taking into account a multiblock set made of three data blocks, $\mathbf{X}_1$ ($N \times I$), $\mathbf{X}_2$ ($N \times L$), and $\mathbf{X}_3$ ($N \times M$), the application of a low-level DF approach requires the row-wise concatenation of the three blocks into a single matrix, $\mathbf{X}_{\mathrm{Conc}} = [\mathbf{X}_1\ \mathbf{X}_2\ \mathbf{X}_3]$, of dimensions $N \times (I + L + M)$; the information is then extracted from the resulting matrix $\mathbf{X}_{\mathrm{Conc}}$. A well-known method belonging to this class of approaches is multiblock partial least squares (MB-PLS) [8–10]. Although different variations of the original algorithm have been proposed [11,12], in all cases the available blocks are concatenated, pretreated, and block-scaled to solve issues linked to the scale (more details about preprocessing in this context will be given later), and then the information is simultaneously extracted.
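The concatenation step of low-level DF can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the block sizes and random data are hypothetical, `low_level_fuse` is an illustrative helper name rather than part of any published algorithm, and only mean centering is applied as pretreatment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical blocks measured on the same N samples (rows).
N, I, L, M = 20, 50, 8, 200
X1 = rng.normal(size=(N, I))
X2 = rng.normal(size=(N, L))
X3 = rng.normal(size=(N, M))


def low_level_fuse(blocks):
    """Mean-center each block and place the blocks side by side."""
    n_rows = {b.shape[0] for b in blocks}
    if len(n_rows) != 1:
        raise ValueError("all blocks must share the same samples (rows)")
    centered = [b - b.mean(axis=0) for b in blocks]
    return np.hstack(centered)


X_conc = low_level_fuse([X1, X2, X3])
print(X_conc.shape)  # (20, 258), i.e. N x (I + L + M)
```

Any single-block method could then be applied to `X_conc`; a dedicated method such as MB-PLS would additionally keep track of which columns belong to which block.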
Low-level approaches present several benefits; for instance, they are easy to use, because they do not require any feature extraction step at the beginning of model building, but they also present some drawbacks. In fact, when applying these methodologies, a block-scaling step is required to prevent a data block presenting higher variance than the other sets (simply because it has a higher number of variables) from driving the model. Moreover, although the original variables are used to create the model, it is likely that the algorithm involves a feature/variable selection procedure. When this is operated by methods based on the calculation of latent variables, an additional issue may arise. In fact, because the extraction of the information takes place on the concatenated global matrix, low-level approaches do not allow a different number of components to be extracted from the different data blocks; consequently, the chosen number could be suitable for some of the available blocks, but it would likely correspond to an over- or underestimation of the complexity for some others.

FIGURE 10.1 General scheme depicting the main differences among low-, mid-, and high-level data fusion.

Mid-level methodologies present a feature extraction step as a starting point. Consequently, the individual data blocks are independently processed and the information is extracted in the form of latent variables or selected features and then concatenated into a single matrix. The final multiblock model is then calculated on this resulting matrix. An example of the mid-level approach is sequential and orthogonalized partial least squares-linear discriminant analysis (SO-PLS-LDA) [13], which exploits a multiblock regression method (SO-PLS) to extract information from the data blocks. Considering once again the case in which the three data blocks $\mathbf{X}_1$, $\mathbf{X}_2$, and $\mathbf{X}_3$ are available, SO-PLS [14]
provides a set of scores per block, $\mathbf{T}_1$ ($N \times A$), $\mathbf{T}_2$ ($N \times B$), and $\mathbf{T}_3$ ($N \times D$), which are then concatenated ($\mathbf{G} = [\mathbf{T}_1\ \mathbf{T}_2\ \mathbf{T}_3]$), obtaining the super-scores matrix $\mathbf{G}$ of dimensions $N \times (A + B + D)$; finally, LDA is applied to $\mathbf{G}$.
The application of high-level DF approaches is based on the a posteriori processing of results obtained by individual models calculated on each data block, which is the reason why this approach is also called decision fusion. In short, several classifiers (usually Bayesian) are used to solve the same problem and their predictions are finally joined into a single solution [15]. A wider description and a practical application of this strategy can be found in subsection 4.4.2 or in [16].
Although the above-mentioned way of dividing methods according to the level at which the information is extracted is useful and intuitive, it does not always fit all the available methods. Moreover, one could be interested in whether the information from the different blocks is combined and modeled at the same time or separately. For this reason, multiblock methods could also be divided into sequential and parallel approaches. The former term is used for methodologies in which the information is extracted from the blocks one after another, in different steps of the algorithm. The main advantage of these methods is that the contribution each block gives to the model can be independently inspected, and, by applying suitable expedients, it is generally possible to remove redundancies among blocks. Moreover, when feature reduction is present and provided by the extraction of latent variables, the sequential nature of these methods allows defining the optimal complexity individually for each block. The main drawback of sequential methods is that they could be somewhat more time consuming than parallel approaches, because their algorithms, by construction, involve several steps.
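The mid-level scheme described above (scores extracted block by block, then concatenated into a super-scores matrix) can be illustrated with a short Python sketch. It is not SO-PLS: as a simplifying assumption, plain PCA scores computed by SVD are swapped in for the SO-PLS scores, the data are random, and the function names are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components):
    """Return the first n_components PCA scores of a mean-centered block."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def mid_level_fuse(blocks, n_components):
    """Extract scores block by block, then concatenate them side by side."""
    scores = [pca_scores(X, a) for X, a in zip(blocks, n_components)]
    return np.hstack(scores)


rng = np.random.default_rng(1)
N = 30
blocks = [rng.normal(size=(N, p)) for p in (100, 40, 12)]  # X1, X2, X3
G = mid_level_fuse(blocks, n_components=(3, 2, 4))         # A, B, D
print(G.shape)  # (30, 9), i.e. N x (A + B + D)
```

The matrix `G` would then be passed to the final model (LDA in SO-PLS-LDA). Note that, unlike in low-level fusion, each block contributes its own number of components.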
For instance, SO-PLS and sequential and orthogonalized covariance selection [14,17] belong to this group of methods. In both approaches, the information is (sequentially) extracted from the different predictor blocks, with removal of redundancies. On the other hand, when parallel approaches are applied, the information contained in the different data blocks is simultaneously modeled. Examples of these methodologies are parallel orthogonalized partial least squares regression (PO-PLS) [18,19], parallel independent component analysis (parallel-ICA) [20], and OnPLS [21]. The above-mentioned distinctions among DF strategies are based on the nature of the algorithm; nevertheless, another obvious way of navigating the multiblock literature is by taking into account the final purpose of the method. Accordingly, DF methodologies could be divided into predictive (i.e., regression or classification approaches) and explorative methods.
The first group, made of regression and classification multiblock strategies, is highly populated and comprises various kinds of methodologies. Some examples are the already cited MB-PLS [8–10], SO-PLS [14], PO-PLS [18,19], and OnPLS [21], or further approaches such as sequential and orthogonalized N-partial least squares (SO-N-PLS) [22], multiblock redundancy analysis [23,24], predictive-ComDim [25], and hierarchical-PLS [26]. As mentioned, explorative DF methods aimed at solving component multiblock problems are also present in the literature, and they mainly differ in how the information is extracted from the data set. Examples of methods belonging to this category are CCSWA [27], CPCA [28], DISCO [29], JIVE [30], hierarchical PCA [26,31], and all the variants of ComDim [25]. Also in this case, for the sake of brevity, only a few approaches are reported here; the reader is strongly encouraged to consult the literature for a wider overview.
3. BUILDING, OPTIMIZATION, AND VALIDATION OF DATA-FUSED MODELS Because most of the available modeling approaches for dealing with multiblock data set are covered in detail in other chapters of this book, in this paragraph only a brief discussion on how to build, validate, and interpret data-fused models will be presented. First, some considerations should be made about the data structure. In general, as it will also be evident from the collection of examples reported in the successive paragraphs, in the context of food analysis the multiset situation most frequently encountered is the one in which multiple blocks of descriptors are collected to characterize the same set of samples: in such configuration, the data matrices corresponding to the different blocks share the same sample (row) direction, whereas they normally present variables of different nature and number. On the other hand, problems involving the same set of variables measured on different samples, i.e., the configuration in which data matrices share the same column direction, are less commonly reported within food quality applications. Occasionally, there can also be cases in which some blocks share the row and some others the column direction, where directional (path) relationships among the blocks are sought. Once the data configuration is defined, matrices should be preprocessed before being subjected to the successive modeling stage. In this respect, although conventional preprocessing methods (i.e., those adopted in the case of single matrices) can and should always be used according to the same guidelines one would follow for 1-block modeling, when dealing with multiblock analysis, additional sources of spurious
variance may emerge and more specific preprocessing strategies should also be adopted. Indeed, multiblock modeling can be implemented in a simultaneous or sequential fashion. When a simultaneous algorithm is chosen, each component (or feature in general) is calculated by combining the information from all the available blocks: because components are in general calculated so as to account for a relevant portion of variance (or covariance), the blocks having the highest variance can dominate the extraction of the latent variables. To overcome this problem, the most frequently used strategy is to pretreat the blocks as if they were to be modeled individually (e.g., by standard normal variate [32] and mean centering, or by autoscaling) and then to equalize their variances by dividing each matrix by its Frobenius norm (block scaling) [26,33]: this operation removes the spurious contribution to the overall variability due to the blocks having different variances. With sequential methods, in which blocks are modeled one after another so that features/components are in turn extracted from one matrix at a time, this problem does not occur, and the corresponding models are said to be scale invariant.
Once the data are suitably preprocessed, they can undergo the desired modeling approach, which can be any of those summarized in Section 2. In this context, model selection plays a decisive role, as multiblock methodologies often involve the extraction of a reduced number of components (or features) to describe the relevant information in the data. It is then necessary to decide what the optimal model complexity should be. In the case of simultaneous methods, where a single set of latent variables (or features) is extracted from all the blocks together, optimization of the model complexity occurs as in the case of a single block; for sequential methods, instead, two approaches are possible: sequential and global optimization [14].
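Block scaling by the Frobenius norm, as described above for simultaneous methods, can be sketched as follows. The example is illustrative only: the two random blocks and the helper name `block_scale` are assumptions, and mean centering stands in for whatever single-block pretreatment would actually be chosen.

```python
import numpy as np


def block_scale(X):
    """Mean-center a block and divide it by its Frobenius norm."""
    Xc = X - X.mean(axis=0)
    norm = np.linalg.norm(Xc, "fro")
    if norm == 0:
        raise ValueError("block has no variance after centering")
    return Xc / norm


rng = np.random.default_rng(2)
N = 25
X_big = rng.normal(size=(N, 500))   # many variables, e.g. a spectrum
X_small = rng.normal(size=(N, 5))   # few variables

for name, X in [("big", X_big), ("small", X_small)]:
    raw = np.sum((X - X.mean(axis=0)) ** 2)
    scaled = np.sum(block_scale(X) ** 2)
    print(f"{name}: raw sum of squares = {raw:.1f}, after block scaling = {scaled:.3f}")

# After scaling, both blocks contribute the same total sum of squares.
X_fused = np.hstack([block_scale(X_big), block_scale(X_small)])
```

Before scaling, the 500-variable block carries roughly a hundred times the sum of squares of the 5-variable block; after scaling, each block has unit sum of squares, so neither dominates the first latent variables merely because of its size.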
In sequential optimization, the complexity of each block is optimized individually in a consecutive fashion: at first the optimal complexity of the first block is investigated; then, the optimal complexity of each successive block is sought as the number of components that, in combination with those already selected from the previous matrices, leads to the minimum error (usually in cross-validation). On the other hand, in global optimization, all the possible combinations of numbers of features/components per block are tested and the one leading to the minimum error is chosen as the optimal one. As for the cases in which a single block of data is involved, variable selection can improve the quality of the models both in terms of predictive accuracy (when calibration or classification is involved) and of model interpretation. Indeed, in the case of multiblock data, the risk of including noisy or uninformative variables in the models is even higher than with single-data set analysis, and the presence of irrelevant descriptors can seriously affect the quality of the models built and the resulting
predictions. On the other hand, when a reduced number of variables is selected from the different blocks, interpretation also becomes more straightforward. Moreover, in sequential models, by repeating variable selection after exchanging the order of the blocks, information can be obtained on which descriptors bring common information and which carry distinctive information. Here it must be stressed that how to select variables in a DF context is a complex topic that has not been widely investigated. Some tentative indications have been provided in Ref. [34], where diverse feature selection methods are applied to SO-PLS and MB-PLS models following different strategies. Similarly, Prieto and collaborators [35] discussed how to estimate VIP indices from OnPLS models. Recently, Biancolillo et al. [17] proposed a sequential multiblock variable selection approach based on the CovSel paradigm [36], which allows the identification and extraction of a highly parsimonious subset of predictors, with minimum redundancy. Finally, a few words need to be spent on model validation, to highlight some additional aspects that complement the concepts already used for the analysis of individual blocks. Indeed, together with the generalization ability on unknown data, the identification of whether an appropriate model complexity was selected, and the stability and interpretability of model parameters, validation of multiblock models should also address issues that are specific to the multiset nature of the data. In this framework, a fundamental question the validation process should answer is whether all the blocks are relevant and necessary [37] or whether any of them is completely redundant and/or may even damage model quality.
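The difference between the sequential and global optimization of model complexity discussed earlier in this section can be made concrete with a toy example. The sketch below assumes a two-block sequential model and a precomputed table of cross-validated errors; the numbers are invented for illustration, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical cross-validated errors for a two-block sequential model:
# cv_error[a, b] = error with a components from block 1 and b from block 2.
cv_error = np.array([
    [1.00, 0.80, 0.70, 0.72],
    [0.60, 0.55, 0.40, 0.42],
    [0.50, 0.48, 0.47, 0.49],
    [0.52, 0.50, 0.49, 0.51],
])


def sequential_optimum(err):
    """Fix the complexity of block 1 first, then optimize block 2 given it."""
    a = int(np.argmin(err[:, 0]))   # block 1 alone (0 components from block 2)
    b = int(np.argmin(err[a, :]))   # block 2, given the chosen a
    return a, b


def global_optimum(err):
    """Test every (a, b) combination and keep the one with minimum error."""
    a, b = np.unravel_index(np.argmin(err), err.shape)
    return int(a), int(b)


print(sequential_optimum(cv_error))  # (2, 2)
print(global_optimum(cv_error))      # (1, 2)
```

In this invented table the two strategies disagree: the sequential search settles on two components per block (error 0.47), whereas the exhaustive search finds a lower error (0.40) with one component from the first block. The price of the global search is that the number of models to evaluate grows multiplicatively with the number of blocks.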
4. APPLICATIONS
The applications of multiblock methods in food analysis address different aspects; mainly, the quality of products is tested to assure the consumer of the wholesomeness of the food and to prevent possible frauds. Often, food products have a high added value because of the area where they are harvested, processed, and/or produced (e.g., foodstuff bearing the PDO mark); consequently, several analytical methodologies have been developed to authenticate and trace foods. Considering the market value of these products, the analytical platforms applied in this context are mainly nondestructive. As a consequence, spectroscopy-based devices, sensors, and RGB or hyperspectral imaging are among the most commonly applied analytical techniques in food analysis; nevertheless, standard quantification approaches (e.g., chromatography) are also still used for reference. As far as the chemometric tools used in food analysis are concerned [38], as mentioned earlier, exploratory methods are mainly used to obtain an overview of the products, but when it comes to
assessing the quality of the food, it could be relevant, for instance, to quantify the amount of a specific compound (e.g., sugar in fruits or potassium in meat) or to authenticate a distinct product; in both cases, regression and classification approaches are the required chemometric tools. In the literature, it is possible to find several applications in which a nondestructive analytical technique is combined with regression or classification approaches to assess the quality of a high-added-value product; for instance, this has been done to authenticate and trace protected designation of origin (PDO), protected geographical indication (PGI), and high-value products, such as olives [39], olive [40,41] and rosehip [42] oil, cereals [43], nuts [44], vinegar [45,46], tea [47], liquorice [48], and several others [49–54]. Nevertheless, the analysis pursued by only one analytical technique is sometimes not sufficient to obtain a complete characterization of a product; consequently, the synergistic application of several analytical platforms could be necessary to achieve satisfactory results. Some examples are reported in the following sections.
4.1 Olive Oil
Extra virgin olive oil is a high-added-value food consumed all over the world but produced only in a few countries (mainly in the Mediterranean area). Additionally, olive oils produced in specific areas carry quality labels such as the PDO mark, further confirming their value. Owing to this, this matrix has been widely investigated and several analytical methodologies have been developed to check its quality. Among the numerous works published in the literature on this topic, several exploit DF strategies to handle such a complex matrix. The analytical techniques used to assess the quality of olive oil in a multiblock context are diverse, both destructive and nondestructive. Among the destructive approaches, liquid chromatography (LC) is one of the most used for the quality assessment of olive oil. For instance, Bajoub and collaborators used this technique to assess the botanical origin of samples of extra virgin olive oil produced from fruits belonging to seven different varieties [55]. To achieve their goal, data were acquired using LC coupled with a diode array detector (DAD) and a fluorescence detector and then classified by low- and mid-level DF approaches. In particular, partial least squares discriminant analysis (PLS-DA) was applied either to the original variables after row-wise augmentation (low level) or to the row-wise concatenated scores extracted by the individual PLS-DA models (mid level). All models provided satisfactory results. A similar approach was pursued by Nescatelli et al., who used a mid-level DF approach based on PLS-DA to combine selected high-performance liquid chromatography (HPLC)-DAD variables and classify diverse PDO olive oil samples according to their origin [56].
Gas chromatography (GC) has also been used to distinguish extra virgin olive oils in a multiplatform approach; for instance, Casale et al. combined ultraviolet-visible (UV-Vis), near-infrared (NIR), and mid-infrared (MIR) spectroscopies with fatty acid composition quantified by GC to authenticate the PDO olive oil Chianti Classico [57]. The four data blocks were investigated individually and by a low-level DF strategy; classification was achieved by unequal dispersed classes (UNEQ) [58] and soft independent modeling of class analogies (SIMCA); both approaches provided satisfactory results. Spectroscopy has been widely used in this context, sometimes in combination with destructive techniques, as done by Pizarro et al., who discriminated Spanish olive oils by combining the five physical-chemical parameters required by the European Commission regulation (EEC) No. 2568/91 with UV-Vis measurements [59]. Classification was pursued by PLS-DA on fused data; good results were obtained in cross-validation. Spectroscopy is also used in the work conducted by Borràs and collaborators [60], who applied headspace-mass spectrometry, MIR spectroscopy, UV-Vis spectrophotometry, and sensory analysis to distinguish different qualities of olive oils. PLS-DA was used on the individual data blocks and in both low- and mid-level DF strategies; the latter approach provided the most satisfactory results. A completely nondestructive approach aimed at authenticating a specific variety of PDO extra virgin olive oil has been proposed by Bevilacqua and coauthors; in Ref. [61], they analyzed several Italian oil samples by MIR and NIR spectroscopy and then applied PLS-DA in low- and mid-level DF strategies to authenticate PDO Sabina extra virgin olive oils. Results obtained by mid-level DF were extremely good, better than those obtained by individual block analysis or by low-level DF.
Besides authentication, shelf life assessment has also been investigated for extra virgin olive oils; for example, Buratti and collaborators applied nondestructive approaches (electronic tongue, nose, and eye) combined with low-level DF to track variations in olive oils stored under different conditions [62].
4.2 Wine
Wine has been extensively characterized with respect to production processes and aging, as well as to detect adulteration and assess authenticity, in terms of both varietal and geographical origin, with several analytical techniques ranging from stable element isotope ratios, spectroscopies, and chromatography to artificial sensors (e-nose, e-tongue). The literature on studies that include DF is less rich; however, it includes several applications [16,63–69]. Monakhova et al. [63] evaluated
the synergistic effect of fusing 1H-nuclear magnetic resonance (NMR) spectroscopic and stable isotope (18O, 13C) data to assess the year of vintage, grape varieties, and geographical origin of wine samples from Germany. They obtained improved classification through DF, with respect to using a single data block, only for modeling geographical origin. Interestingly, they compared different approaches to fuse the data; common to all of them was the preliminary compression of the NMR data (preprocessed by bucketing and normalization), accomplished by variable selection. The isotopic data, consisting of only five variables, were used as such. When merging the two data sets, block scaling proved to be the most promising scaling procedure. The DF process was conducted both by applying classical mid-level fusion, i.e., merging the two blocks of data and then applying a classification method, and by using specialized multiblock methods such as MB-PLS-DA, consensus PCA (CPCA), and ComDim (for CPCA and ComDim the Mahalanobis distance was used for assessing class membership in the space of their extracted components). Regardless of which multiblock method was used, the classification performance improved with respect to mid-level DF. A mid-level DF approach was applied by Roussel et al. [64] for authenticating wine obtained from different grape varieties. The feature extraction step was accomplished by applying a genetic algorithm to each block of data, namely, e-nose sensor responses, Fourier-transform infrared spectroscopy (FTIR), and UV spectroscopic data. Additionally, the same authors proposed a high-level DF strategy to discriminate must samples from different white grape varieties [65]. In particular, because it was possible to directly calculate probability values from the models built on the separate blocks, a Bayesian approach was used to fuse the individual outcomes into a global decision about the most likely class membership of the samples.
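To illustrate how class probabilities from separate blocks can be fused into a global decision, the following Python sketch applies the naive Bayes product rule. It is not the procedure of Ref. [65], whose details are not given here: conditional independence of the blocks given the class and equal class priors are assumptions, and the probability values and the function name `bayes_fuse` are invented for illustration.

```python
import numpy as np


def bayes_fuse(posteriors, prior=None):
    """Fuse per-block class posteriors with the naive Bayes product rule.

    posteriors: array of shape (n_blocks, n_samples, n_classes), each row
    of each block summing to 1. Assumes the blocks are conditionally
    independent given the class.
    """
    P = np.asarray(posteriors, dtype=float)
    n_classes = P.shape[2]
    if prior is None:
        prior = np.full(n_classes, 1.0 / n_classes)  # equal priors (assumption)
    # p(c | all blocks) is proportional to prior(c) * prod_k [p(c | block k) / prior(c)]
    log_post = np.log(prior) + np.sum(np.log(P + 1e-12) - np.log(prior), axis=0)
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    fused = np.exp(log_post)
    return fused / fused.sum(axis=1, keepdims=True)


# Hypothetical outputs of three single-block classifiers: two samples, three classes.
p_enose = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
p_ftir = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]
p_uv = [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]

fused = bayes_fuse([p_enose, p_ftir, p_uv])
print(fused.argmax(axis=1))  # [0 1]
```

For the second sample the UV block alone would favor the third class, but the combined evidence from the three blocks assigns it to the second one, which is the kind of consensus a decision-level strategy is meant to produce.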
Tao et al. [66] studied wine aging by fusing MIR and NIR spectroscopic data, again with a mid-level approach, comparing PCA and wavelet transform to compress the spectra. In both [64] and [66], PLS-DA was used as the discrimination method to classify varieties and aging categories, respectively. Prieto et al. [67] studied the effect of storage time on different red wines by fusing information acquired by a sensor-based e-nose (compressed to six variables), GC-MS (relative quantification of five volatile compounds), and UV/Vis (six CIELab coordinates); the variables extracted from each data block were concatenated and arranged in a three-way array (samples × variables × storage time points), which was then analyzed by Tucker3. The ensemble fusion methodologies described in Chapter 12 of this book have been applied by Brownfield et al. [16] to classify Italian wines with respect to their geographical origin. In this case, only one data source has
been considered, composed of 13 quality variables, and the results obtained by 17 different nonoptimized classifiers were fused (high-level DF). In the following section, the application of different DF approaches to model different aspects of wine authenticity will be illustrated as a case study [38,68,69].
4.2.1 Geographical Traceability and Varietal Characterization of Lambrusco Di Modena Wine
In modeling authenticity with respect to geographical origin, two main approaches can be used: measuring direct indicators, i.e., variables that can be measured both in soils and in food products in an attempt to establish a direct link, such as mineral elements and stable element isotope ratios; or measuring indirect indicators, i.e., acquiring a foodstuff fingerprint or compositional profile, which implicitly reflects the peculiarity of the production protocol and so may allow discrimination from food products of different geographic origin, even if not directly linked to the territory. In this case study, which was part of a long-term project dealing with the development of authenticity and geographical traceability models of PDO Italian wines [70], both analytical approaches were undertaken. DF was effective in different stages of the project aimed at building authenticity models for Lambrusco of Modena PDO wines.
4.2.1.1 Aiding Soil Characterization for Optimal Sampling
To establish a correlation between the soils of origin and the final products, one of the main aspects to face is the representativeness of the sampling of the geographical area under investigation; on the other hand, direct indicators are quite costly to determine. As a preliminary step to aid representative sampling of the Modena district territory, the content of several metals together with the isotopic abundance ratio ⁸⁷Sr/⁸⁶Sr was used jointly with the X-ray powder diffraction profile to obtain information about the inter- and intrasite variability, including depth, at a few selected locations. DF in this case was undertaken at the exploratory level to evaluate the link among the main mineralogical phases, the metals, and the isotopic abundance ratio ⁸⁷Sr/⁸⁶Sr, to assess whether a faster, nondestructive, and less costly approach (the diffractograms) could be employed to assess the soil variability and hence to focus the sampling [68]. Fig. 10.2 (top left panel) illustrates the DF methodology followed, which consisted of two steps: (1) low-level fusion of the metal content (46 variables) with the ⁸⁷Sr/⁸⁶Sr ratio, namely, concatenation with a scaling procedure that attributed 15% of the total variance to the isotopic ratio, and (2) mid-level fusion of the features extracted from each data block by using multivariate curve resolution (MCR) as the feature extraction method. Finally, exploratory analysis was conducted by MCR on the fused features, applying block scaling to give the same variance to each block.
FIGURE 10.2 Data fusion strategies applied in the Lambrusco of Modena wine benchmark. The top left panel illustrates the approach used in soil characterization, which is based on MCR as the feature extraction method in mid-level data fusion; the top right panel refers to discriminating the different PDO varieties by fusing at mid-level NMR, EEM, and HPLC-DAD data, using a proper compression/resolution method for each data block, namely, PCA (NMR), PARAFAC (EEM), and MCR (HPLC-DAD); the bottom panel shows the mid-level data fusion approach (after preliminary low-level concatenation of metal content and isotopic ratio data) applied to modeling authenticity, by SIMCA one-class modeling, of Lambrusco of Modena with respect to not-Lambrusco of Modena wines.
Most often, in the DF context, MCR is used in the multiset structure (see Chapters 8 and 11 of this book) when combining the data from different blocks. This is surely a sound and preferred approach; however, it may become nonoptimal when, as in this case, the sample mode is shared by data blocks constituted of very different kinds (and numbers) of variables, e.g., metal contents and spectral fingerprints, but there is no varying condition for each sample, such as time of measurement, pH, or a second spectral dimension. In other words, this happens when data augmentation is limited to the concatenation of variables and there is no real replicate information for the same sample to assist the resolution of the underlying components.
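The variance-weighted concatenation described for the soil data (a prescribed share of the total variance assigned to one block) can be sketched as follows. This is only a minimal illustration under stated assumptions: the data are synthetic, the function names are hypothetical, each block is assumed to be autoscaled before weighting, and the exact scaling used in Ref. [68] may differ.

```python
import numpy as np

def autoscale(block):
    """Center each column and scale it to unit standard deviation."""
    centered = block - block.mean(axis=0)
    return centered / centered.std(axis=0, ddof=1)

def fuse_with_variance_shares(blocks, shares):
    """Concatenate blocks column-wise so that block k carries shares[k]
    of the total sum of squares of the fused matrix (assumed scheme)."""
    if not np.isclose(sum(shares), 1.0):
        raise ValueError("shares must sum to 1")
    weighted = []
    for block, share in zip(blocks, shares):
        scaled = autoscale(block)
        # Rescale so that the block's sum of squares equals its prescribed share.
        weighted.append(scaled * np.sqrt(share / np.sum(scaled ** 2)))
    return np.hstack(weighted)

rng = np.random.default_rng(0)
metals = rng.normal(size=(30, 46))   # synthetic stand-in for the 46 metal contents
sr_ratio = rng.normal(size=(30, 1))  # synthetic stand-in for the 87Sr/86Sr ratio
fused = fuse_with_variance_shares([metals, sr_ratio], [0.85, 0.15])

# Fraction of the total sum of squares carried by the isotopic ratio column.
isotope_share = np.sum(fused[:, -1] ** 2) / np.sum(fused ** 2)
print(fused.shape, round(float(isotope_share), 2))
```

Without such a weighting, a single variable concatenated with 46 autoscaled ones would carry only 1/47 of the total variance, so its contribution to an exploratory model would be marginal.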
4.2.1.2 Discrimination of Varieties and Modeling Authenticity
As far as Lambrusco of Modena wines are concerned, it is of interest both to assess the distinctive traits of the three main PDO denominations, Grasparossa, Salamino, and Sorbara, which are located in different areas of the Modena district and are differently valued, and to have a geographical traceability model available to assess whether or not a wine product belongs to one of the Lambrusco of Modena denominations, including the Lambrusco of Modena, which can also be a mixture of the previous three. The first is a discrimination task because all three categories can be characterized and correspond to specific grapes; thus PLS-DA was used [69]. The second is the typical issue of distinguishing one category from the rest of the world; hence, the class modeling approach is suitable and SIMCA was used [38]. The DF approaches applied to reach these objectives are illustrated in Fig. 10.2 (top right and bottom panels, respectively). Discrimination of varieties was accomplished for the subset of authentic wine samples for which fluorescence (EEM) and ¹H-NMR spectra, together with HPLC-DAD determination of the phenolic compounds, were recorded. The peculiarity in this case is that two of the data blocks are three-way arrays. In the original paper [69], mid-level DF was applied and, considering the nature of each data block, different compression methods were used, as shown in Fig. 10.2: (1) the EEM landscape, for which trilinearity holds, was deconvolved into four “pure” profiles by parallel factor analysis (PARAFAC); (2) the HPLC-DAD landscape, in which shifts in retention time are present, was resolved into “pure” components by MCR (the integrated areas of the resolved components, 39 in total, were used as features); and (3) the NMR spectra were compressed by PCA (four components). At the feature fusion step, both block scaling and autoscaling were evaluated, and the latter gave the best performance when PLS-DA classification was applied.
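To see why the choice between autoscaling and block scaling matters at the feature fusion step, the share of total variance carried by each feature block can be computed under both schemes. The sketch below uses synthetic features with the block sizes quoted above (4 PARAFAC, 39 MCR, and 4 PCA features); the function name is hypothetical, and block scaling is assumed here to mean autoscaling followed by division by the square root of the number of features in the block, which is one common convention and not necessarily the one used in Ref. [69].

```python
import numpy as np

def block_variance_shares(blocks, mode):
    """Fraction of the total variance carried by each block after scaling.

    mode="auto":  every feature is scaled to unit variance.
    mode="block": as above, then each block is scaled to unit total variance.
    """
    scaled_blocks = []
    for block in blocks:
        centered = block - block.mean(axis=0)
        scaled = centered / centered.std(axis=0, ddof=1)
        if mode == "block":
            # Each feature now has variance 1/p, so the block total is 1.
            scaled = scaled / np.sqrt(block.shape[1])
        scaled_blocks.append(scaled)
    totals = np.array([np.sum(b.var(axis=0, ddof=1)) for b in scaled_blocks])
    return totals / totals.sum()

rng = np.random.default_rng(1)
eem = rng.normal(size=(40, 4))    # synthetic stand-in for 4 PARAFAC features
hplc = rng.normal(size=(40, 39))  # synthetic stand-in for 39 MCR areas
nmr = rng.normal(size=(40, 4))    # synthetic stand-in for 4 PCA scores

auto_shares = block_variance_shares([eem, hplc, nmr], "auto")
block_shares = block_variance_shares([eem, hplc, nmr], "block")
print(auto_shares, block_shares)
```

Under autoscaling the 39 chromatographic features carry 39/47 of the total variance, whereas block scaling gives each block one third; which option works better depends on where the discriminating information lies.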
The classification results were improved in comparison with those achieved by each single technique; overall (calibration and test sets), the DF model misclassified only one sample. Part of this data set, i.e., the NMR and EEM data blocks, was further explored in Chapter 6 of this book by using multiblock-based DF; in particular, SO-N-PLS-DA, and SO-PLS-DA and MB-PLS-DA after unfolding, were compared. Among the tested methods, SO-N-PLS-DA gave the lowest number of misclassified samples (19% error rate). The lower performance with respect to mid-level DF may be attributed to the fact that information from the phenolic fraction was not included; in Ref. [69] it was reported that at least the compounds resolved in the retention time windows of syringic and caffeic acids are needed for good discrimination. A general traceability model was obtained by fusing the ¹H-NMR spectra [71] and the metal contents and isotopic ratios [72,73] of a wider set of samples, including authentic samples of Lambrusco Salamino, Sorbara, and Grasparossa, 75 in total, which were split into a calibration set (53) and a test set (22). As further test sets, another 22 samples of Lambrusco of Modena were predicted, together with 19 Lambrusco samples not belonging to the Modena PDO, i.e., from the nearby Reggio Emilia, Parma, and Mantova areas, of which 9 were used to assess efficiency in the calibration stage and 10, together with 4 samples of Lambrusco from Spain, were used as a validation set to test specificity. The adopted DF strategy (Fig. 10.2, bottom) consists of two steps: (1) concatenating at low level the metal contents (10 variables) and isotopic ratios (9 variables), preprocessed by autoscaling (because the number of variables is balanced); and (2) mid-level fusion of the result of the previous step with four principal components obtained by PCA of the NMR data (Pareto scaling); at this fusion stage, block scaling to equal block variance was applied. The SIMCA model obtained on the mid-level DF data showed a sensitivity of 86% and a specificity of 100% for the test set. The best performing model of the two single data blocks, i.e., SIMCA on the NMR data, achieved a slightly higher sensitivity for the test set (91%) but a much lower specificity, especially toward Lambrusco samples from nearby provinces [38]. This case study shows the benefit of DF approaches and the necessary complementarity of indirect and direct indicators in geographical traceability studies.
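For a one-class model such as SIMCA, sensitivity is the fraction of target-class samples accepted by the model and specificity is the fraction of non-target samples rejected. The short sketch below computes both from acceptance decisions. The counts are hypothetical: they were chosen only because 19 accepted samples out of 22 is consistent with the reported 86% sensitivity and 14 rejected samples out of 14 with the 100% specificity; the actual per-sample outcomes are not given in the text.

```python
def sensitivity(accepted, is_target):
    """Fraction of target-class samples accepted by the class model."""
    hits = sum(1 for a, t in zip(accepted, is_target) if t and a)
    return hits / sum(is_target)

def specificity(accepted, is_target):
    """Fraction of non-target samples rejected by the class model."""
    rejections = sum(1 for a, t in zip(accepted, is_target) if not t and not a)
    return rejections / (len(is_target) - sum(is_target))

# Hypothetical outcome: 22 target samples (19 accepted) and 14 non-target
# samples (all rejected). These counts are assumptions for illustration.
is_target = [True] * 22 + [False] * 14
accepted = [True] * 19 + [False] * 3 + [False] * 14

print(f"sensitivity = {100 * sensitivity(accepted, is_target):.0f}%")
print(f"specificity = {100 * specificity(accepted, is_target):.0f}%")
```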
4.3 Vinegar
Among vinegars, the main commercial interest is in differentiating high-added-value PDO and PGI vinegars from substitute products, as well as in distinguishing wine and grape must-based vinegars from others, such as apple vinegars. Among the PDO vinegars, the Aceto Balsamico Tradizionale di Modena (ABTM) is by far the best known and most valued, because of its traditional production protocol and very long dynamic aging in wooden barrels. Recently, Spanish aged PDO wine vinegars, such as Vinagre de Jerez, have also gained much attention. In general, all these vinegars have been extensively characterized with respect to composition and aging by several analytical techniques, but the results have seldom been integrated in a DF perspective [74–76]. In Ref. [76], a low-level DF approach integrating attenuated total reflectance (ATR)-MIR and FT-NIR spectra was applied to distinguish the main commercial categories of ABTM, Affinato (at least 12 years of aging and a sensory score by a recognized expert panel above 229) and Extravecchio (at least 25 years of aging and a sensory score above 256), and of Aceto Balsamico di Modena (ABM, PGI), “fascia alta” and “fascia bassa.” Interestingly, the DF model was able to differentiate the various categories along the first two principal components (exploratory analysis by PCA on baseline-corrected spectra, concatenated after block scaling to unit block variance). The first principal component (Fig. 10.3, top) mainly distinguishes ABTM samples (positive score values) from ABM samples (negative score values), whereas the second one separates, with inverse trends, the high-quality products within the two families of vinegars: for ABTM the most valuable products lie at positive values of the second PC, whereas for ABM they lie at negative values. Vinegar samples coming from different barrels (from Batteria 6, i.e., cask 6, which holds the youngest product, to Batteria 1, i.e., cask 1, which holds the oldest; the Extravecchio product is generally drawn from casks 1 and 2), once projected onto the PCA space, also nicely follow a time trend from top right to bottom left. By inspecting the loadings plot it was possible to interpret the role of the different spectral regions [74]. The fused data set was used to create a SIMCA one-class model [77] for the Extravecchio category; good sensitivity and specificity were achieved, improved with respect to the use of the single data sets.
FIGURE 10.3 Exploratory analysis of ABM and ABTM categories by low-level fusion of NIR and MIR spectral profiles. Top: scores plot of PC1 versus PC2. Empty circles identify samples taken from each cask of an ABTM battery; labels report the names of the casks. Bottom: loadings scatter plot of PC1 versus PC2.
Very recently [75], some of us conducted a study to compare different DF strategies in terms of their capability to improve the discrimination of three Spanish PDO vinegars, namely, “Vinagre de Jerez,” “Vinagre de Condado de Huelva,” and “Vinagre de Montilla-Moriles,” as well as to highlight the role of each spectroscopic technique (NMR, EEM, MIR, and NIR) used to characterize them. When previously applied individually to the Spanish PDO wine vinegars under study, these spectroscopic techniques were shown to provide a good classification of sweet or aging categories within a single PDO; however, none of them allowed the classification of the PDOs in a global way, that is to say, a PDO classification independent of the category of vinegar to which a sample belongs. The DF approaches applied are illustrated in Fig. 10.4: one strategy was based on the recently developed P-ComDim multiblock method (see Chapter 7 of this book for details on the method), applied either directly on the single spectral data blocks (Fig. 10.4, left) or on the features derived from each of them with an appropriate decomposition/resolution method (Fig. 10.4, middle); the second strategy consisted of mid-level DF, with the MIR/NIR spectral data first concatenated at low level (Fig. 10.4, right).
FIGURE 10.4 Data fusion approaches applied in the discrimination of Spanish PDO wine vinegars.
The separate compression of MIR and NIR spectra by PCA to extract features to be used in the mid-level DF approach is an alternative, which was also considered in the study and gave very similar results. One reason to prefer concatenation of MIR and NIR spectra is to benefit from their correlation and keep the redundant information in the same PCs, because NIR consists of overtones and combinations of modes also present in MIR. The NMR spectra were resolved into 62 components by using distinct MCR models on single spectral intervals; the areas of the resolved components in each interval were then used together as NMR features. As preprocessing in the mid-level fusion stages, both autoscaling and block scaling were considered, obtaining substantially similar results. The classification of the three PDO categories was achieved by PLS-DA. The main remarks that can be drawn from the study are as follows:
1. The application of DF methods improved the characterization and authentication of PDO wine vinegars with respect to models based on single methods;
2. The single analytical method providing the best classification results is ¹H-NMR, also because through the MCR methodology it was possible to retrieve only structured information from the spectra;
3. The direct application of the P-ComDim method to the spectral data was very useful for describing, in a simple and synthetic manner, the overall spectral information and for revealing the complementarity and differences of the analytical techniques, assessing, on the one hand, the profile of the common components and, on the other, the distinctive contribution of each block from the analysis of the saliences;
4. However, the classification results showed that mid-level DF was the best performing option in comparison with P-ComDim applied on the raw data. In spite of this, P-ComDim achieved very good results when applied on the extracted features.
4.4 Beer
Among alcoholic beverages, beer is one of the most consumed all over the world; for this reason, it is widely investigated and several approaches aimed at attesting its quality have been developed. Despite this, only a few DF approaches investigating this beverage have been presented in the literature. In this paragraph, three research studies conducted on beers in a multiblock framework will be discussed: although the analyzed matrix is always the same (beer), the analytical techniques and the chemometric tools involved are different, as are their aims.
4.4.1 Classification of Beers According to Their Style
The first study is the one published by Gutierrez and coauthors [78]. In this work, 25 beers (both craft and commercial) were investigated by an electronic tongue (constituted of potentiometric and voltammetric sensors), explored by principal component analysis (PCA), and then classified by stepwise linear discriminant analysis (LDA) according to their style (i.e., lager, IPA, or porter); details on the practical collection of the signals are reported in Ref. [78]. Beer bottles were opened immediately before the analysis, and samples were diluted with distilled water (30:70, beer to water) to minimize the matrix effect and to limit the presence of carbon dioxide on the sensor surfaces. Measurements were collected as described in Ref. [78], and then signals were normalized (dividing each variable, column by column, by the square of the maximum value) and organized in a single matrix of dimensions 25 × 57 (constituted by both potentiometric and voltammetric features). This data matrix is ill-conditioned because it has more variables than samples; therefore, it would not be possible to apply LDA directly on it. To overcome this problem, Gutierrez et al. classified the samples by means of the stepwise LDA approach [79], introducing a feature reduction step before the application of Fisher’s classifier. The variable selection procedure reduced the number of experimental variables to 15. Finally, data were mean-centered, and the classification model was validated by leave-one-out cross-validation, extracting three latent variables. Among the 25 samples, only one, a stout beer, was misclassified (the model assigned it to the IPA class), whereas all the others were correctly assigned to their own category.
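The reason LDA cannot be applied directly to a 25 × 57 matrix can be checked numerically: the pooled within-class scatter matrix, which LDA has to invert, has rank at most n − g (samples minus classes), i.e., 22 here, far below 57. The sketch below uses random numbers in place of the electronic tongue signals, and the class sizes are hypothetical (they are not reported in the text); simply keeping the first 15 columns stands in for the stepwise variable selection.

```python
import numpy as np

def within_class_scatter(X, labels):
    """Pooled within-class scatter matrix (the matrix inverted by LDA)."""
    n_vars = X.shape[1]
    scatter = np.zeros((n_vars, n_vars))
    for cls in np.unique(labels):
        centered = X[labels == cls] - X[labels == cls].mean(axis=0)
        scatter += centered.T @ centered
    return scatter

rng = np.random.default_rng(2)
X = rng.normal(size=(25, 57))                    # synthetic 25 x 57 data
labels = np.array([0] * 9 + [1] * 8 + [2] * 8)   # hypothetical class sizes

# With all 57 variables the scatter matrix is singular (rank n - g = 22).
rank_full = np.linalg.matrix_rank(within_class_scatter(X, labels))

# With 15 retained variables (here simply the first 15) it has full rank.
rank_reduced = np.linalg.matrix_rank(within_class_scatter(X[:, :15], labels))
print(rank_full, rank_reduced)
```

Any reduction that brings the number of variables below n − g, whether by variable selection as in stepwise LDA or by compression into a few principal components, removes the singularity.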
The authors compared the results obtained by the DF approach with those provided by models built on the individual data matrices and observed a remarkable improvement in predictive ability with the multiblock approach.
4.4.2 Classification of Beers According to the Factory
The second example is the work published by Vera et al. [80], in which 67 beers of the same brand but brewed in four different branches were analyzed to see whether it would be possible to discern them according to the factory of origin. Beers were analyzed by three different analytical techniques, MS e-nose, mid-IR optical tongue, and UV-visible spectroscopy, and then classified by low- and mid-level DF strategies; for practical details about the instrumental analysis, the reader is referred to the original paper [80]. The dimensions of the training data blocks are 48 × 150, 48 × 210, and 48 × 341 for the MS e-nose, the mid-IR optical tongue, and the UV-visible data, respectively.
Before the application of any classifier, samples were divided into a training and a test set of 48 and 19 objects, respectively, to externally validate the model. Different pretreatments were tested on the data from the different analytical platforms; in particular, IR and UV-Vis spectra were pretreated by standard normal variate [32] and by the (Savitzky-Golay) first derivative [81], whereas mass spectrometric signals were normalized by row profiles. First, a low-level DF approach was tested; the three data blocks were row-wise concatenated, obtaining a training matrix of dimensions 48 × 652, and then autoscaled to ensure scale invariance. The resulting matrix has many more variables than samples, so its covariance matrix is singular and LDA would not be directly applicable. To overcome this problem, a PCA model was calculated, and then Bayesian LDA was applied on the first 7 PCs: all the test samples were correctly classified except one. However, this approach does not provide any information about which variables are the most relevant from the classification point of view. To investigate this aspect, stepwise LDA [79] was also applied. Six variables were found to contribute the most to the classification: one fragment having m/z 55 from the e-nose data, four IR absorptions (1068, 1134, 1176, and 1538 cm⁻¹) from the optical tongue, and one spectral feature from the e-eye (391 nm); the reader is referred to the original paper for the association of the instrumental variables with chemical compounds. As anticipated, despite the excellent results obtained by the described procedure, a mid-level DF approach was also tested. This strategy requires that the main features are extracted from the individual data matrices.
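The low-level chain described above (concatenation, autoscaling, PCA compression to 7 PCs, LDA on the scores) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the blocks are synthetic and much smaller than the real ones, the platform-specific pretreatments are omitted, the helper names are hypothetical, and the LDA shown is the textbook linear discriminant with a pooled covariance matrix and priors equal to the class frequencies.

```python
import numpy as np

def fit_pca(Z, n_components):
    """Loadings (variables x components) of an already centered matrix."""
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return vt[:n_components].T

def fit_lda(T, y):
    """LDA with pooled covariance and priors equal to class frequencies."""
    classes = np.unique(y)
    means = np.array([T[y == c].mean(axis=0) for c in classes])
    pooled = sum((T[y == c] - m).T @ (T[y == c] - m) for c, m in zip(classes, means))
    pooled = pooled / (len(y) - len(classes))
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(pooled), priors

def predict_lda(model, T):
    classes, means, inv_cov, priors = model
    # Linear discriminant score of every sample for every class.
    quad = 0.5 * np.sum(means @ inv_cov * means, axis=1)
    scores = T @ inv_cov @ means.T - quad + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

rng = np.random.default_rng(3)

def make_blocks(y):
    """Three synthetic blocks; the first 5 columns of each shift with the class."""
    blocks = []
    for n_vars in (20, 30, 40):
        block = rng.normal(size=(len(y), n_vars))
        block[:, :5] += 4.0 * y[:, None]
        blocks.append(block)
    return blocks

y_train = np.repeat(np.arange(4), 12)   # 48 training samples, 4 factories
y_test = np.arange(19) % 4              # 19 test samples

X_train = np.hstack(make_blocks(y_train))   # low-level fusion: concatenation
X_test = np.hstack(make_blocks(y_test))

# Autoscaling parameters are estimated on the training set only.
mean, std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
Z_train, Z_test = (X_train - mean) / std, (X_test - mean) / std

loadings = fit_pca(Z_train, n_components=7)
model = fit_lda(Z_train @ loadings, y_train)
accuracy = np.mean(predict_lda(model, Z_test @ loadings) == y_test)
print(accuracy)
```

Note that the test samples are scaled and projected with parameters estimated on the training set only, which is what makes the validation external.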
For this reason, each data block was pretreated, then three separate PCA models (one for each data matrix) were calculated on the (preprocessed) training sets, and the principal components presenting the maximal discrimination capability (estimated by Fisher weights; seven in total) were concatenated, obtaining the super-scores matrix (TC). The final classification model was then built by applying Fisher-LDA on TC. The scores matrix for the validation samples was estimated by projecting the test set measurements onto the loading matrix extracted by PCA on the training observations. This approach led to good results (only two misclassified samples out of the 19 test objects), slightly worse than those achieved by low-level DF. Finally, Vera et al. compared the classification rates obtained by applying PCA-LDA on the individual data blocks with those achieved by the multiblock strategies, concluding that the results obtained by DF techniques were equal to or better than those obtained by the separate analysis of data from the different platforms.
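The mid-level procedure just described can be sketched as follows: one PCA per block on the training set, ranking of the PCs by a Fisher weight, concatenation of the retained scores into a super-scores matrix, and projection of the test samples onto the training loadings. The data are synthetic, the function names are hypothetical, and the Fisher weight is taken here as the ratio of between-class to within-class sum of squares, which is one common definition and not necessarily the one used in Ref. [80]; the split of the seven retained PCs among the blocks (3, 2, 2) is also an assumption.

```python
import numpy as np

def pca_loadings(X_centered, n_components):
    _, _, vt = np.linalg.svd(X_centered, full_matrices=False)
    return vt[:n_components].T

def fisher_weights(T, y):
    """Between-class over within-class sum of squares for each score column."""
    classes = np.unique(y)
    grand_mean = T.mean(axis=0)
    between = sum(np.sum(y == c) * (T[y == c].mean(axis=0) - grand_mean) ** 2
                  for c in classes)
    within = sum(np.sum((T[y == c] - T[y == c].mean(axis=0)) ** 2, axis=0)
                 for c in classes)
    return between / within

def fit_block(X_train, y, n_components, n_keep):
    """PCA on one training block; keep the n_keep most discriminating PCs."""
    mean = X_train.mean(axis=0)
    loadings = pca_loadings(X_train - mean, n_components)
    weights = fisher_weights((X_train - mean) @ loadings, y)
    keep = np.argsort(weights)[::-1][:n_keep]
    return mean, loadings[:, keep]

def super_scores(blocks, fitted):
    """Project each block onto its retained loadings and concatenate."""
    return np.hstack([(X - mean) @ P for X, (mean, P) in zip(blocks, fitted)])

rng = np.random.default_rng(4)

def make_block(y, n_vars):
    """Synthetic block whose first column shifts with the class label."""
    block = rng.normal(size=(len(y), n_vars))
    block[:, 0] += 3.0 * y
    return block

y_train = np.repeat(np.arange(4), 12)   # 48 training samples
y_test = np.arange(19) % 4              # 19 test samples
sizes, keeps = (20, 30, 40), (3, 2, 2)  # assumed split of the seven PCs

train_blocks = [make_block(y_train, n) for n in sizes]
test_blocks = [make_block(y_test, n) for n in sizes]

fitted = [fit_block(X, y_train, 5, k) for X, k in zip(train_blocks, keeps)]
TC_train = super_scores(train_blocks, fitted)   # super-scores for training
TC_test = super_scores(test_blocks, fitted)     # test set projected, not refitted
print(TC_train.shape, TC_test.shape)
```

A classifier such as LDA can then be trained on `TC_train` and applied to `TC_test`.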
4.4.3 Authentication of a Signature Beer From a Craft Brewery
In recent years, the “beer renaissance” has put a spotlight on artisanal beers, and numerous craft breweries, devoted to the creation of excellent beverages that are a free expression of specific territories, have arisen even in countries where beer culture was not so widespread. As a consequence of this awareness, and with the increasing number of discerning beer drinkers, the authentication of high-added-value craft beers has become relevant, to ensure the quality of such beverages and to protect both producers and consumers from possible frauds. For this reason, Biancolillo et al. investigated the possibility of characterizing a signature craft beer produced in Central Italy by the Birra del Borgo brewery (Borgorose, in the Lazio area) [82]. This product, called “ReAle,” is the flagship beer of the brewery and presents specific characteristics linked to the raw materials used for its brewing, which are strictly linked to the area where the production takes place. To authenticate the ReAle beer, 60 beers were analyzed by a multiplatform approach. Of these, 19 were ReAle beers and 41 were other beers brewed at Birra del Borgo or in other breweries (both in Italy and in other countries). Samples were analyzed by different analytical techniques: NIR, MIR, UV, and Vis spectroscopies and thermogravimetry; then the data blocks were classified individually, by PLS-DA [83,84] or by SIMCA [85,86], and by low- and mid-level DF approaches. The classification problem is a two-class one: one category is dedicated to the beer of interest (called “class Reale”), whereas the other contains all the other beers that are not Reale (and is simply denominated “class Others”).
Before the creation of any model, to achieve external validation, signals were divided into a training set of 40 samples (13 belonging to class Reale and 27 to class Others) and a test set containing 20 objects (6 belonging to class Reale and 14 to class Others). Different preprocessing approaches were tested on the outcomes of the different platforms, and the optimal ones were selected by a cross-validation procedure (on the training samples). Both PLS-DA and SIMCA analysis on the individual blocks provided satisfactory results, misclassifying few test samples. To apply the low-level DF strategy, the preprocessed data blocks were concatenated into a single training matrix and a single test matrix of dimensions 40 × 6185 and 20 × 6185, respectively. This approach provided extremely satisfying results: all the samples belonging to class Reale were correctly classified, whereas the classification rate obtained for class Others was 92.8%. Despite the excellent outcome, the mid-level DF approach was also tested. In this case, the features extracted from the distinct PLS-DA models built on the individual sets of measurements were concatenated, obtaining a feature matrix of 20 variables. Finally, PLS-DA was built on the mean-centered super-scores matrix, and all the test samples were correctly classified, achieving 100% correct classification on both class Reale and class Others. In conclusion, also in this case the multiblock strategy provided better results, in terms of predictions, than the separate analysis of the different blocks. The same data set presented and discussed in Ref. [82] was also used by Brownfield et al., who applied a high-level DF strategy to solve the above-mentioned problem (class Reale vs. class Others) [16]. The strategy discussed in Ref. [16], consensus classification, is a multiblock method based on the idea that distinct classification models can be calculated on individual data blocks by diverse classifiers, and then, sample by sample, predictions are merged and each object is assigned to the class to which it has been assigned most often. In particular, Brownfield and collaborators wanted to test the performance of this high-level DF approach and to investigate the possibility of performing consensus classification without optimizing any model parameter. The authors selected 17 different classifiers, of which 6 (Mahalanobis distance, Q-residual, sin θ, divergence criterion, PLS2-DA, and kNN) require an optimization step. Instead of defining an optimal solution, the authors calculated different models applying diverse parameter values and then joined all the information into a single solution. In the end, they combined the results from 564 models and achieved a correct classification rate higher than 97%.
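The voting idea behind consensus classification (each sample goes to the class predicted most often across models) can be sketched in a few lines. This is a plain majority vote on hypothetical predictions from three models; the function name is an assumption, and the fusion rule actually used in Ref. [16] to combine the 564 models may be more elaborate.

```python
from collections import Counter

def consensus_classify(predictions):
    """Majority vote over models.

    predictions: list with one entry per model, each a list of class labels
    (one per sample). Returns the most frequent label for each sample; in
    case of a tie, the label met first among the tied ones is returned.
    """
    n_samples = len(predictions[0])
    fused = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        fused.append(votes.most_common(1)[0][0])
    return fused

# Hypothetical predictions of three models for four samples.
predictions = [
    ["Reale", "Others", "Others", "Reale"],
    ["Reale", "Reale", "Others", "Others"],
    ["Others", "Others", "Others", "Reale"],
]
fused = consensus_classify(predictions)
print(fused)
```

With an odd number of models and two classes, ties cannot occur; with many models and parameter settings, as in the study above, the vote averages out the effect of any single poorly tuned model.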
4.5 Dairy Products
Cow milk is one of the most consumed nonalcoholic beverages in the world. Given its nutritional characteristics, it is particularly suitable for children and for the elderly; nevertheless, it can generally be consumed by anyone who does not present any disease that would prevent its consumption. Additionally, milk is not only consumed as it is but is also widely used as a raw material for the production of dairy products. Consequently, it is not surprising that several multiblock approaches have been applied to this matrix, to develop analytical methodologies that allow the authentication and the quality check of this commodity. The analytical platforms used to investigate this beverage are various, as are the chemometric approaches applied for their modeling. For instance, Di Natale et al. used metalloporphyrin sensors (electronic tongue for liquid samples and electronic nose on the headspace) coupled with DF PCA [87–89] to investigate whether it is possible to distinguish UHT from pasteurized milk, fresh or spoiled [89a]. PCA was run on the individual data blocks, following a low-level DF strategy, or combining principal components from the individual models. The best solution, showing a clear distinction among samples according to the process they underwent and to their freshness, was obtained by combining the first PC from the electronic nose and the second one from the PCA calculated on the e-tongue signals. Another application of sensors to milk is reported in the work of Henningsson and collaborators [90], in which a conductivity meter and a density meter, together with an optical instrument, were combined to quantify fat and milk content in water/milk mixtures by weighted nonlinear least squares [91,92], aiming at the online tracking of dairy processes. The methodology they developed performed well and proved suitable for supervising the process. Obviously, when discussing the application of DF to dairy products, the interest is not only in milk but also in its derivatives; in this framework (i.e., multiblock analysis), one of the most investigated is cheese. In this regard, different aspects can be inspected; for instance, in Ref. [93], Loudiyi and collaborators investigated, by multiblock explorative analysis, how ripening time and the salting procedure affect the final product. In their study, they considered 20 samples of Cantal-type cheeses, having 5 different compositions and/or concentrations of salts and ripened for 5 or 15 days. Samples were analyzed by fluorescence and MIR-ATR spectroscopy, Vis-NIR multispectral imaging, rheology, and texture analysis, and, additionally, some physicochemical characteristics (e.g., fat, lactic acid, total nitrogen, and metal contents, pH) were estimated by traditional laboratory methods (e.g., Kjeldahl, spectrophotometry, potentiometric titration) [94]. First, the multiplatform data tables were individually analyzed by independent component analysis (ICA) [95], and a reduced subset of 31 source signals was obtained.
Finally, ComDim [96] analysis was pursued to visualize similarities/differences among samples. Loudiyi and coauthors concluded that different ripening times, salt concentrations, and heating affect the physical characteristics of cheeses; additionally, they observed that halving the NaCl concentration, or replacing 25% of its amount by KCl, does not appreciably affect the production. Nevertheless, the authors made clear that these considerations are based on physicochemical measurements and would need to be confirmed by sensory analysis performed by trained panelists. In this framework (i.e., sensory analysis of cheese), an interesting work is the one proposed by Feron et al., who set up an in vivo study to investigate how the “action of eating” cheese influences the aromas perceived [97]. In particular, they asked eight subjects to consume different cheeses (flavored by two aroma compounds: ethyl propanoate and nonan-2-one) following the procedure described in Ref. [97]. Feron and collaborators evaluated some specific characteristics of each participant (e.g., chewing activity, oral capacity); moreover, they analyzed the bolus (saliva content and rheological properties), the saliva, and the residue layer on the mucosa, and they sampled and analyzed the air expelled from the nose of each subject (at different points of the experiment) by atmospheric pressure chemical ionization-mass spectrometry. The complex data matrix obtained during the study was then analyzed by MB-PLS to highlight which variables most influence aroma release. Cheeses are widely investigated matrices also because some of them are high-added-value products, labelled by quality marks such as PDO or PGI. For this reason, it is relevant to authenticate and trace them, to ensure label compliance and to avoid frauds. For instance, Cozzi and coauthors [98] investigated different samples of “Asiago d’Allevo” cheese (produced in Italy) to test whether it would be possible to distinguish loaves produced on farms from those prepared in factories and, additionally, whether they could be divided according to the altitude of the production site. Samples were analyzed by NIR spectroscopy and colorimetry and by the estimation of some chemical parameters (e.g., total protein, NaCl and fatty acid content, moisture). The collected signals were analyzed by PLS-DA [83,84]; then, the first three scores were row augmented with the additional information in a final multiblock data set. Discriminant analysis was carried out [98], achieving good results in discriminating Asiago samples produced on farms from those made in factories, but it was not possible to obtain satisfactory predictions as far as altitude (lowland or mountain) is concerned.
4.6 Tea
Among nonalcoholic beverages, tea is probably one of the most consumed all over the world. Consequently, there are many papers in the literature discussing how to assess its quality and/or characteristics. Owing to this widespread interest, several multiblock strategies have been proposed, in particular to distinguish different types of tea or to investigate its quality. From the analytical point of view, sensors (e-nose and e-tongue) are the techniques most often used for the analysis of this commodity, whereas, as far as classifiers or DF approaches are concerned, several methodologies have been proposed, achieving (in general) satisfactory results. As mentioned, e-nose and e-tongue have been applied in several studies; for instance, in Ref. [99], black tea samples were analyzed by means of these techniques and then classified according to their quality. In addition to performing individual block analysis, the multiplatform data set was investigated by a low-level approach: data were merged in a single matrix (after feature reduction achieved by wavelet transform [100]) and then classified by back-propagation multilayer perceptron [101,102], radial basis function network [103], and probabilistic neural network [104]. Banerjee (Roy) and collaborators concluded that the multiblock approach provided very satisfying results, better (from the prediction point of view) than individual block analysis. The same author published another paper [105] solving a similar classification problem, in which the method used to discriminate among the different teas is a Bayesian one proposed by Fahim in Ref. [106]. Another work, similar from the analytical and the classification point of view, was proposed by Wang and collaborators [107]. E-nose analysis was also applied by Xu et al. [108], who used this analytical technique and a computer vision system to check the quality of teas. Classification was achieved by mid-level DF, applying K-nearest neighbors (KNN), support vector machine (SVM), or multinomial logistic regression (MLR) on PCs extracted from the multiplatform data set, or by a high-level DF approach calculated on the previously mentioned KNN and SVM models. The classification rates achieved on an external validation set were quite high (>90%). Also, Chen and coauthors used sensors (gustatory and olfactory) to analyze different oolong teas, aiming at discriminating among four different varieties [109]; classification was achieved by DF: 27 olfaction variables and 20 from the gustatory devices were row augmented, and then PCA and LDA were applied to classify. Results were promising; (leave-one-out) cross-validated classification rates were 100% for all the categories. A similar classification problem was faced by Lin et al. [110], who classified oolong teas according to their varieties by a DF approach. In this case, however, the analytical platforms used were headspace solid phase microextraction and gas chromatography-mass spectrometry (GC-MS), whose data were merged and then classified by PCA and LDA.
Also, in this case, the method performed well from the prediction point of view. Hyperspectral imaging has also been quite widely used to investigate teas; for instance, Li and coauthors used this technique, together with olfactory visualization systems, to classify tea samples according to six quality categories evaluated by panelists. Data analysis was performed by PCA-LDA, PCA-KNN, and SVM; the latter DF approach provided the best results (92% of correct classification) on an external test set [111]. Hyperspectral imaging has also been used in the work proposed by Ning et al., where visible and NIR hyperspectral imaging were used to analyze 206 samples of oolong teas. Once data were collected, features (spectral and textural) were reduced by PCA, PCs were merged, and samples were classified according to their fermentation degree by LDA, library support vector machine (Lib-SVM), and extreme learning
machine. The best predictions (on an external test set) were provided by the approach involving Lib-SVM [112]. Although the analyses of teas reported in the literature are mainly pursued by sensor analysis or hyperspectral imaging, spectroscopy has also been used in this regard. For instance, in the work published by Dankowska and Kowalewski [113], synchronous fluorescence, NIR, and UV-Vis were used to analyze 36 tea samples of six types from five different geographical origins. The main features were extracted from each data block by PCA and row-wise concatenated, and then teas were classified according to their type by LDA, quadratic discriminant analysis (QDA), regularized discriminant analysis [114], and SVM. This strategy also provided very low classification errors, in particular the approach based on QDA.
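The mid-level scheme recurring in these tea studies (features extracted block by block with PCA, then concatenated sample-wise before classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited work: the data are random placeholders, the block sizes and numbers of components are arbitrary, and the function names `pca_scores` and `mid_level_fusion` are hypothetical.

```python
import numpy as np

def pca_scores(block, n_components):
    """Return PCA scores of a mean-centered data block, computed via SVD."""
    centered = block - block.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def mid_level_fusion(blocks, n_components):
    """Concatenate per-block PCA scores side by side (samples stay in rows)."""
    return np.hstack([pca_scores(b, k) for b, k in zip(blocks, n_components)])

rng = np.random.default_rng(0)
# Placeholder data (assumption): 36 samples measured on three platforms.
fluorescence = rng.normal(size=(36, 200))
nir = rng.normal(size=(36, 500))
uv_vis = rng.normal(size=(36, 150))

fused = mid_level_fusion([fluorescence, nir, uv_vis], n_components=[3, 4, 2])
print(fused.shape)  # (36, 9)
```

The fused score matrix would then be passed to whichever classifier is chosen (LDA, QDA, SVM, and so on); in practice the number of components per block is usually tuned by cross-validation.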
4.7 Other Food Products
The applications described in the previous paragraphs represent only a selection of the works presented in the literature on some of the most widely investigated food matrices. The applications of DF strategies in food analysis are, however, numerous and involve several other foodstuffs.
4.7.1 Beverages
As far as beverages are concerned, fruit juices have been widely inspected by multiblock approaches; the main platforms used for their investigation are the e-nose and the e-tongue, combined by low-level DF approaches, as is done, for instance, to distinguish juices from different fruits in Refs. [115,116] or to inspect adulterated cherry tomato juices in Ref. [117]. Different platforms have been used in Ref. [118], where GC-MS and liquid chromatography-mass spectrometry data collected on orange juices produced in different factories are row-wise concatenated and samples are then classified according to their brands by PLS-DA.
4.7.2 Fruits
Among foods, fruit is one of the most commonly investigated subjects in a DF context. Often, the investigation aims at attesting its quality by quantifying specific characteristics. For instance, several multiblock approaches have been developed to test the firmness and the soluble solids content (SSC) of apples, as is done in Ref. [119] by Mendoza et al., where four sensors are used to analyze diverse cultivars of apples and PLS is then applied in a DF approach to quantify firmness. Additionally, in the same work, they test the influence of the inclusion/exclusion of each data block in the multiblock models. The same
approach (row augmentation of sensor data and PLS) was also used by Zude [120] to predict the compactness of apples; in this case, however, the analytical techniques used were an acoustic sensor and a Vis-NIR device. Another relevant parameter for apple quality assessment is the sugar content. A multiblock methodology aimed at its quantification has been proposed by Steinmetz and collaborators; samples were analyzed by sensors (a vision system and an NIR sensor), and then PLS was applied to the merged features to estimate the sugar content [121]. Sensors have also been used to trace apples or to classify them according to their varieties, as is done, for example, in Ref. [122] by Wu et al. or in Ref. [123] by Rudnitskaya and collaborators. In the former work, Wu and coauthors investigated by e-nose and e-tongue the juices obtained from 126 apples harvested in different Chinese areas; they then applied LDA, SVM, and PLS-DA to the merged data to predict the geographical and the botanical origin. Regardless of the classification problem, the proposed approaches proved highly suitable for their aim. Similarly, Rudnitskaya et al. measured juices extracted from 100 apples by e-tongue and by FTIR-ATR; both data blocks were analyzed by PCA, and then discriminant analysis was applied to the merged PCs to discern different varieties [123]. Sensors have been applied to estimate quality characteristics (e.g., firmness and SSC) not only of apples but also of other fruits. For instance, these attributes have been investigated on peaches by Di Natale et al. [124] by e-nose coupled with visible spectroscopy. In this case, the fusion between the two data blocks was achieved by outer product, leading to a three-way data structure; as a consequence, before the application of PLS, the authors needed to unfold the cube into a matrix.
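The outer-product fusion and the subsequent unfolding step can be illustrated with a minimal sketch. The array sizes, the random data, and the function names below are assumptions made for illustration only; they are not details taken from Ref. [124].

```python
import numpy as np

def outer_product_fusion(block_a, block_b):
    """Per-sample outer product of two blocks: samples x Ja x Jb cube."""
    return np.einsum("ij,ik->ijk", block_a, block_b)

def unfold(cube):
    """Unfold the cube sample-wise into a samples x (Ja * Jb) matrix."""
    return cube.reshape(cube.shape[0], -1)

rng = np.random.default_rng(1)
e_nose = rng.normal(size=(20, 8))   # placeholder: 20 samples, 8 sensors
vis = rng.normal(size=(20, 50))     # placeholder: 20 samples, 50 wavelengths

cube = outer_product_fusion(e_nose, vis)
unfolded = unfold(cube)
print(cube.shape, unfolded.shape)  # (20, 8, 50) (20, 400)
```

Each column of the unfolded matrix is the product of one variable from the first block and one from the second, so a two-way regression method such as PLS can be applied to it directly.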
Similar approaches (estimation of firmness by nondestructive analytical platforms) have also been proposed by Vursavus [125] and Ruiz-Altisent [126], who applied sensors to predict quality parameters on peaches, and by Ozer and collaborators on melons [127]. Vegetables have also been investigated by multiblock methods, although less frequently than fruit; similar approaches are applied. Different qualities of this kind of foodstuff are often investigated by PLS-DA, as is done, for instance, on tomatoes following a low-level approach to combine proton NMR and MIR spectroscopy in Ref. [128], or applying a mid-level DF on NMR, IR, and IRMS data (collected on tomatoes), using common components and specific weights analysis (CCSWA) [96,129,130] to extract the main features. PLS-DA has also been used (in comparison with orthogonal projection analysis, Mahalanobis distance, and KNN) to discern Fava Santorinis from similar peas by applying a low-level DF strategy to rare and trace metal concentrations [131]. On the other hand, a mid-level DF
approach combined with PLS-DA as classifier has been used to trace the Leccinum rugosiceps mushroom in Ref. [132] on the basis of spectral measurements (UV and IR spectroscopies). Different chemometric tools have been applied to vegetables; for example, a low-level approach based on instrumental variable selection and LDA, SVM, and classification trees has been applied to hyperspectral, color, 3D, and X-ray imaging data collected on onions to recognize infected bulbs [133]; a further tool, ComDim, has been applied to a low-level data set on tomatoes in Ref. [128]. The latter applications on vegetables are DF strategies combined with classifiers; nevertheless, regression approaches are also applied in the same context, as is done, for instance, in Ref. [134], where PLS and support vector regression are used to predict the hardness of tomatoes by combining computer vision and electronic nose.
4.7.3 Seeds and Their Derivatives
Coffee is, together with tea, another nonalcoholic beverage traditionally consumed in several countries around the world. To ensure its quality, it is important, among other things, to check the quality of the raw materials it is brewed from. In the literature, it is possible to find several papers discussing methodologies aimed at investigating green, roasted, or ground beans; depending on the analytical platforms applied, the investigation focuses on seeds as they are, on extracts, or on brewed coffee. DF approaches have also been used in this context, mainly to characterize, authenticate, and trace coffee beans or to estimate their chemical composition. For instance, an analytical multiblock methodology aimed at quantifying specific compounds in ground coffee extracts was proposed by Assis et al. [135]. In this work, green Arabica and Robusta coffee beans were roasted, ground, and mixed in different blends, and the extracts were finally analyzed by ATR-FTIR and paper spray mass spectrometry.
Diverse chemical compounds were then estimated by PLS on the individual data blocks or by low- and mid-level DF approaches (on the extracted features) after a variable selection step carried out by genetic algorithms [136] and ordered predictors selection (OPS) [137]. In the end, the low-level DF approach based on original data reduced by OPS provided the best results. Concerning the characterization of this product, some published papers focus on distinguishing between the two main varieties, Arabica and Robusta. To achieve this goal, different analytical platforms have been used; for example, Calvini and collaborators [138] used hyperspectral sensors combined in low- and mid-level DF approaches. In particular, average spectra, single space hyperspectrograms, and common space hyperspectrograms were individually analyzed by PCA and PLS-DA, and
then, original variables or extracted features were merged and classified by PLS-DA, obtaining a low- and a mid-level model, respectively. The same classification problem (distinction between Arabica and Robusta) has also been faced by Downey et al. [139] applying a low-level DF approach. In their work, samples of lyophilized ground coffee were analyzed by NIR and MIR spectroscopy, and the spectra were normalized, row-wise concatenated, and then classified by Factorial Discriminant Analysis (FDA) [140] and PLS. Both the methodologies proposed in Refs. [138,139] provided low classification errors. An even more specific study based on multiplatform analysis of coffee varieties has been proposed by Dong and coauthors [141], who developed a sensor-based (e-nose and e-tongue) approach aimed at distinguishing seven different varieties of Robusta samples. To achieve their goal, Dong et al. applied KNN, PLS-DA, and back-propagation artificial neural network (BP-ANN) [142] to fused sensor data, obtaining quite high classification rates with all the classifiers. Because the chemical composition (and therefore the quality of brewed coffee) depends not only on the cultivar but also on the origin of the beans, this aspect has also been investigated. In particular, Yener and collaborators [143] applied a multiblock DF approach based on headspace proton transfer reaction switchable reagent ion system-mass spectrometry (PTR/SRI-MS) aimed at classifying coffee brewed using beans from six different geographical origins. The fusion was carried out as suggested by Hall and Llinas [144], and then classification was achieved by penalized discriminant analysis [145], random forest [146], PLS-DA, and SVM. Finally, as with all high-added-value products, it is important to detect possible frauds on coffee samples. One methodology to reveal adulteration in coffee blends is the one proposed by Reis and coauthors [147].
In this work, adulterated samples of coffee were analyzed by ATR-FTIR and FTIR in diffuse reflectance, and signals were then classified by PLS-DA following a low- and a mid-level DF strategy; the lowest prediction errors were achieved by the latter approach.
4.7.4 Meat and Fish
Meat and fish are additional classes of foodstuff consumed all over the world; they are delicate products that can easily deteriorate, and consequently it is important to develop methodologies that allow their freshness and quality to be tested. For this reason, one of the main determinations conducted on this foodstuff is the total viable count (TVC) of bacteria. Owing to the market value of this commodity, the ideal solution would be to pursue this evaluation by a nondestructive approach; to this end, Li and collaborators developed a multiblock methodology exploiting hyperspectral imaging and colorimetric sensors
for the estimation of TVC [148]. To handle the multiplatform data set, the authors extracted features by PCA and then tested PLS, stepwise multiple linear regression, BP-ANN, and SVM; the latter regressor provided the most satisfactory results. In this regard, one of the main analyses pursued on meat is the quantification of K or of total volatile basic nitrogen (TVB-N); in the literature there are diverse applications in which these parameters are evaluated by hyperspectral imaging. For instance, in the work by Cheng and collaborators [149], textural and spectral features collected on pork meat are selected by the successive projections algorithm (SPA) [150] and merged, and predictions are then obtained by PLS; a similar approach has been proposed for chicken by Suxia et al. [151], where the fit was achieved by K-means radial basis function [152] neural networks. Hyperspectral imaging has also been used to investigate TVB-N in pork [153]: Guo and collaborators selected textural and spectral features by PLS, extracted the main features by 2DPCA [154], and finally applied SVM to predict TVB-N. Other approaches, oriented more toward a general overview of meat quality based on hyperspectral imaging (Vis/NIR spectroscopy and microscopy), have been published by Peiyuan and collaborators, who assessed sausage quality by a low-level DF approach based on SVM [155]; a similar work [156] on bacon (presented by the same author) was conducted using the same analytical technique, but samples were classified according to their quality by a low-level DF approach exploiting a self-organizing map neural network [157] as classifier. As TVB-N is an important quality parameter for meat, it has also been evaluated by other techniques; for instance, Khulal et al. analyzed chicken meat by diverse optical and odor sensors and then estimated TVB-N by both low- and mid-level DF strategies.
In particular, for the latter one, which provided the best results, sensory variables were row-wise concatenated and then back-propagation ANN was applied to predict TVB-N [158]. Another DF methodology with the same aim, but based on NIR spectroscopy, computer vision system, and e-nose, has been proposed by Huang et al. [159], who predicted TVB-N content by applying BP-ANN to the PCs extracted from the data blocks. Meat is also subject to frauds; in particular, water or moisturizing substances could be added to it to enhance its characteristics and make it appear more appealing to the consumer. For this reason, analytical techniques aimed at the detection of adulterated meats have been developed. In this regard, a DF methodology for identifying samples injected with adulterants, based on ATR-FTIR and chemical parameters, has been proposed by Nunes et al. [160]. Data blocks were analyzed individually, and then VIP indices were used to select a subset of original variables that were row-wise concatenated and then classified by PLS-DA, to discern adulterated samples from controls.
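A possible reading of the VIP-based selection followed by concatenation is sketched below. It is an illustration under explicit assumptions rather than the procedure of Ref. [160]: the data are simulated, a single-response PLS model is fitted per block with a basic NIPALS loop, the cutoff VIP > 1 is the commonly used rule of thumb, and the function name `pls1_vip` is hypothetical.

```python
import numpy as np

def pls1_vip(X, y, n_components):
    """VIP scores from a single-response PLS model fitted with NIPALS."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))
    explained = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-norm weight vector
        t = X @ w                       # scores
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        q = (y @ t) / tt                # y loading
        X = X - np.outer(t, p)          # deflate X
        y = y - q * t                   # deflate y
        W[:, a] = w
        explained[a] = q**2 * tt        # y sum of squares explained
    return np.sqrt(n_vars * (W**2 @ explained) / explained.sum())

rng = np.random.default_rng(2)
labels = np.repeat([0.0, 1.0], 20)      # assumed coding: 0 control, 1 adulterated
spectra = rng.normal(size=(40, 30))     # placeholder spectral block
spectra[:, 5] += 4.0 * labels           # one informative spectral variable
chemical = rng.normal(size=(40, 6))     # placeholder chemical block
chemical[:, 0] += 4.0 * labels          # one informative chemical parameter

selected = []
for block in (spectra, chemical):
    vip = pls1_vip(block, labels, n_components=2)
    selected.append(block[:, vip > 1.0])  # keep variables with VIP above 1
fused = np.hstack(selected)               # row-wise concatenation of the subsets
```

Because the squared VIP values average to one by construction, the VIP > 1 rule retains the variables that contribute more than average to the model; the reduced, fused matrix would then be submitted to PLS-DA.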
Another quite common fraud in this context is the adulteration of minced meat by addition of another type of meat with a lower market value. For instance, Alamprese and collaborators investigated by NIR, MIR, and UV/Vis spectroscopy the possibility of detecting bovine minced meat adulterated with turkey meat [161]. By applying a DF approach, they managed to distinguish unadulterated from adulterated samples (by PCA-LDA) and also to quantify the amount of adulterant (by PLS). As far as fish is concerned, the main multiblock methodologies developed to assess its quality aim at investigating whether it has been frozen or not. To solve this problem, Ottavian et al. analyzed 222 samples of West African goatfish by Vis/NIR spectroscopy, RGB imaging, and a texture analyzer. Data blocks were analyzed individually or by a low-level DF approach using PLS-DA as classifier. The multiblock approach provided the best results (in terms of correct classification) on the external validation set [162]. Another application, aimed at investigating the quality of fish meat samples, has been proposed by Korel et al. [163], who analyzed raw tilapia samples by machine vision and e-nose combined in a DF approach based on discriminant function analysis (DFA) [164] to classify samples according to their quality.
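Low-level fusion of blocks with very different numbers of variables, as in several of the meat and fish studies above, is often preceded by block scaling so that the widest block does not dominate the model. The sketch below assumes autoscaling followed by division by the square root of the number of variables in each block; this is one common choice and not necessarily the one adopted in the cited works. The data are random placeholders and the function names are hypothetical.

```python
import numpy as np

def autoscale(block):
    """Mean-center each variable and scale it to unit standard deviation."""
    return (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)

def low_level_fusion(blocks):
    """Autoscale each block, weight it by 1/sqrt(n_variables), concatenate."""
    scaled = [autoscale(b) / np.sqrt(b.shape[1]) for b in blocks]
    return np.hstack(scaled)

rng = np.random.default_rng(3)
# Placeholder blocks (assumption): 222 samples on three platforms.
vis_nir = rng.normal(size=(222, 300))
rgb = rng.normal(size=(222, 12))
texture = rng.normal(size=(222, 4))

fused = low_level_fusion([vis_nir, rgb, texture])
print(fused.shape)  # (222, 316)
```

With this weighting every block carries the same total sum of squares (the number of samples minus one), so each platform enters the fused matrix with equal overall variance regardless of its width.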
5. CONCLUSIONS
With the growing availability of high-throughput methodologies for food characterization and analysis, more and more data are being collected on food products that can be used for the authentication of their quality. In this context, the availability of different multiblock strategies, each with its own peculiarities and providing specific details on the investigated samples, allows integration of the information from the different sources into a richer model with great flexibility. Indeed, all the examples reported, providing a broad overview on quality-related issues, show the potential of the use of DF approaches for food quality assessment.
References
[1] M. Lees, Food Authenticity and Traceability, Woodhead Publishing, Sawston, UK, 2003.
[2] R. Abbot, Food and nutrition information: a study of sources, uses, and understanding, Br. Food J. 99 (1997) 43–49.
[3] I.N. Edith, E.M. Ochubiojo, Food quality control: history, present and future, in: B. Valdez (Ed.), Scientific, Health and Social Aspects of the Food Industry, IntechOpen Ltd., London, UK, 2012, pp. 421–438.
[4] D.-W. Sun, Modern Techniques for Food Authentication, Elsevier, Amsterdam, The Netherlands, 2008.
[5] S. Esslinger, J. Riedl, C. Fauhl-Hassek, Potential and limitations of non-targeted fingerprinting for authentication of food in official control, Food Res. Int. 60 (2014) 189–204.
[6] E. Borràs, J. Ferré, R. Boqué, M. Mestres, L. Aceña, O. Busto, Data fusion methodologies for food and beverage authentication and quality assessment – a review, Anal. Chim. Acta 891 (2015) 1–14.
[7] M. Bevilacqua, R. Bro, F. Marini, Å. Rinnan, M. Arendt Rasmussen, T. Skov, Recent chemometrics advances for foodomics, Trends Anal. Chem. 96 (2017) 42–51.
[8] I.E. Frank, J. Feikema, N. Constantine, B.R. Kowalski, J. Chem. Inf. Comput. Sci. 24 (1984) 20–24.
[9] I.E. Frank, B.R. Kowalski, Prediction of wine quality and geographic origin from chemical measurements by partial least-squares regression modeling, Anal. Chim. Acta 162 (1984) 241–251.
[10] L.E. Wangen, B.R. Kowalski, A multiblock partial least squares algorithm for investigating complex chemical systems, J. Chemometr. 3 (1988) 3–20.
[11] J.A. Westerhuis, A.K. Smilde, Deflation in multiblock PLS, J. Chemometr. 15 (2001) 485–493.
[12] S.J. Qin, S. Valle, M.J. Piovoso, On unifying multiblock analysis with application to decentralized process monitoring, J. Chemometr. 15 (2001) 715–742.
[13] A. Biancolillo, I. Måge, T. Næs, Combining SO-PLS and linear discriminant analysis for multi-block classification, Chemometr. Intell. Lab. Syst. 141 (2015) 58–67.
[14] T. Næs, O. Tomic, B.H. Mevik, H. Martens, Path modelling by sequential PLS regression, J. Chemometr. 25 (2011) 28–40.
[15] F. Castanedo, A review of data fusion techniques, Sci. World J. 2013 (2013), https://doi.org/10.1155/2013/704504.
[16] B. Brownfield, T. Lemos, J.H. Kalivas, Consensus classification using non-optimized classifiers, Anal. Chem. 90 (2018) 4429–4437.
[17] A. Biancolillo, F. Marini, J.M. Roger, SO-COVSEL: a novel method for variable selection in a multi-block framework, J. Chemometr. (in press), https://doi.org/10.1002/cem.3120.
[18] I. Måge, E. Menichelli, T. Næs, Preference mapping by PO-PLS: separating common and unique information in several data blocks, Food Qual. Prefer. 24 (2012) 8–16.
[19] I. Måge, B.H. Mevik, T. Næs, Regression models with process variables and parallel blocks of raw material measurements, J. Chemometr. 22 (2008) 443–456.
[20] J. Liu, V. Calhoun, Parallel independent component analysis for multimodal analysis: application to fMRI and EEG data, in: Proceedings of the 4th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Arlington, VA, 2007, pp. 1028–1031.
[21] T. Löfstedt, J. Trygg, OnPLS – a novel multiblock method for the modelling of predictive and orthogonal variation, J. Chemometr. 25 (2011) 441–455.
[22] A. Biancolillo, T. Næs, R. Bro, I. Måge, Extension of SO-PLS to multi-way arrays: SO-N-PLS, Chemometr. Intell. Lab. Syst. 164 (2017) 113–126.
[23] S. Bougeard, E.M. Qannari, C. Lupo, M. Hanafi, From multiblock partial least squares to multiblock redundancy analysis. A continuum approach, Informatica 22 (2011) 1–16.
[24] S. Bougeard, M. Qannari, N. Rose, Multiblock redundancy analysis: interpretation tools and application in epidemiology, J. Chemometr. 25 (2011) 467–475.
[25] El Ghaziri, V. Cariou, D.R. Rutledge, E.M. Qannari, Analysis of multiblock datasets using ComDim: overview and extension to the analysis of (K + 1) datasets, J. Chemometr. 30 (2016) 420–429.
[26] S. Wold, N. Kettaneh, K. Tjessem, Hierarchical multiblock PLS and PC models for easier model interpretation and as an alternative to variable selection, J. Chemometr. 10 (1996) 463–482.
[27] G. Mazerolles, M. Hanafi, E. Dufour, D. Bertrand, E.M. Qannari, Common components and specific weights analysis: a chemometric method for dealing with complexity of food products, Chemometr. Intell. Lab. Syst. 81 (2006) 41–49.
[28] S. Wold, S. Hellberg, T. Lundstedt, M. Sjöström, H. Wold (Eds.), Proc. Symp. on PLS Model Building: Theory and Application, Frankfurt am Main, 1987; also Tech. Rep., Department of Organic Chemistry, Umeå University, 1987.
[29] M. Schouteden, K. Van Deun, S. Pattyn, I. Van Mechelen, SCA with rotation to distinguish common and distinctive information in linked data, Behav. Res. Methods 45 (2013) 822–833.
[30] E.F. Lock, K.A. Hoadley, J.S. Marron, A.B. Nobel, Joint and individual variation explained (JIVE) for integrated analysis of multiple data types, Ann. Appl. Stat. 7 (2013) 523–542.
[31] M. Hanafi, A. Kohler, M. Qannari, Shedding new light on hierarchical principal component analysis, J. Chemometr. 24 (2010) 703–709.
[32] R.J. Barnes, M.S. Dhanoa, S.J. Lister, Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra, Appl. Spectrosc. 43 (1989) 772–777.
[33] L. Elsner, Block scaling with optimal Euclidean condition, Linear Algebra Appl. 58 (1984) 69–73.
[34] A. Biancolillo, K.H. Liland, I. Måge, T. Næs, R. Bro, Variable selection in multi-block regression, Chemometr. Intell. Lab. Syst. 156 (2016) 89–101.
[35] B. Galindo-Prieto, J. Trygg, P. Geladi, A new approach for variable influence on projection (VIP) in O2PLS models, Chemometr. Intell. Lab. Syst. 160 (2017) 110–124.
[36] J.M. Roger, B. Palagos, D. Bertrand, E. Fernandez-Ahumada, CovSel: variable selection for highly multivariate and multi-response calibration. Application to IR spectroscopy, Chemometr. Intell. Lab. Syst. 106 (2011) 216–223.
[37] T. Næs, O. Tomic, N.K. Afseth, V. Segtnan, I. Måge, Multi-block regression based on combinations of orthogonalisation, PLS-regression and canonical correlation analysis, Chemometr. Intell. Lab. Syst. 124 (2013) 32–42.
[38] M. Cocchi, Chemometrics for food quality control and authentication, in: R.A. Meyers (Ed.), Encyclopedia of Analytical Chemistry, John Wiley & Sons Ltd, Hoboken, NJ, 2017, pp. 1–29.
[39] N. Dupuy, O. Galtier, Y. Le Dréau, C. Pinatel, J. Kister, J. Artaud, Chemometric analysis of combined NIR and MIR spectra to characterize French olives, Eur. J. Lipid Sci. Technol. 112 (2010) 463–475.
[40] R. Korifi, Y. Le Dréau, J. Molinet, J. Artaud, N. Dupuy, Composition and authentication of virgin olive oil from French PDO regions by chemometric treatment of Raman spectra, J. Raman Spectrosc. 42 (2011) 1540–1547.
[41] M. Bevilacqua, R. Bucci, A.D. Magrì, A.L. Magrì, F. Marini, Tracing the origin of extra virgin olive oils by infrared spectroscopy and chemometrics: a case study, Anal. Chim. Acta 717 (2012) 39–51.
[42] F. Bachion de Santana, L. Caixeta Gontijo, H. Mitsutake, S. Júnior Mazivila, L. de Souza, W. Borges Neto, Non-destructive fraud detection in rosehip oil by MIR spectroscopy and chemometrics, Food Chem. 209 (2016) 228–233.
[43] L. Bertacchini, M. Cocchi, M. Li Vigni, A. Marchetti, E. Salvatore, S. Sighinolfi, M. Silvestri, C. Durante, Classification of cereal flours by chemometric analysis of MIR spectra, J. Agric. Food Chem. 52 (2004) 1062–1067.
[44] A. Biancolillo, S. De Luca, S. Bassi, L. Roudier, R. Bucci, A.D. Magrì, F. Marini, Authentication of an Italian PDO hazelnut (“Nocciola Romana”) by NIR spectroscopy, Environ. Sci. Pollut. Res. 25 (2018) 28780–28786.
[45] M. Cocchi, R. Bro, C. Durante, D. Manzini, A. Marchetti, F. Saccani, S. Sighinolfi, A. Ulrici, Analysis of sensory data of Aceto Balsamico Tradizionale di Modena (ABTM) of different ageing by application of PARAFAC models, Food Qual. Prefer. 17 (2006) 419–428.
[46] M. Cocchi, C. Durante, A. Marchetti, C. Armanino, M. Casale, Characterization and discrimination of different aged ‘Aceto Balsamico Tradizionale di Modena’ products by head space mass spectrometry and chemometrics, Anal. Chim. Acta 589 (2007) 96–104.
[47] P. Firmani, S. De Luca, R. Bucci, F. Marini, A. Biancolillo, Near Infrared (NIR) spectroscopy-based classification for the authentication of Darjeeling black tea, Food Control 100 (2019) 292–299.
[48] L. Wang, F.S.C. Lee, X. Wang, Near-infrared spectroscopy for classification of licorice (Glycyrrhiza uralensis Fisch) and prediction of the glycyrrhizic acid (GA) content, LWT – Food Sci. Technol. 40 (2007) 83–88.
[49] S. De Luca, E. Ciotoli, A. Biancolillo, A.D. Magrì, F. Marini, Simultaneous quantification of caffeine and chlorogenic acid in coffee green beans and varietal classification of the samples by HPLC-DAD coupled with chemometrics, Environ. Sci. Pollut. Res. 25 (2018) 28748–28759.
[50] D. Yang, Y. Ying, Applications of Raman spectroscopy in agricultural products and food analysis: a review, Appl. Spectrosc. Rev. 46 (2011) 539–560.
[51] J. Riedl, S. Esslinger, C. Fauhl-Hassek, Review of validation and reporting of nontargeted fingerprinting approaches for food authentication, Anal. Chim. Acta 885 (2015) 17–32.
[52] R. Karoui, G. Downey, C. Blecker, Mid-infrared spectroscopy coupled with chemometrics: a tool for the analysis of intact food systems and the exploration of their molecular structure–quality relationships – a review, Chem. Rev. 110 (2010) 6144–6168.
[53] A. Biancolillo, F. Marini, Chapter Four – Chemometrics applied to plant spectral analysis, in: J. Lopes, C. Sousa (Eds.), Vibrational Spectroscopy for Plant Varieties and Cultivars Characterization, Comprehensive Analytical Chemistry, vol. 80, Elsevier, Amsterdam, 2018, pp. 69–104.
[54] L. Bertacchini, M. Cocchi, M. Li Vigni, A. Marchetti, E. Salvatore, S. Sighinolfi, M. Silvestri, C. Durante, The impact of chemometrics on food traceability, data handling in science and technology, in: F. Marini (Ed.), Chemometrics in Food Chemistry, Elsevier, Oxford, UK, 2013, pp. 371–410.
[55] A. Bajoub, S. Medina-Rodríguez, M. Gómez-Romero, E.A. Ajal, M.G. Bagur-González, A. Fernández-Gutiérrez, A. Carrasco-Pancorbo, Assessing the varietal origin of extra-virgin olive oil using liquid chromatography fingerprints of phenolic compound, data fusion and chemometrics, Food Chem. 215 (2017) 245–255.
[56] R. Nescatelli, R.C. Bonanni, R. Bucci, A.L. Magrì, A.D. Magrì, F. Marini, Geographical traceability of extra virgin olive oils from Sabina PDO by chromatographic fingerprinting of the phenolic fraction coupled to chemometrics, Chemometr. Intell. Lab. Syst. 139 (2014) 175–180.
[57] M. Casale, P. Oliveri, C. Casolino, N. Sinelli, P. Zunin, C. Armanino, M. Forina, S. Lanteri, Characterisation of PDO olive oil Chianti Classico by non-selective (UV–visible, NIR and MIR spectroscopy) and selective (fatty acid composition) analytical techniques, Anal. Chim. Acta 712 (2012) 56–63.
[58] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.L. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part B, Elsevier, Amsterdam, 1998, p. 213.
[59] C. Pizarro, S. Rodríguez-Tecedor, N. Pérez-del-Notario, I. Esteban-Díez, J.M. González-Sáiz, Classification of Spanish extra virgin olive oils by data fusion of visible spectroscopic fingerprints and chemical descriptors, Food Chem. 138 (2013) 915–922.
[60] E. Borràs, J. Ferré, R. Boqué, M. Mestres, L. Aceña, A. Calvo, O. Busto, Olive oil sensory defects classification with data fusion of instrumental techniques and multivariate analysis (PLS-DA), Food Chem. 203 (2016) 314–322.
[61] M. Bevilacqua, R. Bucci, A.D. Magrì, A.L. Magrì, F. Marini, Data fusion for food authentication combining near and mid infrared to trace the origin of extra virgin olive oils, NIR News 24 (2013) 12–15.
[62] S. Buratti, C. Malegori, S. Benedetti, P. Oliveri, G. Giovanelli, E-nose, e-tongue and e-eye for edible olive oil characterization and shelf life assessment: a powerful data fusion approach, Talanta 182 (2018) 131–141.
[63] Y.B. Monakhova, R. Godelmann, A. Hermann, T. Kuballa, C. Cannet, H. Schäfer, M. Spraul, D.N. Rutledge, Synergistic effect of the simultaneous chemometric analysis of 1H NMR spectroscopic and stable isotope (SNIF-NMR, 18O, 13C) data: application to wine analysis, Anal. Chim. Acta 833 (2014) 29–39.
[64] S. Roussel, V.E. Bellon-Maurel, J.M. Roger, P. Grenier, Authenticating white grape must variety with classification models based on aroma sensors, FT-IR and UV spectrometry, J. Food Eng. 60 (2003) 407–419.
[65] S. Roussel, V.E. Bellon-Maurel, J.M. Roger, P. Grenier, Fusion of aroma, FT-IR and UV sensor data based on the Bayesian inference. Application to the discrimination of white grape varieties, Chemom. Intell. Lab. Syst. 65 (2003) 209–219.
[66] S. Tao, J. Li, J. Li, J. Tang, J. Mi, L. Zhao, Discriminant analysis of red wines from different aging ways by information fusion of NIR and MIR spectra, in: D. Li, Y. Chen (Eds.), Computer and Computing Technologies in Agriculture V. CCTA 2011, IFIP Advances in Information and Communication Technology, vol. 369, Springer, Berlin, 2012.
[67] N. Prieto, M.L. Rodriguez-Méndez, R. Leardi, P. Oliveri, D. Hernando-Esquisabel, M. Iniguez-Crespo, J.A. de Saja, Application of multi-way analysis to UV–visible spectroscopy, gas chromatography and electronic nose data for wine ageing evaluation, Anal. Chim. Acta 719 (2012) 43–51.
[68] M. Silvestri, L. Bertacchini, C. Durante, A. Marchetti, E. Salvatore, M. Cocchi, Application of data fusion techniques to direct geographical traceability indicators, Anal. Chim. Acta 769 (2013) 1–9.
[69] M. Silvestri, A. Elia, D. Bertelli, E. Salvatore, C. Durante, M. Li Vigni, A. Marchetti, M. Cocchi, A mid level Data Fusion strategy for the varietal classification of Lambrusco P.D.O. Wines, Chemometr. Intell. Lab. Syst. 137 (2014) 181–189.
[70] AGER, Project: New Analytical Methodologies for Varietal and Geographical Traceability of Oenological Product; Contract N. 2011-0285, http://www.progettoager.it/index.php/settori/il-progetto-2008-2015/il-progetto-2008-2015vitivinicolo.
[71] G. Papotti, D. Bertelli, R. Graziosi, M. Silvestri, L. Bertacchini, C. Durante, M. Plessi, Application of one- and two-dimensional NMR spectroscopy for the characterization of protected designation of origin Lambrusco wines of Modena, J. Agric. Food Chem. 61 (2013) 1741–1746.
[72] C. Durante, C. Baschieri, L. Bertacchini, D. Bertelli, M. Cocchi, A. Marchetti, S. Sighinolfi, An analytical approach to Sr isotope ratio determination in Lambrusco wines for geographical traceability purposes, Food Chem. 173 (2015) 557–563.
[73] C. Durante, L. Bertacchini, L. Bontempo, F. Camin, D. Manzini, P. Lambertini, M. Paolini, From soil to grape and wine: variation of light and heavy elements isotope ratios, Food Chem. 210 (2016) 648–659.
[74] Q. Chen, C. Sun, Q. Ouyang, A. Liu, H. Lia, J. Zhao, Classification of vinegar with different marked ages using olfactory sensors and gustatory sensors, Anal. Methods 6 (2014) 9783–9790.
[75] R. Rios-Reina, R.M. Callejón, F. Savorani, J.M. Amigo, M. Cocchi, Data fusion approaches in spectroscopic characterization and classification of PDO wine vinegars, Talanta 198 (2019) 560–572.
[76] M. Cocchi, M. Li Vigni, C. Durante, Chemometrics-bioinformatics, in: A.G. Constatinos, G.P. Danezis (Eds.), Food Authentication, Management, Analysis and Regulation, Chapter 17, Wiley Blackwell, Hoboken, NJ, 2017, ISBN 9781118810255, pp. 483–520.
[77] M. Silvestri, Data Fusion to Integrate Data of Different Nature in Food Authenticity, PhD Thesis, University of Modena and Reggio Emilia, 2013.
[78] J.M. Gutiérrez, Z. Haddi, A. Amari, B. Bouchikhi, A. Mimendia, X. Cetó, M. del Valle, Hybrid electronic tongue based on multisensor data fusion for discrimination of beers, Sensor. Actuator. B Chem. 177 (2013) 989–996.
[79] R.I. Jennrich, Stepwise discriminant analysis, in: K. Enslein, A. Ralston, H.S. Wilf (Eds.), Statistical Methods for Digital Computers, vol. 3, Wiley, New York, NY, 1977, pp. 76–96.
[80] L. Vera, L. Aceña, J. Guasch, R. Boqué, M. Mestres, O. Busto, Discrimination and sensory description of beers through data fusion, Talanta 87 (2011) 136–142.
[81] A. Savitzky, M.J.E. Golay, Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem. 36 (1964) 1627–1639.
[82] A. Biancolillo, R. Bucci, A.L. Magrì, A.D. Magrì, F. Marini, Data-fusion for multiplatform characterization of an Italian craft beer aimed at its authentication, Anal. Chim. Acta 820 (2014) 23–31.
[83] M. Barker, W. Rayens, Partial least squares for discrimination, J. Chemometr. 17 (2003) 166–173.
[84] L. Ståhle, S. Wold, Partial least squares analysis with cross-validation for the two-class problem: a Monte Carlo study, J. Chemometr. 1 (1987) 185–196.
[85] S. Wold, M. Sjöström, SIMCA: a method for analysing chemical data in terms of similarity and analogy, in: B.R. Kowalski (Ed.), Chemometrics, Theory and Application, American Chemical Society Symposium Series No. 52, American Chemical Society, Washington, DC, 1977, pp. 243–282.
[86] S. Wold, M. Sjöström, Comments on a recent evaluation of the SIMCA method, J. Chemom. 1 (1987) 243–245.
[87] K. Pearson, On lines and planes of closest fit to systems of points in space, Philos. Mag. A 2 (1901) 559–572.
[88] S. Wold, K. Esbensen, P. Geladi, Principal component analysis, Chemometr. Intell. Lab. Syst. 2 (1987) 37–52.
[89] I.T. Jolliffe, Principal Component Analysis, second ed., Springer, New York, NY, 2002.
[89a] C. Di Natale, R. Paolesse, A. Macagnano, A. Mantini, A. D’Amico, A.A. Legin, L. Lvova, A. Rudnitskaya, Y. Vlasov, Electronic nose and electronic tongue integration for improved classification of clinical and food samples, Sensor. Actuator. B Chem. 64 (2000) 15–21.
[90] M. Henningsson, K. Ostergren, R. Sundberg, P. Dejmek, Sensor fusion as a tool to monitor dynamic dairy processes, J. Food Eng. 76 (2006) 154–162.
[91] G.P.Y. Clarke, Inverse estimates from a multiresponse model, Biometrics 48 (1992) 1081–1094.
[92] R. Sundberg, Multivariate calibration – direct and indirect regression methodology, Scand. J. Stat. 26 (1999) 161–207.
[93] M. Loudiyi, D.N. Rutledge, A. Aït-Kaddour, ComDim for explorative multi-block data analysis of cantal-type cheeses: effects of salts, gentle heating and ripening, Food Chem. 264 (2018) 401–410.
[94] M. Loudiyi, R. Karoui, D.N. Rutledge, R. Lavigne, M.C. Montel, A. Aït-Kaddour, Contribution of fluorescence spectroscopy and independent components analysis to the evaluation of NaCl and KCl effects on molecular-structure and fat melting temperatures of cantal-type cheese, Int. Dairy J. 73 (2017) 116–127.
[95] D.N. Rutledge, D. Jouan-Rimbaud Bouveresse, Independent components analysis with the JADE algorithm, Trends Anal. Chem. 50 (2013) 22–32.
[96] E.M. Qannari, I. Wakeling, P. Courcoux, H.J. MacFie, Defining the underlying sensory dimensions, Food Qual. Prefer. 11 (2000) 151–154.
[97] G. Feron, C. Ayed, E.M. Qannari, P. Courcoux, H. Laboure, E. Guichard, Understanding aroma release from model cheeses by a statistical multiblock approach on oral processing, PLoS One 9 (2014) e93113.
[98] G. Cozzi, J. Ferlito, G. Pasini, B. Contiero, F. Gottardo, Application of near-infrared spectroscopy as an alternative to chemical and color analysis to discriminate the production chains of Asiago d’Allevo cheese, J. Agric. Food Chem. 57 (2009) 11449–11454.
[99] R. Banerjee (Roy), B. Tudu, L. Shaw, A. Jana, N. Bhattacharyya, R. Bandyopadhyay, Instrumental testing of tea by combining the responses of electronic nose and tongue, J. Food Eng. 110 (2012) 356–363.
[100] T. Artursson, M. Holmberg, Wavelet transform of electronic tongue data, Sensor. Actuator. B Chem. 87 (2002) 379–391.
[101] R.O. Duda, D.G. Stork, P.E. Hart, Pattern Classification, second ed., John Wiley and Sons, Hoboken, NJ, 2001.
[102] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Pearson Education, London, UK, 2001.
[103] T. Poggio, F. Girosi, Networks for approximation and learning, Proc. IEEE 78 (1990) 1481–1497.
[104] D.F. Specht, Probabilistic neural networks, Neural Network. 3 (1990) 109–118.
[105] R. Banerjee (Roy), P. Chattopadhyay, B. Tudu, N. Bhattacharyya, R. Bandyopadhyay, Artificial flavor perception of black tea using fusion of electronic nose and tongue response: a Bayesian statistical approach, J. Food Eng. 142 (2014) 87–93.
[106] M. Fahim, M.H. Siddiqi, S. Lee, Y.K. Lee, A multi-strategy Bayesian model for sensor fusion in smart environments, in: Proceeding - 5th International Conference on Computer Sciences and Convergence Information Technology, ICCIT, 2010, pp. 52–57.
[107] J. Wang, Z. Wei, The classification and prediction of green teas by electrochemical response data extraction and fusion approaches based on the combination of e-nose and e-tongue, RSC Adv. (2015) 106959–106970.
[108] M. Xu, J. Wang, S. Gu, Rapid identification of tea quality by E-nose and computer vision combining with a synergetic data fusion strategy, J. Food Eng. 241 (2019) 10–17.
[109] Q. Chen, C. Sun, Q. Ouyang, Y. Wang, A. Liu, H. Li, J. Zhao, Classification of different varieties of Oolong tea using novel artificial sensing tools and data fusion, LWT - Food Sci. Technol. 60 (2015) 781–787.
[110] J. Lin, P. Zhang, Z. Pan, H. Xu, Y. Luo, X. Wang, Discrimination of oolong tea (Camellia sinensis) varieties based on feature extraction and selection from aromatic profiles analysed by HS-SPME/GC-MS, Food Chem. 141 (2013) 259–265.
[111] L. Li, S. Xie, J. Ning, Q. Chen, Z. Zhang, Evaluating green tea quality based on multisensor data fusion combining hyperspectral imaging and olfactory visualization systems, J. Sci. Food Agric. 99 (2018) 1787–1794.
[112] J. Ning, J. Sun, S. Li, M. Sheng, Z. Zhang, Classification of five Chinese tea categories with different fermentation degrees using visible and near-infrared hyperspectral imaging, Int. J. Food Prop. 20 (2017) 1515–1522.
[113] A. Dankowska, W. Kowalewski, Tea types classification with data fusion of UV-Vis, synchronous fluorescence and NIR spectroscopies and chemometric analysis, Spectrochim. Acta 211 (2019) 195–202.
[114] J.H. Friedman, Regularized discriminant analysis, J. Am. Stat. Assoc. 84 (1989) 165–175.
[115] F. Winquist, I. Lundström, P. Wide, The combination of an electronic tongue and an electronic nose, Sensor. Actuator. B Chem. 58 (1999) 512–517.
[116] Z. Haddi, S. Mabrouk, M. Bougrini, K. Tahri, K. Sghaier, H. Barhoumi, N. El Bari, A. Maaref, N. Jaffrezic-Renault, B. Bouchikhi, E-Nose and e-Tongue combination for improved recognition of fruit juice samples, Food Chem. 150 (2014) 246–253.
[117] X. Hong, J. Wang, Detection of adulteration in cherry tomato juices based on electronic nose and tongue: comparison of different data fusion approaches, J. Food Eng. 126 (2014) 89–97.
[118] J. Charve, C. Chen, A.D. Hegeman, G.A. Reineccius, Evaluation of instrumental methods for the untargeted analysis of chemical stimuli of orange juice flavour, Flavour Fragrance J. 26 (2011) 429–440.
[119] F. Mendoza, R. Lu, H. Cen, Comparison and fusion of four nondestructive sensors for predicting apple fruit firmness and soluble solids content, Postharvest Biol. Technol. 73 (2012) 89–98.
[120] M. Zude, B. Herold, J.M. Roger, V. Bellon-Maurel, S. Landahl, Non-destructive tests on the prediction of apple fruit flesh firmness and soluble solids content on tree and in shelf life, J. Food Eng. 77 (2006) 254–260.
[121] V. Steinmetz, J.M. Roger, E. Molto, J. Blasco, On-line fusion of colour camera and spectrophotometer for sugar content prediction of apples, J. Agric. Eng. Res. 73 (1999) 207–216.
[122] H. Wu, T. Yue, Y. Yuan, Authenticity tracing of apples according to variety and geographical origin based on electronic nose and electronic tongue, Food Anal. Methods 11 (2018) 522–532.
[123] A. Rudnitskaya, D. Kirsanov, A. Legin, K. Beullens, J. Lammertyn, B.M. Nicolaï, J. Irudayaraj, Analysis of apples varieties - comparison of electronic tongue with different analytical techniques, Sensor. Actuator. B Chem. 116 (2006) 23–28.
[124] C. Di Natale, M. Zude-Sasse, A. Macagnano, R. Paolesse, B. Herold, A. D’Amico, Outer product analysis of electronic nose and visible spectra: application to the measurement of peach fruit characteristics, Anal. Chim. Acta 459 (2002) 107–117.
[125] K.K. Vursavus, Y.B. Yurtlu, B. Diezma-Iglesias, L. Lleo-Garcia, M. Ruiz-Altisent, Classification of the firmness of peaches by sensor fusion, Int. J. Agric. Biol. Eng. 8 (2015) 104–115.
[126] M. Ruiz-Altisent, L. Lleó, F. Riquelme, Instrumental quality assessment of peaches: fusion of optical and mechanical parameters, J. Food Eng. 74 (2006) 490–499.
[127] N. Ozer, B.A. Engel, J.E. Simon, Fusion classification techniques for fruit quality, Trans. ASAE 38 (1995) 1927–1934.
[128] M. Hohmann, Y. Monakhova, S. Erich, N. Christoph, H. Wachter, U. Holzgrabe, Differentiation of organically and conventionally grown tomatoes by chemometric analysis of combined data from proton nuclear magnetic resonance and mid-infrared spectroscopy and stable isotope analysis, J. Agric. Food Chem. 63 (2015) 9666–9675.
[129] Y.B. Monakhova, M. Hohmann, N. Christoph, H. Wachter, D.N. Rutledge, Improved classification of fused data: synergetic effect of partial least squares discriminant analysis (PLS-DA) and common components and specific weights analysis (CCSWA) combination as applied to tomato profiles (NMR, IR and IRMS), Chemometr. Intell. Lab. Syst. 156 (2016) 1–6.
[130] E.M. Qannari, I. Wakeling, H.J.H. MacFie, A hierarchy of models for analysis sensory data, Food Qual. Prefer. 6 (1995) 309–314.
[131] S.A. Drivelos, K. Higgins, J.H. Kalivas, S.A. Haroutounian, C.A. Georgiou, Data fusion for food authentication. Combining rare earth elements and trace metals to discriminate “Fava Santorinis” from other yellow split peas using chemometric tools, Food Chem. 165 (2014) 316–322.
[132] S. Yao, T. Li, H.G. Liu, J.Q. Li, Y.Z. Wang, Geographic characterization of Leccinum rugosiceps by ultraviolet and infrared spectral fusion, Anal. Lett. 50 (2017) 2257–2269.
[133] W. Wang, C. Li, A multimodal quality inspection system based on 3D, hyperspectral, and X-ray imaging for onions, in: ASABE and CSBE/SCGAB Annual International Meeting, 2014.
[134] X.Y. Huang, S.H. Pan, Z.Y. Sun, W.T. Ye, J.H. Aheto, Evaluating quality of tomato during storage using fusion information of computer vision and electronic nose, J. Food Process. Eng. 41 (2018) e12832.
[135] C. Assis, H. Vinicius Pereira, V. Silva Amador, R. Augusti, L. Soares de Oliveira, M. Martins Sena, Combining mid infrared spectroscopy and paper spray mass spectrometry in a data fusion model to predict the composition of coffee blends, Food Chem. 281 (2019) 71–77.
[136] D. Broadhurst, R. Goodacre, A. Jones, J.J. Rowland, D.B. Kell, Genetic algorithms as a method for variable selection in multiple linear regression and partial least squares regression, with applications to pyrolysis mass spectrometry, Anal. Chim. Acta 348 (1997) 71–86.
[137] R.F. Teófilo, J.P.A. Martins, M.M.C. Ferreira, Sorting variables by using informative vectors as a strategy for feature selection in multivariate regression, J. Chemometr. 23 (2009) 32–48.
[138] R. Calvini, G. Foca, A. Ulrici, Data dimensionality reduction and data fusion for fast characterization of green coffee samples using hyperspectral sensors, Anal. Bioanal. Chem. 408 (2016) 7351–7366.
[139] G. Downey, R. Briandet, R.H. Wilson, E.K. Kemsley, Near- and mid-infrared spectroscopies in food authentication: coffee varietal identification, J. Agric. Food Chem. 45 (1997) 4357–4361.
[140] M.F. Devaux, D. Bertrand, P. Robert, M. Qannari, Application of multidimensional analyses to the extraction of discriminant spectral patterns from NIR spectra, Appl. Spectrosc. 42 (1988) 1015–1019.
[141] W. Dong, J. Zhao, R. Hu, Y. Dong, L. Tan, Differentiation of Chinese robusta coffees according to species, using a combined electronic nose and tongue, with the aid of chemometrics, Food Chem. 229 (2017) 743–751.
[142] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.
[143] S. Yener, A. Romano, L. Cappellin, P.M. Granitto, E. Aprea, L. Navarini, T.D. Märk, F. Gasperi, F. Biasioli, Tracing coffee origin by direct injection headspace analysis with PTR/SRI-MS, Food Res. Int. 69 (2015) 235–243.
[144] D.L. Hall, J. Llinas, An introduction to multisensor data fusion, Proc. IEEE 85 (1997) 6–23.
[145] T. Hastie, A. Buja, R. Tibshirani, Penalized discriminant analysis, Ann. Stat. 23 (1995) 73–102.
[146] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32.
[147] N. Reis, B.G. Botelho, A.S. Franca, L.S. Oliveira, Simultaneous detection of multiple adulterants in ground roasted coffee by ATR-FTIR spectroscopy and data fusion, Food Anal. Methods 10 (2017) 2700–2709.
[148] H. Li, F. Kutsanedzie, J. Zhao, Q. Chen, Quantifying total viable count in pork meat using combined hyperspectral imaging and artificial olfaction techniques, Food Anal. Methods 9 (2016) 3015–3024.
[149] W.W. Cheng, D.W. Sun, H. Pu, Y. Liu, Integration of spectral and textural data for enhancing hyperspectral prediction of K value in pork meat, LWT - Food Sci. Technol. 72 (2016) 322–329.
[150] M.C.U. Araújo, T.C.B. Saldanha, R.K.H. Galvão, T. Yoneyama, H.C. Chame, V. Visani, The successive projections algorithm for variable selection in spectroscopic multicomponent analysis, Chemometr. Intell. Lab. Syst. 57 (2001) 65–73.
[151] X. Suxia, W. Rui, W. Jiuqing, G. Peiyuan, Study on chicken quality classification method based on K-means-RBF multi-source data fusion, in: Proceedings of the 30th Chinese Control and Decision Conference, CCDC, 2018, pp. 405–410, https://doi.org/10.1109/CCDC.2018.8407167.
[152] D.S. Broomhead, D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Syst. 2 (1988) 321–355.
[153] T. Guo, M. Huang, Q. Zhu, Y. Guo, J. Qin, Hyperspectral image-based multi-feature integration for TVB-N measurement in pork, J. Food Eng. 218 (2018) 61–68.
[154] J. Yang, D. Zhang, A.F. Frangi, J.Y. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 131–137.
[155] G. Peiyuan, X. Hongbing, X. Suxia, S. Mei, B. Man, Research on key quality of sausage with SVM and hyperspectral imaging full scale features, in: 29th Chinese Control and Decision Conference (CCDC), Chongqing, 2017, pp. 4985–4988, https://doi.org/10.1109/CCDC.2017.7979378.
[156] G. Peiyuan, X. Hongbing, X. Suxia, S. Mei, B. Man, The bacon quality grade intellectual pattern recognition based on neural network of hyperspectral imaging, in: 36th Chinese Control Conference, 2017, pp. 11464–11467, https://doi.org/10.23919/ChiCC.2017.8029187.
[157] T. Kohonen, Self-Organizing Maps, Springer, New York, NY, 2001.
[158] U. Khulal, J. Zhao, W. Hu, Q. Chen, Intelligent evaluation of total volatile basic nitrogen (TVB-N) content in chicken meat by an improved multiple level data fusion model, Sensor. Actuator. B Chem. 238 (2017) 337–345.
[159] L. Huang, J. Zhao, Q. Chen, Y. Zhang, Nondestructive measurement of total volatile basic nitrogen (TVB-N) in pork meat by integrating near infrared spectroscopy, computer vision and electronic nose techniques, Food Chem. 145 (2014) 228–236.
[160] K.M. Nunes, M. Vinícius O. Andrade, A.M.P. Santos Filho, M.C. Lasmar, M.M. Sena, Detection and characterisation of frauds in bovine meat in natura by non-meat ingredient additions using data fusion of chemical parameters and ATR-FTIR spectroscopy, Food Chem. 205 (2016) 14–22.
[161] C. Alamprese, M. Casale, N. Sinelli, S. Lanteri, E. Casiraghi, Detection of minced beef adulteration with turkey meat by UV-vis, NIR and MIR spectroscopy, LWT - Food Sci. Technol. 53 (2013) 225–232.
[162] M. Ottavian, L. Fasolato, L. Serva, P. Facco, M. Barolo, Data fusion for food authentication: fresh/frozen-thawed discrimination in West African goatfish (Pseudupeneus prayensis) fillets, Food Bioprocess Technol. 7 (2014) 1025–1036, https://doi.org/10.1007/s11947-013-1157-x.
[163] F. Korel, D.A. Luzuriaga, M.Ö. Balaban, Objective quality assessment of raw Tilapia (Oreochromis niloticus) fillets using electronic nose and machine vision, J. Food Sci. 66 (2001) 1018–1024.
[164] J.W. Gardner, E.L. Hines, Pattern analysis techniques, in: E. Kress-Rogers (Ed.), Handbook of Biosensors and Electronic Noses: Medicine, Food, and the Environment, CRC, Boca Raton, FL, 1997, pp. 633–652.