W.L.P. Bredie and M.A. Petersen (Editors) Flavour Science: Recent Advances and Trends © 2006 Published by Elsevier B.V.
Challenges for data analysis in flavour science
Rasmus Bro (chairman) a, Per M. Bruun Brockhoff b and Thomas Skov a
a Department of Food Science, Royal Veterinary and Agricultural University, Copenhagen, Denmark; b Informatics and Mathematical Modelling, Technical University of Denmark, Kongens Lyngby, Denmark
1. INTRODUCTION
The analysis of data from instrumental and sensory flavour analyses, and of related data types, often poses special challenges for the data analyst. This may be due to the richness of information in the data or to special artefacts that need to be handled. In this workshop, some of the basic and more advanced tools for handling various types of data were illustrated. First, an overview of multivariate methods was given, followed by a description of the analysis of sensory data. Finally, new methods for handling GC and electronic nose data were described.
2. MULTIVARIATE ANALYSIS
Rasmus Bro began by pointing out why multivariate analysis is needed. Multivariate data analysis uses all available data simultaneously, exactly as in human pattern recognition. For most interesting problems the information lies in the relations between variables. Examples were given of how Principal Component Analysis (PCA) can make data tables easier to interpret. Even simple questions such as 'Which of the samples are most similar?' can be very difficult to answer by looking at a data table in a univariate way. Multivariate data analysis also enables an exploratory approach to data. No prior hypotheses are needed: all data can be analysed and new information can be found. The techniques can be taken one step further: calibration can be extended to multivariate calibration, and relations between complex data matrices can be described. One type of variable can be predicted from other types, typically from measurements that are more readily available, less specific and more complex.
The following examples were given: determination of water in a product from infrared spectra, consumer preference from sensory panel data, off-flavour in beer from fluorescence spectra, and concentration of active components in tablets from near-infrared spectra.
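As an illustration of how PCA can help answer a question such as 'Which of the samples are most similar?', the following is a minimal sketch in Python with NumPy. The data table, the function names pca_scores and most_similar_pair, and the choice of two components are assumptions made for this example and were not part of the workshop.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA of a samples x variables table via SVD of the mean-centred data."""
    Xc = X - X.mean(axis=0)                      # mean-centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)        # fraction of variance per component
    return scores, explained[:n_components]


def most_similar_pair(scores):
    """Indices of the two samples that lie closest together in score space."""
    n = scores.shape[0]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(scores[i] - scores[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


# Hypothetical table: 6 samples x 5 variables, driven by one latent factor plus noise.
rng = np.random.default_rng(0)
latent = np.array([0.0, 0.1, 1.0, 2.0, 3.0, 4.0])
loadings = np.array([1.0, 0.8, -0.5, 0.3, 1.2])
X = np.outer(latent, loadings) + 0.01 * rng.standard_normal((6, 5))

scores, explained = pca_scores(X, n_components=2)
pair = most_similar_pair(scores)
print("most similar samples:", pair)
print("explained variance fractions:", explained)
```

Because the five variables all reflect the same latent factor, the similarity between samples is hard to read from any single column but shows up directly as distance in the score plot.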
A very important issue when using these techniques is to distinguish between causality and correlation. Correlation is an observed relation. For instance, sales of sunglasses correlate with sales of ice-cream. This is a true and valid relation, but the real cause is an underlying (latent) factor: the sun. Causality implies an explicit relation. For example, absorbance depends on the amount of analyte in a solution. This is a true and valid relation and the cause is direct: the analyte absorbs. When processes are to be optimised, designed experiments are required, and causal relations are a prerequisite. Using the above examples: change the sales of ice-cream or the amount of analyte and see what happens to the sales of sunglasses or the absorbance, respectively. In the causal case the response follows; in the purely correlated case it need not. It was concluded that multivariate analysis provides a window into the multivariate space, enabling investigation of similarity, grouping, trends, outliers, variables, influence, importance, correlations and relations between blocks. A list of relevant websites was given [1-6].
3. ANALYSIS OF SENSORY DATA - WHAT IS SENSOMETRICS?
Per Bruun Brockhoff introduced the area of sensometrics, defining it as the scientific area that applies mathematical and statistical methods to model data from sensory and consumer science. Examples of sensometrics research were given, covering both ANOVA-type data and regression/multivariate-type data. The Sensometric Society, which arranges meetings every second year, was also briefly presented [7]. Different types of relevant statistical methods were then presented. These included binomial statistics and related issues (classical, Thurstonian modelling, random effects/mixed model versions), rank-based statistics, design of experiments and analysis of variance (classical, mixed model extensions) and multivariate analysis (analysis of profile data, relating sensory properties to instrumental and/or consumer data, consumer segmentation, multivariate quality control, time-intensity data).
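To make the classical binomial statistics concrete, the sketch below computes an exact one-sided binomial p-value for a discrimination test. It assumes a triangle test, in which an assessor guessing at random picks the odd sample with probability 1/3; the test type, the counts and the function name binomial_tail are illustrative assumptions and are not taken from the workshop.

```python
from math import comb


def binomial_tail(n_correct, n_trials, p_guess):
    """Exact P(X >= n_correct) for X ~ Binomial(n_trials, p_guess)."""
    return sum(
        comb(n_trials, k) * p_guess ** k * (1 - p_guess) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical outcome: 15 correct identifications in 30 triangle tests.
p_value = binomial_tail(n_correct=15, n_trials=30, p_guess=1 / 3)
print(f"one-sided p-value: {p_value:.4f}")
```

A small p-value indicates that the number of correct answers is unlikely under pure guessing, that is, that the products are perceptibly different.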
A typical sensory experiment includes a number of panelists, a number of samples and a number of replications. A simple 3-way mixed model ANOVA is typically carried out for each variable, the main issue being a test for product differences. Assessor/panel performance and monitoring may, however, also be an issue. For this, either the basic ANOVA model or the basic assessor model may be used, and these will investigate differences in level, scaling, disagreement, variabilities and sensitivities [8,9]. The methods mentioned so far are univariate, but since most sensory data are multivariate and at least 3-way, research is being carried out to combine ANOVA and multivariate techniques. The combination of univariate models for scaling differences and bi-linear models for the multivariate product differences leads to 3-way models of the PARAFAC type. Finally, it is often desirable to relate the multivariate sensory profile data to other data, such as chemical/instrumental data on the one hand and consumer preferences on the other. This calls for the classical biased regression techniques used extensively in chemometrics, such as Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR), and, for the consumer data, also for segmentation/clustering techniques to identify relevant consumer segments.
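The idea behind Principal Component Regression can be sketched as follows: the predictor block is compressed to a few principal components, and the response is regressed on the resulting scores. This is a minimal NumPy sketch under stated assumptions; the simulated data, the function names pcr_fit and pcr_predict, and the choice of two components are hypothetical and do not reproduce any data set shown at the workshop.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Fit PCR: PCA on centred X, then least squares of centred y on the scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                       # loadings (variables x components)
    T = Xc @ V                                    # scores (samples x components)
    q = np.linalg.lstsq(T, yc, rcond=None)[0]     # regression of y on the scores
    beta = V @ q                                  # coefficients in the original variables
    return beta, x_mean, y_mean


def pcr_predict(X, beta, x_mean, y_mean):
    """Predict the response for new rows of X."""
    return (X - x_mean) @ beta + y_mean


# Hypothetical data: 20 samples, 50 collinear predictors generated by 2 latent factors.
rng = np.random.default_rng(1)
T_true = rng.standard_normal((20, 2))
P_true = rng.standard_normal((2, 50))
X = T_true @ P_true + 0.01 * rng.standard_normal((20, 50))
y = T_true @ np.array([2.0, -1.0]) + 0.01 * rng.standard_normal(20)

beta, x_mean, y_mean = pcr_fit(X, y, n_components=2)
y_hat = pcr_predict(X, beta, x_mean, y_mean)
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
print(f"calibration RMSE: {rmse:.4f}")
```

With more variables than samples, ordinary least squares has no unique solution; restricting the regression to a few components trades a small bias for a large reduction in variance. In practice the number of components would be chosen by validation on independent samples rather than fixed in advance as here.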
4. ANALYSIS OF GC-MS AND ELECTRONIC NOSE DATA
Thomas Skov outlined the basic differences between GC-MS and the electronic nose, where the electronic nose is characterised by low specificity and by being most suitable for rougher 'good/bad' analysis. Electronic nose data are typically 3-way (Sample × ...). The traditional approach to this type of data is to simplify the data by feature extraction, resulting in 2-way data suitable for PCA or PLS analysis. A more advanced approach is to keep the internal data structure and treat the 3-way data with multi-way models. This has the advantage that there is no initial loss of information. Shifted data will, however, cause problems. Shifts occur when elution times in GC-MS or sensor time profiles in electronic noses vary between samples. This obscures the data, and the common bi- and tri-linear models do not work. One method to overcome this is warping. Warping eliminates shift-related artefacts from measurements by correcting the time axis of a sample profile towards a reference. Successful warping increases the explained variance while using fewer principal components. Furthermore, the width of peak-shaped loadings decreases and the shape of secondary artifactual loadings (derivatives) is improved. An example was given of PCA versus PARAFAC modelling (licorice with off-flavour), and improved clustering and separation could be demonstrated using three-way models. It was concluded that data structure and dimensionality are key factors when analysing GC-MS and electronic nose data. Shifted data need attention, but can be taken care of before modelling or in the modelling step, and advanced chemometric models can give a more efficient modelling of data.
References
1. www.models.kvl.dk (KVL; software, courses etc.).
2. www.spectroscopynow.com (Links on chemometrics).
3. www.models.kvl.dk/ris/risweb.isa (Database with 8000 papers).
4. www.camo.no/ (Unscrambler).
5. www.umetrics.com/ (Simca).
6. www.eigenvector.com (PLS Toolbox for MATLAB).
7. www.sensometric.org (the Sensometric Society).
8. P.M. Brockhoff and I.M. Skovgaard, Food Quality Pref., 5 (1994) 215.
9. P.B. Brockhoff, Food Quality Pref., 14 (5-6) (2003) 425.