Food Quality and Preference 11 (2000) 487–495
www.elsevier.com/locate/foodqual
Comparison of odour sensory profiles performed by two independent trained panels following the same descriptive analysis procedures

Nathalie Martin a,*, Pascal Molimard b, Henry Eric Spinnler a, Pascal Schlich c

a Département des Sciences et Industries Alimentaires et Biologiques, Institut National Agronomique Paris-Grignon, 78850 Thiverval-Grignon, France
b SKW Biosystems, Direction Cultures et Enzymes, Institut National de la Recherche Agronomique, Laboratoire de Recherches sur les Arômes, 17 rue de Sully, B.P. 1540, 21034 Dijon Cedex, France
c Institut National de la Recherche Agronomique, Laboratoire de Recherches sur les Arômes, 17 rue de Sully, B.P. 1540, 21034 Dijon Cedex, France

Received 15 March 2000; received in revised form 20 June 2000; accepted 3 July 2000
Abstract
Odour sensory profiling of 28 associations of cheese ripening micro-organisms was performed by two panels of 10 assessors at two different sites. Sample preparation, training protocols and references, tasting procedures and scoring were similar in the two laboratories. Panel 2 used 10 attributes and panel 1 used these terms plus 4 extra descriptors. Analysis of variance and multivariate methods (canonical variate analysis, generalised Procrustes analysis and STATIS) revealed differences between assessors within a panel and between panels concerning the use of the scoring scale and the strength of product discrimination by attribute. Panel 1 was more sensitive to fruity notes and panel 2 to sulphury odours. However, a good overlap between the separate and pooled analyses suggested the same sample clustering into three main groups and showed that the two panels gave consistent results. © 2000 Elsevier Science Ltd. All rights reserved.
Keywords: Odour profiling; Descriptive analysis; Panel comparison; Procrustes; ANOVA; STATIS
1. Introduction
Standardising sensory procedures in order to obtain consistent results has long been, and still is, a major preoccupation of many laboratories dealing with sensory analysis. In addition, growing international trade and the subsequent production of the same product at many different sites increase the need for better standardisation of quality control. Several inter-laboratory sensory trials have been reported. Most of them concern sensory profiling performed on various foods (cheese, beef, fish, almonds, olive oil) or beverages (beer, chocolate, coffee, milk). The procedures and objectives of these studies show two main differences: the level of training of the panels and the sensory methodology applied. Most often, the experience of the assessors belonging to each panel is different in length (Apparicio,
* Corresponding author. Tel.: +33-1-30-81-54-72; fax.: +33-1-3081-55-97. E-mail address:
[email protected] (N. Martin).
Gutierrez & Rodriguez, 1991; Daget & Collyer, 1984; Guerrero, Gou & Arnau, 1997; Hirst & Muir, 1994) or nature (Cardello et al., 1982; Clapperton & Piggott, 1979; Medeiros, Field, Menkaus & Russell, 1987; Nielsen & Zannoni, 1998; Roberts & Vickers, 1994). In the few studies mentioning panel training, the same basic sensory procedures were used (Apparicio et al., 1991; Daget & Collyer, 1984; Heymann, 1994; Hunter & McEwan, 1998). The panels either designed their own vocabulary (Claassen & Lawless, 1992; Guerrero et al., 1997; Heymann, 1994; Hirst & Muir, 1994; Hunter, Muir & Brennan, 1995; Risvik, Colwill, McEwan & Lyon, 1992; Roberts & Vickers, 1994; The European Sensory Network [ESN], 1996) or used a common list of attributes (Apparicio et al., 1991; Burke, Spooner & Hegarty, 1997; Cardello et al., 1982; Clapperton & Piggott, 1979; Daget & Collyer, 1984; Medeiros et al., 1987). Sensory procedures can also differ according to the way the products are prepared (Dransfield et al., 1982) or the scale used (Heymann, 1994; Hirst & Muir, 1994; Hunter & McEwan, 1998; Nielsen & Zannoni, 1998; Roberts & Vickers, 1994).
0950-3293/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0950-3293(00)00021-5
N. Martin et al. / Food Quality and Preference 11 (2000) 487–495
When panels employ the same terminology, results generally show good inter-panel agreement. Comparing profiles of trained panellists and consumers, Cardello et al. (1982), Clapperton and Piggott (1979) and Medeiros et al. (1987) observed good correlation between panels, although a broader range of perceptions was evidenced and recognised by trained panellists. Studies including panellists with the same level of training also report comparable results between panels regarding the structure of the sample spaces, i.e. the relative similarities and differences between products. However, the interpretation of the underlying sensory dimensions responsible for the perceived differences may differ between panels due to differences in each panellist's understanding and use of certain attributes (Apparicio et al., 1991; Daget & Collyer, 1984; Hunter & McEwan, 1998; Risvik et al., 1992). Burke, Spooner and Hegarty (1997) emphasised the importance of the understanding of attributes in their discussion of how 7 panels used the same descriptors to describe different flavour characteristics of beer. These authors, whose panels were not specifically trained, also recommend standardised training to improve inter-panel consensus, similar to training designed to homogenise panellists' judgements within a single panel.

The objective of this study was to determine whether the odour profiles obtained by two independent panels trained in a similar way, using the same vocabulary and following the same quantitative descriptive analysis procedures, would be comparable for a fairly large set of products resulting from biological transformation (fermentation) and prepared in two different laboratories. Unlike earlier reports, this study dealt with non-commercial products produced in small series and independently in two laboratories. It is well known that small-scale production is generally less reproducible than industrial production.
Moreover, sensory evaluation was restricted to odour perceptions, generally considered to be the most difficult to assess. This paper provides additional experience in an area of scientific and practical interest, the comparison of panel performance. It also presents, compares and discusses the efficiency of different statistical tools that can be used for a relevant comparison in such a context.

2. Materials and methods

2.1. Samples
The samples studied were microbial associations of three different yeasts, three different strains of Geotrichum candidum and five different bacteria, commonly used in bacteria and mould surface-ripened cheese. All samples were cultured on a model lactic curd. Thirty-nine samples were thus prepared and evaluated (Martin, Savonitto, Molimard, Berger, Brousse & Spinnler, 1999). The cultures were stored in darkness at 12 °C for 21 days. After incubation, the cultured curd was blended with Milli-Q® water (2:1, wt/wt) at 20,500 rpm for 40 s (Ultra-Turrax® model T25 equipped with an S25N18G grinding probe, IKA Labortechnik, Staufen, Germany). For sensory analysis, the solution was distributed into 60-ml coded brown flasks, 25 ml per flask. The flasks were stored overnight at 4 °C and then held at 20 °C for about 3 h before sensory evaluation. The samples were prepared independently in the two different laboratories and tested independently in each of the laboratories.

2.2. Panels
Two panels of 10 trained assessors each participated in the study at two different locations (Laboratoire de Génie et Microbiologie des Procédés Alimentaires, Institut National de la Recherche Agronomique, Thiverval-Grignon, France and Laboratoire de Recherches sur les Arômes, Institut National de la Recherche Agronomique, Dijon, France). Ten graduate students (4 males, 6 females), performing a training course or PhD work in the laboratory, participated in panel 1. None had participated in sensory profiling of cheese, but half of them had previously been involved in sensory assessment of fresh dairy products. Panel 2 was composed of 10 assessors (6 males, 4 females) selected from the laboratory staff for their experience in flavour analysis. All of them had previously been involved in sensory assessment of various food products. Both panels were specifically trained to use the sensory procedures and to describe the samples under study.

2.3. Sensory procedures
Training and profiling procedures were the same for the two panels.
The judges were given a list of 10 common attributes defined from former studies on a similar set of products as describing the main olfactory differences between samples: intensity (overall odour intensity), garlic, cabbage, chocolate, Munster, banana, surimi, carapace, solvent, ammonia. Eight 45-min sessions were dedicated to training the assessors to properly use the vocabulary and the rating scale. During this step, panel 1 reached a consensus and chose to use four more descriptors: apple, blackcurrant, Cancoillotte and foul. These extra attributes were included in the panel 1 list only. References defined for each attribute and used for training are reported in Table 1. Each panel practised using the unstructured intensity scale by assessing different samples of the experimental design. Odour profiling was performed in a monadic way: the samples were smelt one after the other. Five samples
Table 1
Odor attributes and references used for the descriptive analyses

Attribute        Reference standard                            Concentration
Odor intensity   Odor overall intensity
Banana           Isoamyl acetate a                             15 ppb
Solvent          Ethyl acetate a                               5 ppm
Cabbage          Dimethyldisulfide a                           200 ppb
Garlic           Allyl disulfide a                             150 ppb
Chocolate        Ovomaltine b                                  150 ppm
Munster          Methylthiobutyrate a                          100 ppb
Surimi           Dimethylsulfide a                             20 ppb
Carapace         SK pasta c                                    150 ppm
Ammonia          Ammonia d                                     200 ppm
Blackcurrant     Fresh crushed blackcurrant
Apple            Granny Smith apple flavoured yoghurt e
Cancoillotte     Cancoillotte cheese Landel f
Foul             1,5-diaminopentane a                          100 ppm

a Aldrich (Saint Quentin Fallavier, France).
b Wander (Annonay, France).
c Isnard-Lyraz (Fresnes, France).
d Prolabo (Fontenay Sous-Bois, France).
e Danone (Levallois-Perret, France).
f Marcillat (Corcieux, France).
were presented per session for both panels. However, the sample presentation design (Latin square) differed between the two panels. Within panels, the order of presentation of the samples within a session was balanced to avoid carry-over and order effects. Sensory evaluations were conducted in air-conditioned rooms (20 °C), under white light in separate booths. Scores were recorded directly on the same computer system using FIZZ software (BIOSYSTEMES®, Couternon, France).

2.4. Data analysis
Checking for microbial contamination led to discarding 11 of the 39 samples tested from the data analysis. Five and four associations were contaminated in laboratories 1 and 2, respectively. Two associations showed contamination in both laboratories. Consequently, the data set used to compare the performance of the two panels included 28 common samples. Global analyses, including the whole data set, were first carried out to compare the results from the two laboratories. Then the data from each panel were analysed separately in order to assess each assessor and panel performance. All the analyses were performed with SAS® (Statistical Analysis System Institute, 1989). Univariate analysis consisted of different models of analysis of variance (ANOVA). Two-way ANOVA (product, assessor) and associated tests of contrasts were carried out on the data of each panel to assess its discrimination ability. Three-way ANOVA was performed on the whole data set according to the following model
(Schlich, 1998): PANEL + ASSESSOR(PANEL) + PRODUCT + PANEL*PRODUCT, where the panel effect, the assessor within panel effect and the interaction between product and panel were considered as random. This analysis allowed the detection of the attributes that significantly differed among samples and across the two panels. Three multidimensional techniques were also applied. Canonical variate analysis (CVA) was run, using the CANONICAL option of the MANOVA statement from the GLM procedure in SAS® (1989), on the data of each panel to visualise on factorial maps how the samples were discriminated by each panel. Pairwise and generalized Procrustes analysis (GPA) (Gower, 1975) and STATIS (Lavit, 1988) were applied to determine and compare the number and the nature of the dimensions used by each assessor within a panel and by each panel to describe the product space.

3. Results and discussion

3.1. Agreement between panels and compromise: joint analysis

3.1.1. Agreement between panels on the product space
Panel product spaces were first compared by performing a pairwise Procrustes analysis on the averaged data (across all the judges within a panel) from the two panels. The first two dimensions explained 73.5 and 11.5% of the total variance, respectively. Variance explained by further dimensions was very low (<4%). Consequently, results and discussion were restricted to the first factorial map. A separation of the samples into three clusters could be suggested on the sample space whatever the panel (Fig. 1A). Products 13, 15, 16, 17 and 18 were characterised by their strong fruity notes. Products 22, 24, 25, 26 and 27 were scored at a medium intensity level on these former sensory properties and exhibited a weak overall odour. The other products were related to sulphury and cheesy olfactory notes. The agreement between panels in terms of product distances was also confirmed by the STATIS normalised RV (NRV) coefficient between panels.
The higher this coefficient, the more similar the two panel configurations compared. The NRV coefficient was 15.2, a value much greater than 2, which is what can be obtained by chance (Schlich, 1996). The length of the bars joining the same samples assessed by the two panels gave further information on the similarities and differences in product evaluation. The greatest distances appeared for samples 15, 40, 3 and 31, but no major change in cluster membership of these samples was evident. We conclude that these large values may be ascribed to fairly large random differences rather than to real disagreement between panels.
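The RV coefficient and a normalised version of it can be illustrated with a short sketch. The Python code below is not the SAS implementation used in the study. It assumes two product-by-attribute tables with the products in the same row order, and it approximates the normalisation by standardising the observed RV against RV values obtained after randomly permuting the products of one table. The function names, the number of permutations, the seed and the simulated data are illustrative assumptions.

```python
import numpy as np


def rv_coefficient(x, y):
    """RV coefficient between two tables whose rows are the same products."""
    x = x - x.mean(axis=0)  # centre each attribute
    y = y - y.mean(axis=0)
    wx = x @ x.T            # product-by-product cross-product matrices
    wy = y @ y.T
    # sum(wx * wy) equals trace(wx @ wy) because both matrices are symmetric
    return np.sum(wx * wy) / np.sqrt(np.sum(wx * wx) * np.sum(wy * wy))


def normalised_rv(x, y, n_perm=500, seed=0):
    """Standardise RV against a permutation distribution (assumed scheme)."""
    rng = np.random.default_rng(seed)
    observed = rv_coefficient(x, y)
    permuted = np.array([
        rv_coefficient(x, y[rng.permutation(len(y))]) for _ in range(n_perm)
    ])
    return (observed - permuted.mean()) / permuted.std(ddof=1)


# Hypothetical data: 28 products scored on 10 attributes by two panels.
rng = np.random.default_rng(1)
panel1 = rng.normal(size=(28, 10))
panel2 = panel1 + rng.normal(scale=0.5, size=(28, 10))
print(rv_coefficient(panel1, panel2), normalised_rv(panel1, panel2))
```

Because RV depends only on the cross-product matrices, it is unchanged by translation, rotation and isotropic scaling of either table, which is why it suits the comparison of product configurations rather than of raw scores.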
Fig. 1. Sample plot (A) and variable plot (B) of the GPA performed on the whole data set. (a) and (b) on the sample plot and (1) and (2) on the variable plot account for assessment by panel 1 and 2, respectively.
The Procrustes analysis of variance enabled us to investigate further differences between panels. The three GPA transformations had a significant effect (P<0.001), illustrating that, on average, the panels used different levels and ranges of the intensity scale and also that some disagreement between panels existed on the meaning of the attributes (Risvik et al., 1992). The percentage of variation in the sums of squares related to the translation step (57%) was the highest, which indicated that the main differences between panels concerned the level of the intensity scale used. The percentages of variation in the sums of squares related to the rotation and scaling steps were much lower (17 and 12%, respectively). The correspondence between the terms used by the two panels was quite good (Fig. 1B). However, surimi and cabbage were better correlated with the first two GP dimensions for panel 2. Correlation on the panel averaged data sets between the two cabbage attributes was
low (r=0.15) and between the two surimi attributes was higher (r=0.48). As illustrated in Fig. 1B, Munster (r=0.70), banana (r=0.88) and solvent (r=0.94) were well correlated between panels. Although it does not appear clearly in Fig. 1B, carapace also benefited from one of the highest correlation coefficients (r=0.50). The variance explained by GPC1 and GPC2 was higher for panel 1 data (73.6 and 14.2% compared to 65.3 and 13.6% for panel 2). The number of dimensions required by each panel to adequately describe the sample structure was also reflected by the calculation of the STATIS b coefficient per panel. This b coefficient enabled the comparison of the dimensionalities of the panel sample spaces. It was slightly higher for panel 2 (2.2) than for panel 1 (1.8), suggesting that panel 2 assessment was slightly more complex than panel 1 assessment in terms of the number of attributes required to span the sample differences (Schlich, 1996).
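The translation, rotation and scaling steps of a pairwise Procrustes fit can be sketched as follows. This is a minimal illustration in Python, not the procedure run by the authors. It assumes that both panel tables have the same products in the same row order and the same number of columns (a table with fewer attributes could be padded with zero columns), and it does not reproduce the Procrustes analysis of variance that partitions the sums of squares by step. The function name and the simulated data are hypothetical.

```python
import numpy as np


def pairwise_procrustes(target, other):
    """Fit `other` onto `target` by translation, rotation and isotropic scaling."""
    target_c = target - target.mean(axis=0)   # translation: centre both tables
    other_c = other - other.mean(axis=0)
    u, s, vt = np.linalg.svd(other_c.T @ target_c)
    rotation = u @ vt                         # orthogonal rotation/reflection
    scale = s.sum() / np.sum(other_c ** 2)    # least-squares isotropic scaling
    fitted = scale * other_c @ rotation + target.mean(axis=0)
    residual_ss = np.sum((target - fitted) ** 2)
    return fitted, rotation, scale, residual_ss


# Hypothetical panel means: 28 products, 10 attributes.
rng = np.random.default_rng(2)
panel1 = rng.normal(size=(28, 10))
panel2 = 1.5 * panel1 + 2.0 + rng.normal(scale=0.3, size=(28, 10))
_, _, scale, residual_ss = pairwise_procrustes(panel1, panel2)
print(scale, residual_ss)
```

In this simulated case panel 2 scores higher (a level shift) and over a wider range than panel 1, so the fit needs a translation and a scaling factor below 1, which mirrors the kind of scale-use differences discussed above.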
3.1.2. Agreement between panels on individual attributes
F ratios of the three-way ANOVA (Table 2) allowed a deeper insight into panel disagreement concerning some attributes, which was not revealed by the global GPA approach. The assessor effect was highly significant (P<0.001) on all attributes and thus was not reported in Table 2. An assessor effect is commonly encountered in sensory analysis and can be explained by inter-individual differences in the use of the intensity scales. The panel effect was also significant on five attributes: garlic, cabbage, chocolate, banana and carapace. It was due to the different range of the scale used by the two panels to score these attributes, as illustrated by the means reported in Table 2. Significant panel by product interactions also appeared on four attributes: intensity, cabbage, Munster and solvent. Although the mean values for the highly scored products on the solvent attribute were higher for panel 1, the trend of product means was similar for the two panels (Fig. 2A). This explains why, despite the high F product×panel value (6.71), the product effect (tested against the interaction term) remained significant on this attribute (P<0.05). In contrast, for intensity, Munster and cabbage (Fig. 2B, C and D), a different ranking of the product means was observed according to the panel. Interaction on the intensity and cabbage attributes led to a non-significant F product value. Despite the interaction, the F product value remained significant on Munster but, unlike solvent, it must be considered cautiously. To summarise, only four attributes exhibited a significant product by panel interaction. Despite these interactions, a significant product effect remained on two attributes, showing that little information on the product differences was lost when the data of the two panels were considered together. Besides, it can be noticed that three-way analysis of variance gave more precise information than the GPA approach, which did not reveal the disagreement observed, particularly on the Munster and solvent attributes.

Table 2
Product F values of the three-way ANOVA (product, panel, assessor(panel)) performed on the whole data set (28 products, 2 panels) a

Attribute    F product    F panel (mp1–mp2) b     F product×panel
Intensity    1.64         0.61                    4.08***
Garlic       1.02         11.29** (0.3–1.3)       1.19
Cabbage      1.25         13.09** (0.3–1.9)       2.01**
Chocolate    0.58         7.53* (0.6–1.6)         1.02
Munster      3.48***      2.12                    2.65***
Banana       15.07***     4.46* (0.9–2.3)         0.56
Surimi       2.84**       3.20                    0.97
Carapace     2.88**       4.52* (0.6–1.5)         1.10
Solvent      3.22*        0.97                    6.71***
Ammonia      1.07         0.19                    1.37

a *, **, *** indicate significance at P ≤ 0.05, P ≤ 0.01, P ≤ 0.001, respectively.
b Attribute means (mpi) are reported for panel 1 (mp1) and panel 2 (mp2) for significant panel effect.
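The way the product effect is tested against the product by panel interaction can be sketched for one attribute. The Python code below is an illustration under stated assumptions, not the authors' SAS analysis: it assumes a balanced design with one score per assessor and product, stored in a hypothetical array scores[panel, assessor, product], and it only returns the two F ratios whose error terms are stated in the text. The panel effect is left out because its error term under the random-effects assumptions is not detailed in the source.

```python
import numpy as np


def multipanel_anova(scores):
    """F ratios for one attribute; scores[p, a, j] = panel, assessor, product."""
    n_panel, n_assessor, n_product = scores.shape
    grand = scores.mean()
    panel_means = scores.mean(axis=(1, 2))
    product_means = scores.mean(axis=(0, 1))
    assessor_means = scores.mean(axis=2)      # assessors nested within panels
    cell_means = scores.mean(axis=1)          # panel-by-product means

    ss_panel = n_assessor * n_product * np.sum((panel_means - grand) ** 2)
    ss_product = n_panel * n_assessor * np.sum((product_means - grand) ** 2)
    ss_assessor = n_product * np.sum((assessor_means - panel_means[:, None]) ** 2)
    interaction = (cell_means - panel_means[:, None]
                   - product_means[None, :] + grand)
    ss_interaction = n_assessor * np.sum(interaction ** 2)
    ss_total = np.sum((scores - grand) ** 2)
    ss_residual = ss_total - ss_panel - ss_product - ss_assessor - ss_interaction

    ms_product = ss_product / (n_product - 1)
    ms_interaction = ss_interaction / ((n_panel - 1) * (n_product - 1))
    ms_residual = ss_residual / (n_panel * (n_assessor - 1) * (n_product - 1))
    return {
        "F_product": ms_product / ms_interaction,           # tested against interaction
        "F_product_x_panel": ms_interaction / ms_residual,  # tested against residual
    }


# Hypothetical data: 2 panels, 10 assessors each, 28 products.
rng = np.random.default_rng(4)
effect = np.linspace(-4.5, 4.5, 28)
noise = rng.normal(size=(2, 10, 28))
same_ranking = effect[None, None, :] + noise
opposite_ranking = np.array([1.0, -1.0])[:, None, None] * effect + noise
print(multipanel_anova(same_ranking))
print(multipanel_anova(opposite_ranking))
```

When the two simulated panels rank the products in opposite order, the interaction mean square becomes large and the product F ratio collapses, which is the pattern reported above for intensity and cabbage.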
3.2. Panel comparison: separate analyses
CVA was performed, per panel, on the whole set of attributes of each panel. The likelihood ratio test evidenced two and one significant axes for the panel 1 and panel 2 analyses, respectively. Moreover, the multivariate analysis of variance on the canonical axis led to F-values of 4.04 and 1.74 for the panel 1 and panel 2 data sets, respectively. This evidenced the greater discrimination ability of panel 1 compared to panel 2. However, the separation of the products into three clusters (Fig. 3A and B) was similar for both panels and comparable with the global GPA results (Fig. 1A). For both panels, the samples were discriminated along the first dimension according to their fruity odours (banana, solvent and also apple for panel 1) and according to their sulphury notes (carapace, surimi and cabbage for both panels and Cancoillotte and foul for panel 1) and Munster odours. Dimension 2 was mainly determined by the overall odour intensity and the Munster note. Banana and solvent (as well as apple for panel 1), which were determinant in axis 1 variance, were highly correlated with each other for each panel (r>0.78). Surimi and carapace were highly correlated for both panels (rp1=0.76, rp2=0.74). These two perceptions were probably concomitant in the samples. The references used for surimi (dimethyl sulfide) and carapace (SK pasta with strong shellfish notes) were indeed very different and it is unlikely that the subjects confused the two related attributes. Correlation between sulphury and Munster notes differed between panels: garlic was best correlated with Munster for panel 1 (r=0.61) and with surimi (r=0.75) and cabbage (r=0.69) for panel 2; cabbage was also well correlated with carapace (r=0.76) and Munster (r=0.67) for panel 2.
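The canonical variate analysis used for these per-panel maps can be sketched as a generalised eigenvalue problem. The Python code below is only an illustration and does not reproduce the SAS GLM output: it assumes a balanced two-way design (product, assessor) with one score per cell, stored in a hypothetical array scores[assessor, product, attribute], and it takes the residual of the two-way model as the error term. The function name and the simulated data are assumptions.

```python
import numpy as np


def canonical_variates(scores):
    """CVA of products; scores[a, j, k] = assessor, product, attribute."""
    n_assessor, n_product, n_attr = scores.shape
    grand = scores.mean(axis=(0, 1))
    product_means = scores.mean(axis=0)       # (products, attributes)
    assessor_means = scores.mean(axis=1)      # (assessors, attributes)

    deviations = product_means - grand
    between = n_assessor * deviations.T @ deviations      # product SSCP matrix
    residual = (scores - product_means[None, :, :]
                - assessor_means[:, None, :] + grand)
    flat = residual.reshape(-1, n_attr)
    error = flat.T @ flat                                 # residual SSCP matrix

    # Solve between @ v = lambda * error @ v through a Cholesky whitening step.
    chol_inv = np.linalg.inv(np.linalg.cholesky(error))
    eigenvalues, vectors = np.linalg.eigh(chol_inv @ between @ chol_inv.T)
    order = np.argsort(eigenvalues)[::-1]
    axes = chol_inv.T @ vectors[:, order]
    return eigenvalues[order], deviations @ axes          # product coordinates


# Hypothetical panel: 10 assessors, 28 products, 4 attributes,
# with products differing on the first attribute only.
rng = np.random.default_rng(6)
scores = rng.normal(size=(10, 28, 4))
scores[:, :, 0] += np.linspace(0.0, 9.0, 28)
eigenvalues, coordinates = canonical_variates(scores)
print(eigenvalues)
```

A dominant first eigenvalue means that a single canonical axis carries most of the product discrimination relative to assessor disagreement, which is one way to read the difference in the number of significant axes between the two panels.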
Fig. 2. Interaction (panel × product) plots on solvent (A), intensity (B), Munster (C) and cabbage (D); the two panels are plotted with different symbols. For each attribute, products are ranked according to panel 1 increasing means.
Consequently, discrepancies concerning the distances between some of the products (for example 3 and 31, as already mentioned for global GPA) may result from the different attribute correlation structures of the two panels. Two hypotheses can be proposed to explain this observation: (i) the perception of the sensory proximity between the olfactory notes differed between panels, as suggested by the different correlation coefficients; (ii) the attributes best used to differentiate the samples differed between panels.
The second hypothesis was tested through a two-way analysis of variance (product, assessor) performed per panel and across the 28 products (Table 3). Cabbage, garlic, chocolate and blackcurrant were not significant for panel 1, and garlic, chocolate and ammonia were not significant for panel 2. The non-discriminating use of garlic and chocolate by both panels supported the non-significant product effect previously observed in the three-way ANOVA results. Both panels significantly discriminated the samples on intensity, Munster, banana,
Fig. 3. Canonical variate analysis of panel 1 (A) and panel 2 (B) results.
surimi, carapace and solvent. Three of the four attributes used only by panel 1 (apple, Cancoillotte and foul) showed a high level of significance (P<0.001). Ammonia was discriminant for panel 1 and cabbage for panel 2. The poor use of cabbage by panel 1 may be responsible for the differences observed between panels in the correlations involving this attribute. The non-significant product effect noticed in three-way ANOVA on intensity did not result from poor sample discrimination by one of the panels on this attribute (P<0.001, Table 3), but from a significant product by panel interaction. Contrast tests showed that the means of the same samples (13, 15, 16, 17 and 18) were significantly higher than the product grand mean on the attributes banana and solvent for both panels and on apple for panel 1. This confirmed the sample clustering suggested by the global GPA and CVA factorial maps. Despite some discrepancies between different samples
Table 3
Product F values of the two-way ANOVA (product, assessor) performed per panel and across the 28 products

                 F product a
Attribute        Panel 1      Panel 2
Intensity        7.63***      1.84**
Garlic           1.52         1.14
Cabbage          1.05         2.69***
Chocolate        0.94         0.67
Munster          8.46***      2.73***
Banana           4.03***      4.90***
Surimi           1.86**       1.88**
Carapace         1.70*        2.58***
Solvent          24.00***     2.64***
Ammonia          1.59*        1.09
Blackcurrant     1.28
Apple            11.90***
Cancoillotte     3.93***
Foul             2.18***

a *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001.
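For a single panel and a single attribute, the product F ratio of a two-way ANOVA can be computed as in the sketch below. This is a minimal Python illustration, assuming one score per assessor and product so that the residual is the assessor by product term. The function name and the array layout scores[assessor, product] are hypothetical, and no significance levels are computed.

```python
import numpy as np


def two_way_product_f(scores):
    """Product F ratio; scores[a, j] = score of assessor a for product j."""
    n_assessor, n_product = scores.shape
    grand = scores.mean()
    product_means = scores.mean(axis=0)
    assessor_means = scores.mean(axis=1)
    ss_product = n_assessor * np.sum((product_means - grand) ** 2)
    residual = scores - product_means[None, :] - assessor_means[:, None] + grand
    ss_residual = np.sum(residual ** 2)
    df_product = n_product - 1
    df_residual = (n_assessor - 1) * (n_product - 1)
    f_ratio = (ss_product / df_product) / (ss_residual / df_residual)
    return f_ratio, df_product, df_residual


# Hypothetical panel: 10 assessors, 28 products, a clear product effect.
rng = np.random.default_rng(8)
scores = np.linspace(0.0, 5.0, 28)[None, :] + rng.normal(size=(10, 28))
print(two_way_product_f(scores))
```

Because the assessor main effect is removed before the residual is formed, a constant shift in how high each assessor scores does not change the product F ratio. Only disagreement on the product differences inflates the error term.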
between panels, the main differences between products appeared to be the same for both panels.

3.3. Consensus between the assessors within each panel
STATIS was applied to the scores of each panel's whole set of attributes (Table 4). The scaling coefficient between assessors varied more within panel 2 (0.19–2.37) than within panel 1 (0.50–1.67), revealing a greater difference in the range of the scale used between panel 2 assessors. The homogeneity in each panel was estimated by calculating an NRV coefficient between each pair of assessors within a panel. The NRV coefficients between pairs of assessors were averaged to give, for each assessor within each panel, a meanNRV. MeanNRVs were higher for panel 1 assessors (mean=9.93) than for panel 2 assessors (mean=3.95). Agreement among assessors in terms of product sensory distance was much better within panel 1. Discrepancies between assessors within a panel were noticed: the range of meanNRVs was 6.19–11.84 for panel 1 and 1.44–6.64 for panel 2. Except for one panel 2 assessor, all meanNRV values were greater than 2. The number of dimensions required by each individual to adequately describe the sample structure was reflected by the b coefficient, whose mean was slightly higher for panel 2 (meanp2=3.77) than for panel 1 (meanp1=3.33), denoting, as already mentioned, that as a whole, panel 2 required more underlying sensory dimensions to describe the differences between samples. These b coefficient values were higher than those obtained from the global analysis on the averaged data sets from the two panels (across all the judges within a panel): 1.8 (panel 1) and 2.2 (panel 2). Differences in dimensionality were probably due to the strong agreement between judges on the first dimension and to the lesser agreement on the other dimensions. Calculation of b coefficients from the average data sets emphasised the importance of the first dimension. Panel 2 coefficients were generally higher than panel 1 coefficients. The b coefficients ranged from 1.67 to 5.38 in panel 1 and from 2.46 to 5.83 in panel 2, which also suggested differences between assessors in terms of the sample space dimensionality. STATIS indices such as scaling, meanNRVs and b coefficients, together with summary tables, appeared to be good tools to quickly obtain information on panel differences.

Table 4
Summary of assessors statistics in STATIS analysis performed per panel

Panel 1                                  Panel 2
Assessor  Scaling  MeanNRV  b            Assessor  Scaling  MeanNRV  b
1         1.05     10.66    5.38         11        0.84     3.69     3.45
2         0.90     7.85     2.88         12        0.57     3.21     4.00
3         1.09     10.18    4.78         13        1.16     3.85     4.82
4         1.29     11.06    2.17         14        2.37     4.95     4.03
5         0.87     10.13    3.03         15        0.35     1.44     5.83
6         0.52     9.39     4.48         16        0.70     3.64     3.84
7         1.56     11.22    2.43         17        1.97     3.91     3.47
8         0.57     6.19     3.55         18        1.27     6.64     2.46
9         1.67     10.83    1.67         19        0.19     3.61     2.81
10        0.50     11.84    2.90         20        0.58     4.58     3.01
Mean      1.00     9.93     3.33         Mean      1.00     3.95     3.77

4. Conclusion
This study indicated that quantitative descriptive analyses of a large set of products by two panels trained independently but according to the same procedure gave similar results for the main sensory differences between samples. These results are particularly encouraging since it was possible that the laboratory-scale preparation of the samples might have induced small differences between the two laboratories.
In addition, the sensory assessment was restricted to the odour of the samples, a perception generally reported as difficult for assessors to evaluate and showing great inter-individual variation. However, differences appeared between panels concerning the importance attached to each attribute to discriminate the samples. This may be due to variation in the previous sensory experience of each panellist. Differences may also be encountered between assessors within a panel and can lead to discrepancies between panels even when only a few individual panellists weight some of the attributes in a different way. Similarly, variability in the use of the scoring scale can appear between panels as well as between assessors within a panel. However, these differences can be partly rectified by the statistical treatment applied. Differences between
panels can be regarded as analogous to differences between individual assessors in a single panel. Consequently, improvement of inter-panel consensus should be achieved by introducing procedures similar to those applied to homogenise assessor judgements within a panel. The use of a standardised terminology including the same references for the same descriptors does not seem to be sufficient. Performance monitoring should focus more on the emphasis given by each panellist to each attribute, even when the attributes are the same. This could avoid the imbalance in the use of some attributes that was encountered in the present study and that was partly responsible for discrepancies between panels. Close collaboration between panel leaders is, therefore, necessary. Regular feedback of the results of each laboratory to the other laboratories could ensure early detection of such drift and be used to guide training. The univariate and multivariate methods reported here are useful tools to compare individual assessor and panel results. However, conclusions resulting from these different analyses may differ slightly. For this reason, it would be advisable not to confine interpretation to one specific statistical analysis procedure alone. Enlarging the scope of analysis and comparing the subsequent results provides a way of confirming specific conclusions and accounting for the range of differences inherent in such studies.

Acknowledgements
The authors would like to thank Sandra Savonitto for her assistance in collecting the data used for this study. This work was financed by contract FAIR PL 96 1196 from the European Union (Brussels, Belgium).

References
Apparicio, R., Gutierrez, F., & Rodriguez, J. (1991). A chemometrics study of analytical panels in virgin olive oil. An approach for evaluating panels in training. Grasas y Aceites, 42(3), 202–210.
Burke, S., Spooner, M. J. R., & Hegarty, P. K. (1997). Sensory testing of beers: an inter-laboratory sensory trial. Journal of the Institute of Brewing, 103, 15–19.
Cardello, A. V., Maller, O., Kapsalis, J. G., Segars, R. A., Sawyer, F. M., Murphy, C., & Moskowitz, H. R. (1982). Perception of texture by trained and consumer panelists. Journal of Food Science, 47, 1186–1197.
Claassen, M., & Lawless, H. (1992). Comparison of descriptive terminology systems for sensory evaluation of fluid milk. Journal of Food Science, 57(3), 596–600.
Clapperton, J. F., & Piggott, J. R. (1979). Flavour characterisation by trained and untrained assessors. Journal of the Institute of Brewing, 85, 275–277.
Daget, N., & Collyer, S. (1984). Comparison between quantitative descriptive analysis and physical measurements of gel systems and evaluation of the sensorial method. Journal of Texture Studies, 15, 227–245.
Dransfield, E., Rhodes, D. N., Nute, G. R., Roberts, T. A., Boccard, R., Touraille, C., Buchter, L., Hood, D. E., Joseph, R. L., Schon, I., Casteels, M., Cosentino, E., & Tibergen, B. J. (1982). Eating quality of European beef assessed at five research institutes. Meat Science, 6, 163–184.
The European Sensory Network (1996). A European sensory and consumer study: a case study on coffee. Chipping Campden, Gloucestershire: Campden & Chorleywood Food Research Association.
Gower, J. C. (1975). Generalized Procrustes analysis. Psychometrika, 40, 33–51.
Guerrero, L., Gou, P., & Arnau, J. (1997). Descriptive analysis of toasted almonds: a comparison between expert and semi-trained assessors. Journal of Sensory Studies, 12, 39–54.
Heymann, H. (1994). A comparison of descriptive analysis of vanilla by two independently trained panels. Journal of Sensory Studies, 9, 21–32.
Hirst, D., & Muir, D. (1994). Definition of the sensory properties of hard cheese: a collaborative study between Scottish and Norwegian panels. International Dairy Journal, 4, 743–761.
Hunter, E. A., Muir, D. D., & Brennan, R. M. (1995). Comparison of the performance of an external sensory panel with an internal panel. In: 4èmes Journées Agro-industrie et Méthodes statistiques, 7–8 December 1995, Dijon, France.
Hunter, E. A., & McEwan, J. (1998). Evaluation of an international ring trial for sensory profiling of hard cheese. Food Quality and Preference, 9(5), 343–354.
Lavit, C. (1988). Analyse Conjointe de Tableaux Quantitatifs. Paris: Masson.
Martin, N., Savonitto, S., Molimard, P., Berger, M., Brousse, M., & Spinnler, H. E. (1999). Flavor generation in cheese curd by coculturing with selected yeast, mold and bacteria. Journal of Dairy Science, 82, 1072–1080.
Medeiros, L. C., Field, R. A., Menkaus, D. J., & Russell, W. C. (1987). Evaluation of range-grazed and concentrate-fed beef by a trained sensory panel, a household panel and a laboratory test market group. Journal of Sensory Studies, 2, 259–272.
Nielsen, R. G., & Zannoni, M. (1998). Progress in European sensory methodology for evaluation of cheese. International Journal of Dairy Technology, 51, 57–64.
Risvik, E., Colwill, J. S., McEwan, J., & Lyon, D. H. (1992). Multivariate analysis of conventional profiling data: a comparison of a British and a Norwegian panel. Journal of Sensory Studies, 7, 97–118.
Roberts, A. K., & Vickers, Z. M. (1994). A comparison of trained and untrained judges. Evaluation of sensory attribute intensities and liking of cheddar cheeses. Journal of Sensory Studies, 9, 1–20.
Statistical Analysis System Institute Inc. (1989). PC-SAS Version 6.12. Cary, NC: author.
Schlich, P. (1996). Defining and validating assessor compromises about product distances and attribute correlations. In T. Næs, & E. Risvik, Multivariate analysis of data in sensory science (pp. 259–306). Amsterdam: Elsevier Science.
Schlich, P. (1998). What are the sensory differences among coffees? Multi-panel analysis of variance and flash analysis. Food Quality and Preference, 9(3), 103–106.