Comparison of odour sensory profiles performed by two independent trained panels following the same descriptive analysis procedures


Food Quality and Preference 11 (2000) 487–495

www.elsevier.com/locate/foodqual

Nathalie Martin a,*, Pascal Molimard b, Henry Eric Spinnler a, Pascal Schlich c

a Département des Sciences et Industries Alimentaires et Biologiques, Institut National Agronomique Paris-Grignon, 78850 Thiverval-Grignon, France
b SKW Biosystems, Direction Cultures et Enzymes, Institut National de la Recherche Agronomique, Laboratoire de Recherches sur les Arômes, 17 rue de Sully, B.P. 1540, 21034 Dijon Cedex, France
c Institut National de la Recherche Agronomique, Laboratoire de Recherches sur les Arômes, 17 rue de Sully, B.P. 1540, 21034 Dijon Cedex, France

Received 15 March 2000; received in revised form 20 June 2000; accepted 3 July 2000

Abstract

Odour sensory profiling of 28 associations of cheese ripening micro-organisms was performed by two panels of 10 assessors at two different sites. Sample preparation, training protocols and references, tasting procedures and scoring were similar in the two laboratories. Panel 2 used 10 attributes and panel 1 used these terms plus 4 extra descriptors. Analysis of variance and multivariate methods (canonical variate analysis, generalised Procrustes analysis and STATIS) revealed differences between assessors within a panel and between panels concerning the use of the scoring scale and the strength of product discrimination by attribute. Panel 1 was more sensitive to fruity notes and panel 2 to sulphury odours. However, a good overlap in the separate and pooled analyses suggested the same sample clustering in three main groups and showed that the 2 panels gave consistent results. © 2000 Elsevier Science Ltd. All rights reserved.

Keywords: Odour profiling; Descriptive analysis; Panel comparison; Procrustes; ANOVA; STATIS

1. Introduction

Standardising sensory procedures in order to obtain consistent results has long been, and still is, a major preoccupation of many laboratories dealing with sensory analysis. In addition, growing international trade and the subsequent production of the same product at many different sites increase the need for better standardisation of quality control.

Several inter-laboratory sensory trials have been reported. Most of them concern sensory profiling performed on various foods (cheese, beef, fish, almonds, olive oil) or beverages (beer, chocolate, coffee, milk). The procedures and objectives of these studies show two main differences: the level of training of the panels and the sensory methodology applied. Most often, the experience of the assessors belonging to each panel is different in length (Apparicio, Gutierrez & Rodriguez, 1991; Daget & Collyer, 1984; Guerrero, Gou & Arnau, 1997; Hirst & Muir, 1994) or nature (Cardello et al., 1982; Clapperton & Piggott, 1979; Medeiros, Field, Menkaus & Russell, 1987; Nielsen & Zannoni, 1998; Roberts & Vickers, 1994). In the few studies mentioning panel training, the same basic sensory procedures were used (Apparicio et al., 1991; Daget & Collyer, 1984; Heymann, 1994; Hunter & McEwan, 1998). The panels either designed their own vocabulary (Claassen & Lawless, 1992; Guerrero et al., 1997; Heymann, 1994; Hirst & Muir, 1994; Hunter, Muir & Brennan, 1995; Risvik, Colwill, McEwan & Lyon, 1992; Roberts & Vickers, 1994; The European Sensory Network [ESN], 1996) or used a common list of attributes (Apparicio et al., 1991; Burke, Spooner & Hegarty, 1997; Cardello et al., 1982; Clapperton & Piggott, 1979; Daget & Collyer, 1984; Medeiros et al., 1987). Sensory procedures can also differ according to the way the products are prepared (Dransfield et al., 1982) or the scale used (Heymann, 1994; Hirst & Muir, 1994; Hunter & McEwan, 1998; Nielsen & Zannoni, 1998; Roberts & Vickers, 1994).

* Corresponding author. Tel.: +33-1-30-81-54-72; fax: +33-1-30-81-55-97. E-mail address: [email protected] (N. Martin).

0950-3293/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S0950-3293(00)00021-5


When panels employ the same terminology, results generally show good inter-panel agreement. Comparing profiles of trained panellists and consumers, Cardello et al. (1982), Clapperton and Piggott (1979) and Medeiros et al. (1987) observed good correlation between panels, although a broader range of perceptions was evidenced and recognised by trained panellists. Studies including panellists with the same level of training also report comparable results between panels regarding the structure of the sample spaces, i.e. the relative similarities and differences between products. However, the interpretation of the underlying sensory dimensions responsible for the perceived differences may differ between panels due to differences in each panellist's understanding and use of certain attributes (Apparicio et al., 1991; Daget & Collyer, 1984; Hunter & McEwan, 1998; Risvik et al., 1992). Burke, Spooner and Hegarty (1997) emphasised the importance of the understanding of attributes in their discussion of how 7 panels used the same descriptors to describe different flavour characteristics of beer. These authors, whose panels were not specifically trained, also recommend standardised training to improve inter-panel consensus, similar to training designed to homogenise panellists' judgements within a single panel.

The objective of this study was to determine whether the odour profiles obtained by two independent panels trained in a similar way, using the same vocabulary and following the same quantitative descriptive analysis procedures, would be comparable for a fairly large set of products resulting from biological transformation (fermentation) and prepared in two different laboratories. Unlike earlier reports, this study dealt with non-commercial products produced in small series and independently in two laboratories. It is well known that small-scale production is generally less reproducible than industrial production.
Moreover, sensory evaluation was restricted to odour perceptions, generally considered to be the most difficult to assess. This paper provides additional experience in an area of scientific and practical interest: the comparison of panel performance. It also presents, compares and discusses the efficiency of different statistical tools that can be used for a relevant comparison in such a context.

2. Materials and methods

2.1. Samples

The samples studied were microbial associations of three different yeasts, three different strains of Geotrichum candidum and five different bacteria, commonly used in bacteria and mould surface-ripened cheese. All samples were cultured on a model lactic curd. Thirty-nine samples were thus prepared and evaluated (Martin, Savonitto, Molimard, Berger, Brousse & Spinnler, 1999). The cultures were stored in darkness at 12 °C for 21 days. After incubation, the cultured curd was blended with Milli-Q® water (2:1, wt/wt) at 20,500 rpm for 40 s (Ultra-Turrax® model T25 equipped with an S25N18G grinding probe, IKA Labortechnik, Staufen, Germany). For sensory analysis, the solution was distributed into 60-ml coded brown flasks, 25 ml per flask. The flasks were stored overnight at 4 °C and then held at 20 °C for about 3 h before sensory evaluation. The samples were prepared independently in the two different laboratories and tested independently in each of the laboratories.

2.2. Panels

Two panels of 10 trained assessors each participated in the study at two different locations (Laboratoire de Génie et Microbiologie des Procédés Alimentaires, Institut National de la Recherche Agronomique, Thiverval-Grignon, France and Laboratoire de Recherches sur les Arômes, Institut National de la Recherche Agronomique, Dijon, France). Ten graduate students (4 males, 6 females), performing a training course or PhD work in the laboratory, participated in panel 1. None had participated in sensory profiling of cheese, but half of them had previously been involved in sensory assessment of fresh dairy products. Panel 2 was composed of 10 assessors (6 males, 4 females) selected from the laboratory staff for their experience in flavour analysis. All of them had previously been involved in sensory assessment of various food products. Both panels were specifically trained to use the sensory procedures and to describe the samples under study.

2.3. Sensory procedures

Training and profiling procedures were the same for the two panels.
The judges were given a list of 10 common attributes defined from former studies on a similar set of products as describing the main olfactory differences between samples: intensity (overall odour intensity), garlic, cabbage, chocolate, Munster, banana, surimi, carapace, solvent, ammonia. Eight 45-min sessions were dedicated to training the assessors to use the vocabulary and the rating scale properly. During this step, panel 1 reached a consensus and chose to use four more descriptors: apple, blackcurrant, Cancoillotte and foul. These extra attributes were included in the panel 1 list only. References defined for each attribute and used for training are reported in Table 1. Each panel practised using the unstructured intensity scale by assessing different samples of the experimental design. Odour profiling was performed in a monadic way: the samples were smelt one after the other. Five samples

Table 1
Odor attributes and references used for the descriptive analyses

| Attribute | Reference standard | Concentration |
|---|---|---|
| Odor intensity | Odor overall intensity | |
| Banana | Isoamyl acetate (a) | 15 ppb |
| Solvent | Ethyl acetate (a) | 5 ppm |
| Cabbage | Dimethyldisulfide (a) | 200 ppb |
| Garlic | Allyl disulfide (a) | 150 ppb |
| Chocolate | Ovomaltine (b) | 150 ppm |
| Munster | Methylthiobutyrate (a) | 100 ppb |
| Surimi | Dimethylsulfide (a) | 20 ppb |
| Carapace | SK pasta (c) | 150 ppm |
| Ammonia | Ammonia (d) | 200 ppm |
| Blackcurrant | Fresh crushed blackcurrant | |
| Apple | Granny Smith apple flavoured yoghurt (e) | |
| Cancoillotte | Cancoillotte cheese Landel (f) | |
| Foul | 1,5-diaminopentane (a) | 100 ppm |

(a) Aldrich (Saint Quentin Fallavier, France). (b) Wander (Annonay, France). (c) Isnard-Lyraz (Fresnes, France). (d) Prolabo (Fontenay Sous-Bois, France). (e) Danone (Levallois-Perret, France). (f) Marcillat (Corcieux, France).

were presented per session for both panels. However, the sample presentation design (Latin square) differed between the two panels. Within panels, the order of presentation of the samples within a session was balanced to avoid carry-over and order effects. Sensory evaluations were conducted in air-conditioned rooms (20 °C), under white light in separate booths. Scores were recorded directly on the same computer system using FIZZ software (BIOSYSTEMES®, Couternon, France).

2.4. Data analysis

Checking for microbial contamination led to 11 of the 39 samples tested being discarded from the data analysis. Five associations were contaminated in laboratory 1 only, four in laboratory 2 only, and two in both laboratories (5 + 4 + 2 = 11). Consequently, the data set used to compare the performance of the two panels included 28 common samples. Global analyses, including the whole data set, were first carried out to compare the results from the two laboratories. Then the data from each panel were analysed separately in order to assess the performance of each assessor and of each panel. All the analyses were performed with SAS® (Statistical Analysis System Institute, 1989).

Univariate analysis consisted of different models of analysis of variance (ANOVA). Two-way ANOVA (product, assessor) and associated tests of contrasts were carried out on the data of each panel to assess its discrimination ability. Three-way ANOVA was performed on the whole data set according to the following model


(Schlich, 1998): PANEL + ASSESSOR(PANEL) + PRODUCT + PANEL*PRODUCT, where the panel effect, the assessor within panel effect and the interaction between product and panel were considered as random. This analysis allowed the detection of the attributes that significantly differed among samples and across the two panels.

Three multidimensional techniques were also applied. Canonical variate analysis (CVA) was run, using the CANONICAL option of the MANOVA statement from the GLM procedure in SAS® (1989), on the data of each panel to visualise on factorial maps how the samples were discriminated by each panel. Pairwise and generalised Procrustes analysis (GPA) (Gower, 1975) and STATIS (Lavit, 1988) were applied to determine and compare the number and the nature of the dimensions used by each assessor within a panel and by each panel to describe the product space.

3. Results and discussion

3.1. Agreement between panels and compromise: joint analysis

3.1.1. Agreement between panels on the product space

Panel product spaces were first compared by performing a pairwise Procrustes analysis on the averaged data (across all the judges within a panel) from the two panels. The first two dimensions explained 73.5 and 11.5% of the total variance, respectively. The variance explained by further dimensions was very low (< 4%). Consequently, results and discussion were restricted to the first factorial map.

A separation of the samples into three clusters was suggested on the sample space whatever the panel (Fig. 1A). Products 13, 15, 16, 17 and 18 were characterised by their strong fruity notes. Products 22, 24, 25, 26 and 27 were scored at a medium intensity level on these sensory properties and exhibited a weak overall odour. The other products were related to sulphury and cheesy olfactory notes.

The agreement between panels in terms of product distances was also confirmed by the STATIS normalised RV (NRV) coefficient between panels. The higher this coefficient, the more similar the two panel configurations compared. The NRV coefficient was 15.2, a value much greater than 2, which is what can be obtained by chance (Schlich, 1996). The length of the bars joining the same samples assessed by the two panels gave some more information on the similarity and difference in product evaluation. The greatest distances appeared for samples 15, 40, 3 and 31, but no major change in cluster membership of these samples was evident. We conclude that these large distances may be ascribed to fairly large random differences rather than to real disagreement between panels.
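The idea behind an RV-type agreement index can be illustrated with a minimal sketch. It computes Escoufier's RV coefficient between two product-by-attribute tables and then standardises it against a permutation distribution. The permutation-based standardisation is an assumption made for illustration: the exact NRV computation used in the paper follows Schlich (1996) and is not reproduced here, and the function names are hypothetical.

```python
import math
import random


def center_columns(rows):
    """Subtract each column mean so every attribute has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_product(rows):
    """Products-by-products matrix W = X X^T of a centred table."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def rv_coefficient(x, y):
    """Escoufier's RV coefficient between two tables sharing the same rows."""
    wx = cross_product(center_columns(x))
    wy = cross_product(center_columns(y))
    n = len(wx)
    # trace(Wx Wy) equals the sum of element-wise products for symmetric matrices
    num = sum(wx[i][j] * wy[i][j] for i in range(n) for j in range(n))
    den_x = sum(v * v for row in wx for v in row)
    den_y = sum(v * v for row in wy for v in row)
    return num / math.sqrt(den_x * den_y)


def permutation_nrv(x, y, n_perm=500, seed=0):
    """Standardise RV against random product permutations (illustrative assumption)."""
    rng = random.Random(seed)
    observed = rv_coefficient(x, y)
    sims = []
    for _ in range(n_perm):
        shuffled = y[:]          # copy the list of product rows
        rng.shuffle(shuffled)    # break the product correspondence
        sims.append(rv_coefficient(x, shuffled))
    mean = sum(sims) / n_perm
    sd = math.sqrt(sum((s - mean) ** 2 for s in sims) / (n_perm - 1))
    return (observed - mean) / sd
```

Under this reading, a standardised value well above 2 indicates that the two product configurations agree far more than randomly matched products would.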


Fig. 1. Sample plot (A) and variable plot (B) of the GPA performed on the whole data set. (a) and (b) on the sample plot and (1) and (2) on the variable plot account for assessment by panel 1 and 2, respectively.

The Procrustes analysis of variance enabled us to investigate further differences between panels. The three GPA transformations had a significant effect (P < 0.001), illustrating that, on average, the panels used different levels and ranges of the intensity scale and also that some disagreement between panels existed on the meaning of the attributes (Risvik et al., 1992). The percentage of variation in the sums of squares related to the translation step (57%) was the highest, which indicated that the main differences between panels concerned the level of the intensity scale used. The percentages of variation in the sums of squares related to the rotation and scaling steps were much lower (17 and 12%, respectively).

The correspondence between the terms used by the two panels was quite good (Fig. 1B). However, surimi and cabbage were better correlated with the first two GP dimensions for panel 2. Correlation on the panel averaged data sets between the two cabbage attributes was low (r = 0.15) and between the two surimi attributes was higher (r = 0.48). As illustrated in Fig. 1B, Munster (r = 0.70), banana (r = 0.88) and solvent (r = 0.94) were well correlated between panels. Although it does not appear clearly in Fig. 1B, carapace also benefitted from one of the highest correlation coefficients (r = 0.50).

The variance explained by GPC1 and GPC2 was higher for panel 1 data (73.6 and 14.2% compared to 65.3 and 13.6% for panel 2). The number of dimensions required by each panel to adequately describe the sample structure was also reflected by the calculation of the STATIS b coefficient per panel. This b coefficient enabled the comparison of dimensionalities of panel sample spaces. It was slightly higher for panel 2 (2.2) than for panel 1 (1.8), suggesting that panel 2 assessment was slightly more complex than panel 1 assessment in terms of the number of attributes required to span the sample differences (Schlich, 1996).
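The three Procrustes steps discussed above (translation, rotation, scaling) can be sketched for the simplest case of two configurations on a two-dimensional map. This is only an illustration under stated assumptions: it fits one configuration onto another using a closed-form rotation angle, ignores reflections, and is not the iterative generalised Procrustes algorithm of Gower (1975) nor the SAS implementation used in the study. The function names are hypothetical.

```python
import math


def centre(points):
    """Translation step: move the centroid of a configuration to the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(p[0] - cx, p[1] - cy) for p in points]


def fit_procrustes_2d(target, source):
    """Fit source onto target by translation, rotation and isotropic scaling.

    Returns the fitted (centred) source points, the rotation angle in radians,
    the scaling factor and the residual sum of squares against the centred target.
    """
    x = centre(target)
    y = centre(source)
    # a and b are the cosine and sine weights of the least-squares rotation
    a = sum(px * qx + py * qy for (px, py), (qx, qy) in zip(x, y))
    b = sum(py * qx - px * qy for (px, py), (qx, qy) in zip(x, y))
    theta = math.atan2(b, a)
    # Scaling step: least-squares isotropic factor applied after rotation
    scale = math.hypot(a, b) / sum(qx * qx + qy * qy for qx, qy in y)
    c, s = math.cos(theta), math.sin(theta)
    fitted = [(scale * (c * qx - s * qy), scale * (s * qx + c * qy)) for qx, qy in y]
    rss = sum((px - fx) ** 2 + (py - fy) ** 2
              for (px, py), (fx, fy) in zip(x, fitted))
    return fitted, theta, scale, rss
```

In this picture, a large translation corresponds to panels using different levels of the scale, a scaling factor far from 1 to different ranges, and a large rotation to different use of the attributes.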

3.1.2. Agreement between panels on individual attributes

F ratios of the three-way ANOVA (Table 2) allowed a deeper insight into panel disagreement concerning some attributes, which was not revealed by the GPA global approach. The assessor effect was highly significant (P < 0.001) on all attributes and thus was not reported in Table 2. An assessor effect is commonly encountered in sensory analysis and can be explained by inter-individual differences in the use of the intensity scales. The panel effect was also significant on five attributes: garlic, cabbage, chocolate, banana and carapace. It was due to the different range of the scale used by the two panels to score these attributes, as illustrated by the attribute means reported in Table 2.

Significant panel by product interactions also appeared on four attributes: intensity, cabbage, Munster and solvent. Although the mean values for the highly scored products on the solvent attribute were higher for panel 1, the trend of product means was similar for the two panels (Fig. 2A). This explains why, despite the high Fproduct×panel value (6.71), the product effect (tested against the interaction term) remained significant on this attribute (P < 0.05). On the contrary, for intensity, Munster and cabbage (Fig. 2B, C and D), a different ranking of the product means was observed according to the panel. Interaction on the intensity and cabbage attributes led to a non-significant Fproduct value. Despite the interaction, the Fproduct value remained significant on Munster but, unlike solvent, it must be considered cautiously.

To summarise, only four attributes exhibited a significant product by panel interaction. Despite these interactions, a significant product effect remained on two of these attributes, showing that little information on the product differences was lost when the data of the two panels were considered together. Besides, it can be noticed that three-way analysis of variance gave more precise information than the GPA approach, which did not reveal the disagreement observed, particularly on the Munster and solvent attributes.

Table 2
Product F values of the three-way ANOVA (product, panel, assessor(panel)) performed on the whole data set (28 products, 2 panels) (a)

| Attributes | Fproduct | Fpanel (mp1–mp2) (b) | Fproduct×panel |
|---|---|---|---|
| Intensity | 1.64 | 0.61 | 4.08*** |
| Garlic | 1.02 | 11.29** (0.3–1.3) | 1.19 |
| Cabbage | 1.25 | 13.09** (0.3–1.9) | 2.01** |
| Chocolate | 0.58 | 7.53* (0.6–1.6) | 1.02 |
| Munster | 3.48*** | 2.12 | 2.65*** |
| Banana | 15.07*** | 4.46* (0.9–2.3) | 0.56 |
| Surimi | 2.84** | 3.20 | 0.97 |
| Carapace | 2.88** | 4.52* (0.6–1.5) | 1.10 |
| Solvent | 3.22* | 0.97 | 6.71*** |
| Ammonia | 1.07 | 0.19 | 1.37 |

(a) *, **, *** indicate significance at P ≤ 0.05, P ≤ 0.01, P ≤ 0.001, respectively.
(b) Attribute means (mpi) are reported for panel 1 (mp1) and panel 2 (mp2) for significant panel effect.
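The logic of testing the product effect against the panel by product interaction can be sketched for one attribute in a balanced design with one score per assessor and product. This is a hedged illustration, not the SAS GLM analysis reported in the paper: the data layout `scores[panel][assessor][product]`, the function name, and the choice to test the interaction against the assessor(panel) by product residual are assumptions.

```python
def product_f_ratios(scores):
    """scores[panel][assessor][product] -> F ratios for a balanced design.

    The product effect is tested against the panel x product interaction;
    the interaction is tested against the assessor(panel) x product residual
    (an assumption made for this sketch).
    """
    n_panels = len(scores)
    n_assessors = len(scores[0])
    n_products = len(scores[0][0])

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    grand = mean(s for panel in scores for a in panel for s in a)
    panel_means = [mean(s for a in panel for s in a) for panel in scores]
    product_means = [mean(a[j] for panel in scores for a in panel)
                     for j in range(n_products)]
    cell_means = [[mean(a[j] for a in panel) for j in range(n_products)]
                  for panel in scores]
    assessor_means = [[mean(a) for a in panel] for panel in scores]

    ss_product = n_panels * n_assessors * sum(
        (m - grand) ** 2 for m in product_means)
    ss_inter = n_assessors * sum(
        (cell_means[k][j] - panel_means[k] - product_means[j] + grand) ** 2
        for k in range(n_panels) for j in range(n_products))
    ss_resid = sum(
        (scores[k][i][j] - assessor_means[k][i] - cell_means[k][j]
         + panel_means[k]) ** 2
        for k in range(n_panels)
        for i in range(n_assessors)
        for j in range(n_products))

    ms_product = ss_product / (n_products - 1)
    ms_inter = ss_inter / ((n_panels - 1) * (n_products - 1))
    ms_resid = ss_resid / (n_panels * (n_assessors - 1) * (n_products - 1))
    return {"F_product": ms_product / ms_inter,
            "F_product_x_panel": ms_inter / ms_resid}
```

Because the denominator of the product F ratio is the interaction mean square, a strong panel by product interaction lowers the product F ratio even when each panel discriminates the products well on its own, which is the pattern described for intensity and cabbage.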

3.2. Panel comparison: separate analyses

CVA was performed, per panel, on the whole set of attributes of each panel. The likelihood ratio test evidenced two significant axes for panel 1 and one for panel 2. Moreover, the multivariate analysis of variance on the canonical axis led to F values of 4.04 and 1.74 for the panel 1 and panel 2 data sets, respectively. This evidenced the greater discrimination ability of panel 1 compared to panel 2. However, separation of the products into three clusters (Fig. 3A and B) was similar for both panels and comparable with the global GPA results (Fig. 1A). For both panels, the samples were discriminated along the first dimension according to their fruity odours (banana, solvent and also apple for panel 1) and according to their sulphury notes (carapace, surimi and cabbage for both panels and Cancoillotte and foul for panel 1) and Munster odours. Dimension 2 was mainly determined by the overall odour intensity and the Munster note.

Banana and solvent (as well as apple for panel 1), which determined the variance of axis 1, were highly correlated with each other for each panel (r > 0.78). Surimi and carapace were highly correlated for both panels (rp1 = 0.76, rp2 = 0.74). These two perceptions were probably concomitant in the samples. References used for surimi (dimethyl sulfide) and carapace (SK pasta with strong shellfish notes) were indeed very different and it is unlikely that the subjects confused the two related attributes. Correlation between sulphury and Munster notes differed between panels: garlic was best correlated with Munster for panel 1 (r = 0.61) and with surimi (r = 0.75) and cabbage (r = 0.69) for panel 2; cabbage was also well correlated with carapace (r = 0.76) and Munster (r = 0.67) for panel 2.


Fig. 2. Interaction (panel × product) plots on solvent (A), intensity (B), Munster (C) and cabbage (D): panel 1 (&) and panel 2 (&). For each attribute, products are ranked according to increasing panel 1 means.

Consequently, discrepancies concerning the distances between some of the products (for example 3 and 31, as already mentioned for global GPA) may result from the different attribute correlation structure of the two panels. Two hypotheses can be proposed to explain this observation: (i) the perception of the sensory proximity between the olfactory notes differed between panels, as suggested by the different correlation coefficients; (ii) the attributes best used to differentiate the samples differed between panels.
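The attribute correlation structure referred to in hypothesis (i) can be examined with a short sketch that computes Pearson correlations between attributes from the product means of one panel. The input layout (attribute name mapped to a list of product means) and the function names are assumptions for illustration; no data from the study are reproduced.

```python
import math


def pearson(u, v):
    """Pearson correlation between two equally long lists of product means."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def correlation_table(product_means):
    """Return {(attribute_a, attribute_b): r} for every pair of attributes.

    product_means maps an attribute name to its list of product means,
    with products in the same order for every attribute.
    """
    names = sorted(product_means)
    return {(a, b): pearson(product_means[a], product_means[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

Running this once per panel and comparing the two resulting tables pair by pair shows where the panels associate attributes differently, for instance garlic with Munster in one panel and with surimi in the other.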

Fig. 3. Canonical variate analysis of panel 1 (A) and panel 2 (B) results.

The second hypothesis was tested through a two-way analysis of variance (product, assessor) performed per panel and across the 28 products (Table 3). Cabbage, garlic, chocolate and blackcurrant were not significant for panel 1, and garlic, chocolate and ammonia were not significant for panel 2. The non-discriminating use of garlic and chocolate by both panels supported the non-significant product effect previously observed in the three-way ANOVA results. Both panels significantly discriminated the samples on intensity, Munster, banana, surimi, carapace and solvent. Three of the four attributes used only by panel 1 (apple, Cancoillotte and foul) showed a high level of significance (P < 0.001). Ammonia was discriminant for panel 1 and cabbage for panel 2. The poor use of cabbage by panel 1 may be responsible for the differences observed between panels in the correlations involving this attribute. The non-significant product effect noticed in the three-way ANOVA on intensity did not result from poor sample discrimination by one of the panels on this attribute (P < 0.001, Table 3), but from a significant product by panel interaction.

Contrast tests showed that the means of the same samples (13, 15, 16, 17 and 18) were significantly higher than the product grand mean on the attributes banana and solvent for both panels and on apple for panel 1. This confirmed the sample clustering suggested by the global GPA and CVA factorial maps. Despite some discrepancies between different samples

Table 3
Product F values of the two-way ANOVA (product, assessor) performed per panel and across the 28 products

| Attributes | Fproduct (a), Panel 1 | Fproduct (a), Panel 2 |
|---|---|---|
| Intensity | 7.63*** | 1.84** |
| Garlic | 1.52 | 1.14 |
| Cabbage | 1.05 | 2.69*** |
| Chocolate | 0.94 | 0.67 |
| Munster | 8.46*** | 2.73*** |
| Banana | 4.03*** | 4.90*** |
| Surimi | 1.86** | 1.88** |
| Carapace | 1.70* | 2.58*** |
| Solvent | 24.00*** | 2.64*** |
| Ammonia | 1.59* | 1.09 |
| Blackcurrant | 1.28 | |
| Apple | 11.90*** | |
| Cancoillotte | 3.93*** | |
| Foul | 2.18*** | |

(a) *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001.

between panels, the main differences between products appeared to be the same for both panels.

3.3. Consensus between the assessors within each panel

STATIS was applied to the scores of each panel's whole set of attributes (Table 4). The scaling coefficient between assessors varied more within panel 2 (0.19–2.37) than within panel 1 (0.50–1.67), revealing a greater difference in the range of the scale used between panel 2 assessors. The homogeneity in each panel was estimated by calculating an NRV coefficient between each pair of assessors within a panel. The NRV coefficients between pairs of assessors were averaged to give, for each assessor within each panel, a meanNRV. MeanNRVs were higher for panel 1 assessors (mean = 9.93) than for panel 2 assessors (mean = 3.95). Agreement among assessors in terms of product sensory distance was much better within panel 1. Discrepancies between assessors within a panel were noticed: the range of meanNRVs was 6.19–11.84 for panel 1 and 1.44–6.64 for panel 2. Except for one panel 2 assessor, all meanNRV values were greater than 2.

The number of dimensions required by each individual to adequately describe the sample structure was reflected by the b coefficient, whose mean was slightly higher for panel 2 (meanp2 = 3.77) than for panel 1 (meanp1 = 3.33), denoting, as already mentioned, that as a whole, panel 2 required more underlying sensory dimensions to describe the differences between samples. These individual b coefficients were higher than those obtained from the global analysis on the averaged data sets from the two panels (across all the judges within a panel): 1.8 (panel 1) and 2.2 (panel 2). Differences in dimensionality were probably due to the strong agreement between judges on the first dimension and to the lesser agreement on the other dimensions. Calculation of b coefficients from the average data sets emphasised the importance of the first dimension. Panel 2 coefficients were generally higher than panel 1 coefficients. The b coefficients ranged from 1.67 to 5.38 in panel 1 and from 2.46 to 5.83 in panel 2, which also suggested differences between assessors in terms of the sample space dimensionality. STATIS indices such as scaling, meanNRVs and b coefficients, together with summary tables, appeared to be good tools to quickly obtain information on panel differences.

Table 4
Summary of assessors statistics in STATIS analysis performed per panel

| Panel 1 assessor | Scaling | MeanNRV | b | Panel 2 assessor | Scaling | MeanNRV | b |
|---|---|---|---|---|---|---|---|
| 1 | 1.05 | 10.66 | 5.38 | 11 | 0.84 | 3.69 | 3.45 |
| 2 | 0.90 | 7.85 | 2.88 | 12 | 0.57 | 3.21 | 4.00 |
| 3 | 1.09 | 10.18 | 4.78 | 13 | 1.16 | 3.85 | 4.82 |
| 4 | 1.29 | 11.06 | 2.17 | 14 | 2.37 | 4.95 | 4.03 |
| 5 | 0.87 | 10.13 | 3.03 | 15 | 0.35 | 1.44 | 5.83 |
| 6 | 0.52 | 9.39 | 4.48 | 16 | 0.70 | 3.64 | 3.84 |
| 7 | 1.56 | 11.22 | 2.43 | 17 | 1.97 | 3.91 | 3.47 |
| 8 | 0.57 | 6.19 | 3.55 | 18 | 1.27 | 6.64 | 2.46 |
| 9 | 1.67 | 10.83 | 1.67 | 19 | 0.19 | 3.61 | 2.81 |
| 10 | 0.50 | 11.84 | 2.90 | 20 | 0.58 | 4.58 | 3.01 |
| Mean | 1.00 | 9.93 | 3.33 | Mean | 1.00 | 3.95 | 3.77 |

4. Conclusion

This study indicated that the quantitative descriptive analyses of a large set of products by two panels trained independently but according to the same procedure were similar for the main sensory differences between samples. These results are particularly encouraging since it was possible that the laboratory-scale preparation of the samples might have induced small differences between the two laboratories.
In addition, the sensory assessment was restricted to the odour of the samples, a perception generally reported as difficult for assessors to evaluate and showing great inter-individual variation. However, differences appeared between panels concerning the importance attached to each attribute to discriminate the samples. This may be due to variation in the previous sensory experience of each panellist. Differences may also be encountered between assessors within a panel and can lead to discrepancies between panels even when only a few individual panellists weight some of the attributes in a different way. Similarly, variability in the use of the scoring scale can appear between panels as well as between assessors within a panel. However, these differences can be partly rectified by the statistical treatment applied. Differences between


panels can be regarded as analogous to differences between individual assessors in a single panel. Consequently, improvement of inter-panel consensus should be achieved by introducing procedures similar to those applied to homogenise assessor judgements within a panel. The use of a standardised terminology including the same references for the same descriptors does not seem to be sufficient. Performance monitoring should focus more on the emphasis attributed by each panellist to each attribute, even when the attributes are the same. This could avoid the imbalance in the use of some attributes that was encountered in the present study and that was partly responsible for discrepancies between panels. Close collaboration between panel leaders is, therefore, necessary. Regular feedback of the results of each laboratory to the other laboratories could ensure early detection of such drift and be used to guide training.

The univariate and multivariate methods reported here are useful tools to compare individual assessor and panel results. However, conclusions resulting from these different analyses may differ slightly. For this reason, it would be advisable not to confine interpretation to one specific statistical analysis procedure alone. Enlarging the scope of analysis and comparing the subsequent results provides a way of confirming specific conclusions and accounting for the range of differences inherent in such studies.

Acknowledgements

The authors would like to thank Sandra Savonitto for her assistance in collecting the data used for this study. This work was financed by contract FAIR PL 96 1196 from the European Union (Brussels, Belgium).

References

Apparicio, R., Gutierrez, F., & Rodriguez, J. (1991). A chemometrics study of analytical panels in virgin olive oil. An approach for evaluating panels in training. Grasas y Aceites, 42(3), 202–210.
Burke, S., Spooner, M. J. R., & Hegarty, P. K. (1997). Sensory testing of beers: an inter-laboratory sensory trial. Journal of the Institute of Brewing, 103, 15–19.
Cardello, A. V., Maller, O., Kapsalis, J. G., Segars, R. A., Sawyer, F. M., Murphy, C., & Moskowitz, H. R. (1982). Perception of texture by trained and consumer panelists. Journal of Food Science, 47, 1186–1197.
Claassen, M., & Lawless, H. (1992). Comparison of descriptive terminology systems for sensory evaluation of fluid milk. Journal of Food Science, 57(3), 596–600.
Clapperton, J. F., & Piggott, J. R. (1979). Flavour characterisation by trained and untrained assessors. Journal of the Institute of Brewing, 85, 275–277.
Daget, N., & Collyer, S. (1984). Comparison between quantitative descriptive analysis and physical measurements of gel systems and evaluation of the sensorial method. Journal of Texture Studies, 15, 227–245.
Dransfield, E., Rhodes, D. N., Nute, G. R., Roberts, T. A., Boccard, R., Touraille, C., Buchter, L., Hood, D. E., Joseph, R. L., Schon, I., Casteels, M., Cosentino, E., & Tibergen, B. J. (1982). Eating quality of European beef assessed at five research institutes. Meat Science, 6, 163–184.
The European Sensory Network (1996). A European sensory and consumer study: a case study on coffee. Chipping Campden, Gloucestershire: Campden & Chorleywood Food Research Association.
Gower, J. C. (1975). Generalized Procrustes analysis. Psychometrika, 40, 33–51.
Guerrero, L., Gou, P., & Arnau, J. (1997). Descriptive analysis of toasted almonds: a comparison between expert and semi-trained assessors. Journal of Sensory Studies, 12, 39–54.
Heymann, H. (1994). A comparison of descriptive analysis of vanilla by two independently trained panels. Journal of Sensory Studies, 9, 21–32.
Hirst, D., & Muir, D. (1994). Definition of the sensory properties of hard cheese: a collaborative study between Scottish and Norwegian panels. International Dairy Journal, 4, 743–761.
Hunter, E. A., Muir, D. D., & Brennan, R. M. (1995). Comparison of the performance of an external sensory panel with an internal panel. In: 4èmes Journées Agro-industrie et Méthodes statistiques, 7–8 December 1995, Dijon, France.
Hunter, E. A., & McEwan, J. (1998). Evaluation of an international ring trial for sensory profiling of hard cheese. Food Quality and Preference, 9(5), 343–354.
Lavit, C. (1988). Analyse Conjointe de Tableaux Quantitatifs. Paris: Masson.
Martin, N., Savonitto, S., Molimard, P., Berger, M., Brousse, M., & Spinnler, H. E. (1999). Flavor generation in cheese curd by coculturing with selected yeast, mold and bacteria. Journal of Dairy Science, 82, 1072–1080.
Medeiros, L. C., Field, R. A., Menkaus, D. J., & Russell, W. C. (1987). Evaluation of range-grazed and concentrate-fed beef by a trained sensory panel, a household panel and a laboratory test market group. Journal of Sensory Studies, 2, 259–272.
Nielsen, R. G., & Zannoni, M. (1998). Progress in European sensory methodology for evaluation of cheese. International Journal of Dairy Technology, 51, 57–64.
Risvik, E., Colwill, J. S., McEwan, J., & Lyon, D. H. (1992). Multivariate analysis of conventional profiling data: a comparison of a British and a Norwegian panel. Journal of Sensory Studies, 7, 97–118.
Roberts, A. K., & Vickers, Z. M. (1994). A comparison of trained and untrained judges. Evaluation of sensory attribute intensities and liking of cheddar cheeses. Journal of Sensory Studies, 9, 1–20.
Statistical Analysis System Institute Inc. (1989). PC-SAS Version 6.12. Cary, NC: author.
Schlich, P. (1996). Defining and validating assessor compromises about product distances and attribute correlations. In T. Næs, & E. Risvik, Multivariate analysis of data in sensory science (pp. 259–306). Amsterdam: Elsevier Science.
Schlich, P. (1998). What are the sensory differences among coffees? Multi-panel analysis of variance and flash analysis. Food Quality and Preference, 9(3), 103–106.