Optimized Descriptive Profile: How many judges are necessary?


Food Quality and Preference 36 (2014) 3–11

Contents lists available at ScienceDirect

Food Quality and Preference journal homepage: www.elsevier.com/locate/foodqual

Rita de Cássia dos Santos Navarro da Silva a,⇑, Valéria Paula Rodrigues Minim a, Alexandre Navarro da Silva b, Luiz Alexandre Peternelli c, Luis Antônio Minim a

a Departamento de Tecnologia de Alimentos, Universidade Federal de Viçosa (UFV), 36570-000 Viçosa, Minas Gerais, Brazil
b Departamento de Engenharia de Produção e Mecânica, Universidade Federal de Viçosa (UFV), 36570-000 Viçosa, Minas Gerais, Brazil
c Departamento de Estatística, Universidade Federal de Viçosa (UFV), 36570-000 Viçosa, Minas Gerais, Brazil

Article info

Article history: Received 26 July 2013; Received in revised form 20 February 2014; Accepted 20 February 2014; Available online 12 March 2014

Keywords: Re-sampling; Computer simulation; Discrimination of the samples; Sensory map

Abstract

The cost associated with descriptive sensory tests derives primarily from two sources: (i) the execution time of the test and (ii) the number of judges participating. The Optimized Descriptive Profile (ODP) technique is a new methodology that proposes to reduce test time through an optimized sensory evaluation protocol. The objective of this study was to determine the optimal number of judges for descriptive sensory evaluation using the ODP, so as to show that the technique reduces the time, money and effort needed to conduct the methodology, and also the number of participants making up the panel. The optimal number of judges was determined by computer simulation, re-sampling the data of a panel originally composed of 26 judges. Data from the complete panel was re-sampled with replacement, considering 10,000 sub-groups. The criteria for determining the ideal number of judges were: (i) an experimental error less than or equal to the error verified in the reference methodology (Conventional Profile), (ii) an interaction between samples and judges, in terms of size and stability, similar to the interaction obtained by the complete panel, (iii) a concordance rate among products, using paired comparison (sample discrimination), similar to that of the full panel and (iv) minimal loss of information in the sensory map. The magnitude of the experimental error estimate proved to be the most robust measure for determining the number of judges necessary for the ODP technique. Because this technique requires low levels of training of the judges, evaluation of these criteria is extremely important, since a larger residual random variation can usually be observed. The criteria for magnitude of the experimental error, interaction between samples and judges, and concordance rate in paired comparisons were met when sixteen evaluators were used.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

The cost associated with descriptive sensory evaluations increases with the number of participating judges. Therefore, determination of the ideal number of evaluators is of utmost importance. According to Heymann, Machado, Torri, and Robinson (2012), it is obvious that training a smaller number of judges requires less time, cost and effort, but this may result in a “false savings” due to the possibility of obtaining “poor” data. Thus, the challenge is to determine the optimal number of judges needed for descriptive assessments that allows the size of the panel to be reduced without losing information on the sensory profile of the foods.

⇑ Corresponding author. Tel.: +55 31 38993810. E-mail address: lminim@ufv.br (L.A. Minim).
http://dx.doi.org/10.1016/j.foodqual.2014.02.011
0950-3293/© 2014 Elsevier Ltd. All rights reserved.

The recommended optimal number of judges composing a panel is not very clear in the literature. Different recommendations are encountered depending on the technique used, for example, six judges for the Flavor Profile (Cairncross & Sjostrom, 1950), ten judges for the Texture Profile (Brandt, Skinner, & Coleman, 1963) and ten to twelve judges for the Quantitative Descriptive Analysis (Stone & Sidel, 1985). However, the criteria used to determine these numbers are not shown. Calculation of the number of judges in descriptive sensory testing has been little explored in the literature. Some studies were conducted to determine the optimal number of judges considering generic methodologies, such as the “Conventional Profile” or “Descriptive Analysis”. In most studies, reduction in the number of judges making up the panel was addressed by re-sampling data obtained by larger panels (Gacula & Rutenbeck, 2006; Heymann et al., 2012; King et al., 1995; Pagès & Périnel, 2003; Silva, Minim, Silva, & Minim, 2014).


King, Arents, and Moreau (1995) conducted a sensory description of ice cream samples by performing an evaluation with a panel of 20 judges (full panel). Data from the full panel was re-sampled by 20 smaller panels, consisting of 3–13 judges. For the new panels formed, the significance of the treatment effects was evaluated for each of the sensory attributes by Analysis of Variance (ANOVA). The full panel explained the most variation of the treatment effect, with 76% of the evaluated attributes significant (p < 0.05) in the ANOVA. When the panel was reduced to half (N = 10), 67% of the attributes were significant. Further reducing the panel to one quarter (N = 5) made only 34% of the attributes significant. The authors concluded that reducing the number of judges in the panel resulted in significant loss of information regarding the effect of the treatments.

In the study performed by Pagès and Périnel (2003), a sensory description of eight samples of carbonated mineral water was evaluated by a panel of sixteen judges. Data obtained by the full panel was removed from the data set, two at a time, until reaching the minimum number of two assessments by the panel. For the sub-panels formed, the magnitude of the F-ratio and the sensory map obtained by PCA (Principal Component Analysis) were evaluated. No difference between the panels was observed considering these criteria.

Gacula and Rutenbeck (2006) determined the number of judges for descriptive sensory tests by computer simulation using experimental data. A panel of six trained judges evaluated the samples to obtain the data. In the simulation, two experimental measurements were considered: the difference to be detected between the means (d′) and the variability of the experiment (Root Mean Square Error, RMSE). A minimum of five judges was determined to make up the sensory panel.

Heymann et al. (2012) conducted a study on the number of judges for descriptive tests by re-sampling the original data. Data from three studies on sensory characterization of red wines was used, which included 14–22 judges. Data obtained by the complete panels was re-sampled by panels with 4, 6, 8, 12 and 14 judges. The new panels of judges formed were evaluated regarding the significance (p-value) of the attribute descriptors by ANOVA and the sensory maps obtained by multifactor analysis. The results showed that at least eight judges are needed for the sensory panel.

The study by Silva et al. (2014) calculated the optimal number of judges by “Power analysis and Sample size”. Three levels of probability were determined for the Type I and Type II errors and the difference to be detected in the experiment (d′). The standard deviation values of the experimental error were determined based on data from the literature. A total of 574 values of the root mean square error (RMSE) were obtained from previous studies. Data from the literature was fitted to a known probability distribution, using 5% of this distribution in calculation of the number of judges. The required numbers of assessments in the descriptive tests were calculated considering these different experimental conditions, totaling 135 scenarios.

No previous studies have addressed the number of judges needed in the sensory panel for the Optimized Descriptive Profile (ODP) methodology. It was recently proposed as a descriptive method, and therefore there are few studies on this new sensory technique (Silva et al., 2012, 2013). The technique uses an optimized evaluation protocol, presenting a comparative evaluation between the samples, followed by a quantitative evaluation using an interval scale.
Because the technique recommends the participation of judges with a low degree of training and the evaluation protocol of the foods is different, a specific study of this method is needed to determine the optimal number of judges. This study sought to determine the optimal number of judges for the sensory descriptive analysis of foods using the Optimized Descriptive Profile (ODP), so that the technique can be shown to reduce the time, money and effort required to conduct the methodology, and also the number of panel members.

2. Materials and methods

Determination of the required number of judges for the ODP technique was performed by computer simulation, re-sampling the data obtained by an original panel consisting of 26 judges. Data from the full panel was re-sampled with replacement, considering 10,000 iterations. The experimental data was obtained by sensory characterization using the ODP technique for two food matrices: strawberry-flavored yogurt (Experiment A) and chocolate (Experiment B). The criteria for determining the optimal number of judges were: (i) an experimental error less than or equal to the error found for the reference methodology (Conventional Profile), (ii) an interaction between samples and judges, in terms of size and stability, similar to the interaction obtained by the full panel, (iii) a concordance rate among products, using paired comparison (sample discrimination), similar to that of the complete panel and (iv) minimal loss of information in the sensory map.

2.1. Stimulus

Two types of food matrices were used (yogurt and chocolate) in the sensory characterization. The formulations were defined based on preliminary triangular tests, in which the samples presented a small magnitude of difference (p < 0.10) in the sensory characteristics, showing a proportion of distinguishers (Pd) lower than 0.29 in the guessing model, equivalent to d′ equal to 1.6 in the Thurstonian model. The probability of Type II error was established at 0.10.

2.1.1. Experiment A

Five strawberry-flavored yogurt formulations were utilized. A commercial brand yogurt was used for preparation of the formulations. Milk, sugar, powdered milk and pink dye were added at different concentrations (Table 1).

2.1.2. Experiment B

Four chocolate formulations were used.
Chocolate formulations were prepared with three different chocolate types from the same brand, and each unit measured 29 mm in diameter and 20 mm in height. Different mixtures of milk chocolate, semisweet chocolate and bitter chocolate were used in preparing the formulations. The amounts of each type of chocolate used in the process are described in Table 2.

2.2. Procedure

Sensory evaluation of the test formulations (yogurt and chocolate) was performed using the evaluation protocol of the Optimized Descriptive Profile technique (Silva et al., 2012, 2013). Two panels of 26 judges participated in evaluations of the strength of the attribute descriptors (Tables 2 and 3). One panel of 26 judges performed a sensory evaluation of the five yogurt formulations and another panel of 26 judges evaluated the sensory characteristics of the four chocolate formulations. For the ODP technique, the judges were recruited by structured questionnaires and pre-selected by difference tests (e.g., triangular tests). They also defined the sensory attributes for descriptive evaluation of the samples and the reference materials for each attribute. The judges evaluated the products in relation to the sensory attributes using a 9 cm unstructured rating scale, with three repetitions, according to the ODP protocol.

Table 1
Test formulations of strawberry-flavored yogurt. Quantity of ingredients added to a liter (1 L) of commercial yogurt.

| Yogurt samples | Milk (mL) | Sucrose (g) | Powdered milk (g) | Pigment (mL) |
|---|---|---|---|---|
| Yog-1 | 100.0 | 60.0 | – | – |
| Yog-2 | 50.0 | 30.0 | – | – |
| Yog-3 | – | – | – | – |
| Yog-4 | – | – | 90.0 | 0.2 |
| Yog-5 | – | – | 180.0 | 0.4 |

Yog-3: commercial sample.

Table 2
Compositions of the chocolates in relation to the type and quantity (g) of chocolate used in processing.

| Chocolate samples (a) | Milk (g) | Semisweet (g) | Bittersweet (g) |
|---|---|---|---|
| Choc-1 | 9.6 | 2.4 | – |
| Choc-2 | 9.6 | – | 2.4 |
| Choc-3 | – | 12.0 | – |
| Choc-4 | 6.0 | – | 6.0 |

(a) Each unit contains 12 g of chocolate.

Table 3
Sensory attributes reported for the test formulations.

| Experiment A (Yogurt samples) | Experiment B (Chocolate samples) |
|---|---|
| Pink color (Pc) | Brown color (Bc) |
| Sweetness (Sw-yog) | Mass cocoa aroma (Mca) |
| Strawberry flavor (Sf) | Mass cocoa flavor (Mcf) |
| Cream flavor (Cf) | Sweetness (Sw-choc) |
| Creaminess (Cr) | Residual bitterness (Rb) |
| Viscosity (Vis) | Hardness (Ha) |
| Flow resistance (Fr) | Spreadability (Sp) |
| Farinaceous texture (Ft) | |

2.3. Data analysis

2.3.1. Simulation

The method used to compare different panel sizes consisted of measuring the loss of information when obtaining sub-groups with k judges from the original panel, which consisted of 26 judges. To measure the loss of information, 10,000 sub-panels were simulated for each number of judges k by re-sampling with replacement, with k = 2, 4, 6, ..., 22 (increments of 2). The simulations were performed by means of programs developed in the R software (R Development Core Team, 2012), storing the information on the criteria evaluated for determining the optimal number of judges.

2.3.2. Criteria

The criteria considered in determining the optimal number of judges in the ODP were: the estimated experimental error, given by the root mean square error corrected for the size of the rating scale (RMSEL); the effect of interaction between samples and judges, obtained by ANOVA; discrimination of the product samples, measured by the concordance rate in paired product comparisons by the Least Significant Difference test (LSD); and the multivariate RV coefficient obtained by the Generalized Procrustes Analysis, GPA (Robert & Escoufier, 1976).

2.3.2.1. Estimate of the experimental error. Random variability of the experiment was obtained by estimating the experimental error in the two-way ANOVA. The factors evaluated in the ANOVA were formulations (fixed effect) and judges (random effect), as well as the interaction between formulations and judges, as recommended by Stone and Sidel (2004) for descriptive methods.
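The re-sampling step of Section 2.3.1 and the two-way ANOVA decomposition can be sketched as follows. This is a minimal illustration in Python, not the authors' R programs. The function names, the data layout (`panel[j][s]` holds the replicate scores of judge j for sample s) and the synthetic scores are assumptions made for the example. Judges drawn more than once are treated as distinct judges, which is one possible reading of re-sampling with replacement.

```python
import random


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def two_way_mean_squares(panel):
    """Mean squares of a balanced samples x judges design with replicates.

    panel[j][s] is the list of replicate scores given by judge j to sample s.
    """
    n_j, n_s, n_r = len(panel), len(panel[0]), len(panel[0][0])
    grand = _mean(x for judge in panel for cell in judge for x in cell)
    cell = [[_mean(c) for c in judge] for judge in panel]
    judge_mean = [_mean(row) for row in cell]
    sample_mean = [_mean(cell[j][s] for j in range(n_j)) for s in range(n_s)]

    ss_sample = n_j * n_r * sum((m - grand) ** 2 for m in sample_mean)
    ss_inter = n_r * sum(
        (cell[j][s] - judge_mean[j] - sample_mean[s] + grand) ** 2
        for j in range(n_j) for s in range(n_s)
    )
    ss_error = sum(
        (x - cell[j][s]) ** 2
        for j in range(n_j) for s in range(n_s) for x in panel[j][s]
    )
    return {
        "sample": ss_sample / (n_s - 1),
        "interaction": ss_inter / ((n_s - 1) * (n_j - 1)),
        "error": ss_error / (n_j * n_s * (n_r - 1)),
    }


def simulate_rmse(panel, k, n_iter, rng):
    """RMSE of n_iter sub-panels of k judges drawn with replacement."""
    values = []
    for _ in range(n_iter):
        sub_panel = rng.choices(panel, k=k)  # sampling with replacement
        values.append(two_way_mean_squares(sub_panel)["error"] ** 0.5)
    return values


# Synthetic 26-judge panel (5 samples, 3 repetitions), for illustration only.
rng = random.Random(0)
full_panel = [
    [[min(9.0, max(0.0, rng.gauss(2.0 + s, 1.0))) for _ in range(3)]
     for s in range(5)]
    for _ in range(26)
]
# The study used 10,000 iterations; 100 keeps the example fast.
rmse_by_k = {k: simulate_rmse(full_panel, k, 100, rng) for k in range(2, 23, 2)}
```

Each list in `rmse_by_k` plays the role of the stored values that are later compared with the decision criteria.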

Estimation of the experimental error allows the random variation (not subject to control) to be estimated, providing information on the variability of the responses given by the judges across the evaluation repetitions. The variance of the experimental error is estimated by the mean squared error (MSE) in the ANOVA. The standard deviation of the experimental error is commonly used to measure the random effect of the experiment since, unlike the variance, it provides a standardized measure. The standard deviation is obtained as the root mean square error (RMSE). To standardize the values obtained in this study so that comparisons can be made with other studies that used evaluation scales of different sizes, the root mean square error of each measurement was divided by the length of the scale, obtaining RMSEL values (Root Mean Square Error Length), as recommended by Hough et al. (2006). Thus, if an RMSE of 1.8 was obtained using a 9 cm scale, the value of the RMSEL is 1.8/9 = 0.20. If every value in a data set is divided by a number, the resulting root mean square error is divided by the same number. Therefore, dividing the RMSE by the size of the scale is equivalent to having previously standardized the length of all rating scales to a range of 0–1. The RMSEL was obtained for 10,000 sub-groups within each k number of judges. The values were stored and checked against the requirement defined for this criterion.

2.3.2.2. Effect of interaction between samples and judges: size and stability. The interaction between samples and judges is a common effect in descriptive techniques. It has been observed in several studies using panels with a high degree of training, including those of Cardello and Faria (1998, 2000), Monrozier and Danzart (2001), Rocha, Minim, Della Lucia, Minim, and Coimbra (2003), Richter, Almeida, Prudencio, and Benassi (2010) and Silva et al. (2012). However, this effect is undesirable and difficult to control. Moreover, the ODP evaluation protocol has been presented in previous research as an alternative to reduce this effect, because all formulations are presented simultaneously in a single evaluation session, which can minimize the “inversion” effect in the perception of sensory stimuli (Silva et al., 2012, 2013). The size of the interaction was computed by the ratio of variances (q). It is the ratio of the variance of sample over the sum of the variance of sample and of interaction, as presented in Eq. (1).



$$q = \frac{MS_{sample}}{MS_{sample} + MS_{interaction}} \qquad (1)$$

where MSsample is the sample mean square and MSinteraction is the interaction mean square. Since q lies between 0 and 1, it can be interpreted as the proportion of variance due to the sample effect. Thus, a higher value of the ratio (q) indicates a smaller interaction effect. In the special case where MSinteraction = 0, the F-statistic cannot be computed, while q reaches its maximal value of 1. This criterion has also been used to compare panel performances in descriptive sensory studies (Pineau, 2006). The stability of the interaction was computed by the standard error of the mean, calculated using MSinteraction as error.
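As a small worked example of Eq. (1), the sketch below computes q and the sample F-ratio from two mean squares. The function names are hypothetical. The inputs are the “pink color” values listed in Table 4, which the text reports as roots of mean squares, so they are squared first. Taking F as MSsample/MSinteraction is an assumption here; it reproduces the tabulated F value to within rounding.

```python
def variance_ratio(ms_sample, ms_interaction):
    """Ratio q of Eq. (1): share of variance attributed to the sample effect."""
    return ms_sample / (ms_sample + ms_interaction)


def f_sample(ms_sample, ms_interaction):
    """F-ratio of the sample effect tested against the interaction.

    Returns None when MS_interaction is 0, since F is then undefined.
    """
    if ms_interaction == 0:
        return None
    return ms_sample / ms_interaction


# "Pink color" of yogurt: roots of the mean squares as listed in Table 4.
ms_sample = 14.9239 ** 2
ms_interaction = 1.2240 ** 2

q = variance_ratio(ms_sample, ms_interaction)
f = f_sample(ms_sample, ms_interaction)
print(round(q, 4))  # 0.9933, the q value reported for this attribute
```

When MSinteraction is 0, `variance_ratio` returns 1 while `f_sample` returns `None`, mirroring the special case described above.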


The q ratio and the standard error of the mean were obtained by two-way ANOVA for 10,000 sub-groups within each k number of judges. The values were stored and checked against the requirement defined for this criterion.

2.3.2.3. Concordance rate among products, using paired comparison: discrimination of samples. Discrimination of the product samples was assessed by the concordance rate in paired product comparisons by the Least Significant Difference test (LSD) at p < 0.10. The concordance rate is the ratio of the number of concordant pairs over the total number of pairs (p(p − 1)/2 pairs for p samples). For each pair of samples, concordance is declared when the same conclusion is obtained from both panels (the whole and the reduced one). For significant pairs, concordance also requires the same sample rank order. Hence, if a pair is significant in both the whole panel and the reduced one, but with a change in product rank order, this change counts as discordance. The Least Significant Difference test (LSD) at p < 0.10 was conducted for 10,000 sub-groups within each k number of judges in the R software, and the number of concordant pairs was stored and subsequently compared with the requirement defined in this study.

2.3.2.4. Similarity of the sensory maps. The sensory map can be obtained by different multivariate statistical analyses, such as the Principal Component Analysis (PCA), and graphically represents the sensory profile of the product characterized by the sensory panel (Meilgaard, Civille, & Carr, 2006). Comparison between two spatial configurations can be performed by the Generalized Procrustes Analysis (GPA), obtaining the multivariate RV coefficient (Robert & Escoufier, 1976). This coefficient measures the correlation between two or more spatial configurations and ranges from 0 (total disagreement between configurations) to 1 (perfect agreement). The spatial configuration of each of the 10,000 sub-groups formed for each k number of judges was compared with the spatial configuration of the complete panel. The RV coefficient was calculated by the R software, stored and compared with the decision criteria described below.

2.3.3. Decision on the criteria

For each k number of judges, 10,000 values were obtained per criterion, corresponding to the 10,000 simulations. For each criterion, a decision rule was defined for the optimal number of judges in the ODP panel, where:

(i) Estimate of the experimental error: no more than 10% of the sub-groups with RMSEL values greater than the “cutting point” (CP = 0.1811). This “cutting point” was taken as the 90th percentile of the RMSEL distribution described in Silva et al. (2014). The RMSEL distribution considered was obtained from a literature review, where 574 RMSEL values were collected from previous studies on food descriptions using the traditional “Conventional Profile” technique.

(ii) Interaction between samples and judges: for size, no more than 10% of the sub-groups with an interaction proportion more than 10% greater than the value obtained in the full panel, i.e., at least 90% of the sub-groups satisfying q(Nk) > 0.9 × q(N); for stability, no more than 10% of the sub-groups with a standard error of the mean (calculated using MSinteraction as error) greater than the value obtained in the complete panel.

(iii) Concordance rate in paired product comparison: no more than 10% of the sub-groups with discordant pairs in the Least Significant Difference test (LSD), in other words, no more than 10% of the simulations having more discordant pairs among the p(p − 1)/2 pairs of samples than the complete panel.

(iv) Similarity of the sensory maps: no more than 10% of the sub-groups with an RV coefficient less than 0.90. The RV coefficient was calculated by comparing the sensory map obtained by each sub-group of k judges with that of the full panel.
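The map-similarity criterion can be illustrated with a short sketch. It computes the RV coefficient directly from two configurations using the usual definition based on cross-product matrices, rather than through a GPA routine as in the study, and then applies the "no more than 10% below 0.90" rule. The function names and the example coordinates are hypothetical.

```python
from math import sqrt


def center_columns(config):
    """Subtract the column means from an n x d configuration (list of rows)."""
    n = len(config)
    means = [sum(row[c] for row in config) / n for c in range(len(config[0]))]
    return [[value - m for value, m in zip(row, means)] for row in config]


def cross_product(config):
    """n x n matrix of inner products between the rows of a configuration."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in config] for r1 in config]


def rv_coefficient(x, y):
    """RV coefficient between two configurations of the same n samples."""
    wx = cross_product(center_columns(x))
    wy = cross_product(center_columns(y))
    num = sum(a * b for rx, ry in zip(wx, wy) for a, b in zip(rx, ry))
    den = sqrt(sum(a * a for r in wx for a in r) * sum(b * b for r in wy for b in r))
    return num / den


def meets_map_criterion(rv_values, threshold=0.90, max_failing=0.10):
    """Rule (iv): at most 10% of the sub-groups with RV below 0.90."""
    failing = sum(1 for rv in rv_values if rv < threshold)
    return failing / len(rv_values) <= max_failing


# Hypothetical one-dimensional maps of five samples (full panel vs. sub-panel).
full_map = [[-2.1], [-1.8], [-1.6], [2.6], [2.9]]
sub_map = [[-1.9], [-2.0], [-1.2], [2.3], [2.8]]
rv = rv_coefficient(full_map, sub_map)
```

Because the coefficient depends only on the inner products between samples, it is unchanged when one configuration is rotated or uniformly rescaled, which is why it suits the comparison of sensory maps.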

3. Results

3.1. ANOVA, Least Significant Difference test (LSD) and sensory map: full panel

The two-way ANOVA on the attribute scores showed a significant sample effect at p < 0.0001 for all trials. Therefore, the null hypothesis (equality) was rejected, indicating a difference between the samples. Table 4 shows the main parameter values. The level of heterogeneity of sensory attributes and food matrices, reported as the root mean square error (RMSE), was similar for all trials. Amplitude of the RMSEL ranged from 0.9794 to 0.1685. The attribute presenting the smallest variability of the experimental error was the “mass cocoa flavor” of the chocolate samples, and the greatest variability was observed for “spreadability”. The estimate of the experimental error represents the variability in the responses of the judges across the three repetitions, showing that for the “mass cocoa flavor” attribute the scores of the repetitions were more homogeneous, whereas for the “spreadability” attribute the variation between repetitions was higher. Magnitude of

Table 4
Summary of the two-way ANOVA for the full panel data and the ratio of variances (q).

| Products/attributes | √MSsample | √MSinteraction | √MSerror | Fsample | q |
|---|---|---|---|---|---|
| Yogurt | | | | | |
| Pink color (Pc) | 14.9239 | 1.2240 | 0.9906 | 148.67 | 0.9933 |
| Sweetness (Sw-yog) | 10.4356 | 2.9904 | 1.2209 | 12.18 | 0.9241 |
| Strawberry flavor (Sf) | 18.3876 | 2.1891 | 1.1340 | 70.55 | 0.9860 |
| Cream flavor (Cf) | 17.9193 | 3.0920 | 1.3779 | 33.60 | 0.9711 |
| Creaminess (Cr) | 18.1663 | 1.5572 | 1.2057 | 136.10 | 0.9927 |
| Viscosity (Vis) | 19.4666 | 1.3079 | 1.2311 | 221.55 | 0.9955 |
| Flow resistance (Fr) | 19.9393 | 1.2291 | 1.2291 | 263.16 | 0.9962 |
| Farinaceous texture (Ft) | 20.4937 | 1.3412 | 1.1683 | 232.95 | 0.9957 |
| Chocolate | | | | | |
| Brown color (Bc) | 26.1921 | 1.7902 | 1.4038 | 214.07 | 0.9971 |
| Mass cocoa aroma (Mca) | 27.1387 | 1.6530 | 1.2797 | 269.54 | 0.9978 |
| Mass cocoa flavor (Mcf) | 28.6109 | 1.5240 | 0.9794 | 352.45 | 0.9988 |
| Sweetness (Sw-choc) | 25.5256 | 1.8913 | 1.4207 | 182.15 | 0.9969 |
| Residual bitterness (Rb) | 25.2098 | 1.7095 | 1.2672 | 217.47 | 0.9975 |
| Hardness (Ha) | 26.8122 | 2.0224 | 1.2813 | 175.76 | 0.9977 |
| Spreadability (Sp) | 25.8029 | 1.6512 | 1.2133 | 244.21 | 0.9978 |

this variation is influenced by the type of food matrix and attribute evaluated, since the complexity of the sensory evaluation is related to the stimulus caused by the food, which may or may not be easily detected and quantified (King et al., 1995).

A significant effect of interaction between samples and judges indicates that at least one judge evaluated the formulations differently from the panel (Stone & Sidel, 2004). For all attributes, the interaction effect was significant (p < 0.10) and the root of the interaction mean square varied from 1.2240 to 3.0920. The “pink color” yogurt attribute showed the lowest interaction variation and “cream flavor” the highest. The interaction effect was computed by the ratio of variances (q), where a higher value of q indicates a smaller interaction effect. For all attributes, the ratio of variances (q) was above 0.90, indicating a high proportion of variance due to the sample effect, i.e., a low interaction effect.

The root of the sample mean square measures the magnitude of the differences between product samples and varied strongly, from 10.4356 for “sweetness” of yogurt to 28.6109 for “mass cocoa flavor” of chocolate, showing better sample discrimination for “mass cocoa flavor” than for “sweetness”. Significance of the sample effect indicates that at least one contrast between averages is statistically different from zero, i.e., on average at least one treatment differed from the others. Therefore, this comparison can be performed between the two formulations considered most different, which facilitates obtaining a significant effect. Moreover, the Least Significant Difference test (LSD), which performs pairwise comparisons (Tables 5 and 6), showed a different number of concordant pairs (q) among the attributes. The attributes “hardness” and “spreadability” of the chocolate samples showed the lowest q, i.e., the highest sample discrimination.
The highest q was verified for the “strawberry flavor”, “cream flavor” and “farinaceous texture” attributes.

The descriptive maps (PCA) obtained in the sensory characterization of the products (chocolate and yogurt) showed that for both models a high percentage of the total variability was explained by the first principal component, making up 93.36% for the yogurts and 99.20% for the chocolates. Therefore, only one dimension was considered in the graphical representation for the purpose of interpretation (Fig. 1).

Table 5
Average intensity values of yogurt attributes and number of concordant pairs (q) obtained by the Least Significant Difference test (p < 0.10).

| Yogurt attribute | Mean Yog-1 | Mean Yog-2 | Mean Yog-3 | Mean Yog-4 | Mean Yog-5 | q |
|---|---|---|---|---|---|---|
| Pc | 1.9d | 2.2cd | 2.4c | 4.8b | 5.6a | 2 |
| Sw-yog | 5.6a | 4.5b | 2.7d | 3.1c | 3.6c | 1 |
| Sf | 5.5a | 5.6a | 5.4a | 1.7b | 1.7b | 4 |
| Cf | 3.1b | 2.8b | 3.2b | 6.7a | 6.7a | 4 |
| Cr | 2.1c | 2.7b | 2.4bc | 6.1a | 6.1a | 3 |
| Vis | 1.6d | 2.2c | 2.5c | 6.4a | 5.7b | 1 |
| Fr | 1.7c | 2.1c | 2.8b | 6.3a | 6.2a | 2 |
| Ft | 1.4b | 1.3b | 1.4b | 5.6a | 5.6a | 4 |


In this type of analysis, the two-dimensional representation permits easy visualization of the sensory map. The map represents the sensory profile of the product, identifying the attributes of greatest importance in the characterization of each formulation. The objective of this study is to ensure that reducing the number of judges does not alter the sensory profile of the products obtained by the full panel.

3.2. Criterion I: estimate of the experimental error

When the experimental error obtained in the sub-groups was compared with the reference value from the literature (cutting point), the panel of 16 judges met the established criterion for both products and all attributes (Table 7). Thus, for N equal to 16 evaluators, at least 90% of the sub-groups formed presented RMSEL estimates less than or equal to the value estimated for the conventional method (“Conventional Profile”). The error value established as the “cutting point” presents a 90% probability of occurrence in descriptive studies, representing the overall variability of previous sensory description studies.

The number of judges needed differed among the products and attributes evaluated. For the attribute “brown color” (Bc) of chocolate, 16 judges were required to meet the established criterion; for the attribute “pink color” (Pc), only two judges were necessary. The variation between repetitions of the assessments may have different amplitudes depending on the product and attribute, and individuals may also differ in this variability (King et al., 1995; Stone & Sidel, 2004). In the ODP evaluation technique, the judges have a low level of training, which can result in greater internal variability and thus higher random experimental variance. Thus, because the judges may present different levels of variability, the random inclusion of a judge with greater residual variability leads to an increase in the variance of the group.

3.3. Criterion II: interaction between samples and judges effect: size and stability

In the evaluation of the interaction effect criterion, the coefficient q was used to evaluate the size of the interaction effect and the standard error of the mean was used to evaluate its stability. For the size of the interaction effect, panels consisting of only 6 judges met the established criterion for almost all attributes (Table 8). Only the attributes “cream flavor” and “sweetness-yogurt” did not meet the criterion with 6 judges in the panel. For these attributes, 14 and 16 judges, respectively, were required. For the stability of the interaction effect (Table 9), panels with 14 judges met the criterion for the yogurt attributes and 16 judges were necessary for the chocolate attributes.

Note to Table 5: Means followed by the same letter in the row do not differ at 10% probability.

Table 6
Average intensity values of chocolate attributes and number of concordant pairs (q) obtained by the Least Significant Difference test (p < 0.10).

| Chocolate attribute | Mean Choc-1 | Mean Choc-2 | Mean Choc-3 | Mean Choc-4 | q |
|---|---|---|---|---|---|
| Bc | 1.6c | 3.1b | 7.4a | 7.3a | 1 |
| Mca | 1.3c | 2.3b | 7.2a | 6.9a | 1 |
| Mcf | 1.0c | 1.8b | 7.0a | 6.9a | 1 |
| Sw-choc | 7.6a | 6.8b | 2.2c | 2.4c | 1 |
| Rb | 0.7c | 1.5b | 6.1a | 5.9a | 1 |
| Ha | 1.5d | 2.5c | 7.8a | 6.1b | 0 |
| Sp | 7.7a | 6.9b | 1.7d | 2.9c | 0 |

Means followed by the same letter in the row do not differ at 10% probability.
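The pairwise bookkeeping behind Tables 5 and 6 and the concordance rate of Section 2.3.2.3 can be sketched as follows. The q column appears to correspond to the number of sample pairs that share a grouping letter, and the first function reproduces it from the letters in the tables. The second function compares LSD conclusions between two panels; the encoding of conclusions as +1, −1 or 0 and all function names are assumptions made for this example.

```python
from itertools import combinations


def non_differing_pairs(letters):
    """Count sample pairs that share at least one grouping letter."""
    return sum(1 for a, b in combinations(letters, 2) if set(a) & set(b))


def concordance_rate(full, reduced):
    """Share of pairs with the same conclusion in both panels.

    Each conclusion is +1 or -1 (significant difference, sign = rank order)
    or 0 (no significant difference), keyed by the pair of sample names.
    """
    concordant = sum(1 for pair in full if full[pair] == reduced[pair])
    return concordant / len(full)


# Letter groupings read from Table 5 (Pc, Cr) and Table 6 (Ha).
print(non_differing_pairs(["d", "cd", "c", "b", "a"]))   # 2
print(non_differing_pairs(["c", "b", "bc", "a", "a"]))   # 3
print(non_differing_pairs(["d", "c", "a", "b"]))         # 0

# Hypothetical LSD conclusions for p = 4 samples, i.e. p(p - 1)/2 = 6 pairs.
samples = ["Choc-1", "Choc-2", "Choc-3", "Choc-4"]
full = {pair: -1 for pair in combinations(samples, 2)}
full[("Choc-3", "Choc-4")] = 0
reduced = dict(full)
reduced[("Choc-1", "Choc-2")] = 0   # the sub-panel misses one difference
rate = concordance_rate(full, reduced)   # 5 concordant pairs out of 6
```

A pair that is significant in both panels but with reversed rank order would carry opposite signs and therefore count as discordant, as the method requires.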

3.4. Criterion III: discrimination of samples: concordance rate among products, using paired comparisons

In the Least Significant Difference test (LSD) criterion, at least 90% of the sub-groups of a given panel size (k judges) must present the same number of concordant pairs (q) verified in the full panel. The q values for the complete panels (chocolate and yogurt samples) are shown in Tables 5 and 6. It was found that, for all attributes, panels consisting of 16 judges met the criterion established for sample discrimination, measured by the concordance rate in paired product comparisons (Table 10). In this criterion, concordance is declared when the same conclusion is obtained from both panels (the whole and the reduced


Fig. 1. Sensory maps of the Principal Component Analysis for strawberry-flavored yogurt (Experiment A) and chocolate samples (Experiment B).

Table 7
Percentage of sub-groups that met the criteria of the experimental error estimate, by panel size (k).

| Products/attributes | k = 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Yogurt | | | | | | | | | | | |
| Pc | 0.941a | 0.992 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Sw-yog | 0.937a | 0.988 | 0.998 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Sf | 0.895 | 0.965a | 0.990 | 0.998 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Cf | 0.744 | 0.747 | 0.801 | 0.833 | 0.874 | 0.896 | 0.926a | 0.937 | 0.962 | 0.973 | 0.986 |
| Cr | 0.952a | 0.962 | 0.996 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Vis | 0.873 | 0.878 | 0.855 | 0.862 | 0.924a | 0.953 | 0.987 | 0.999 | 1.000 | 1.000 | 1.000 |
| Fr | 0.991a | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Ft | 0.929a | 0.985 | 0.998 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Chocolate | | | | | | | | | | | |
| Bc | 0.786 | 0.774 | 0.735 | 0.701 | 0.740 | 0.780 | 0.845 | 0.902a | 1.000 | 1.000 | 1.000 |
| Mca | 0.810 | 0.870 | 0.929a | 0.965 | 0.986 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Mcf | 0.993a | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Sw-choc | 0.758 | 0.741 | 0.746 | 0.778 | 0.816 | 0.860 | 0.895 | 0.944a | 0.986 | 1.000 | 1.000 |
| Rb | 0.850 | 0.898 | 0.945a | 0.981 | 0.997 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Ha | 0.864 | 0.896 | 0.953a | 0.985 | 0.998 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Sp | 0.893 | 0.915a | 0.973 | 0.997 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |

a Smallest panel size meeting criterion I.

Table 8
Percentage of sub-groups (expressed as proportions) that met the criterion of the size of interaction effect, by panel size (k).

Yogurt
k     Pc      Sw-yog  Sf      Cf      Cr      Vis     Fr      Ft
2     0.742   0.233   0.548   0.381   0.762   0.851   0.879   0.776
4     0.988a  0.260   0.725   0.345   0.928a  0.961a  0.999a  0.960a
6     1.000   0.349   0.946a  0.457   1.000   0.999   1.000   1.000
8     1.000   0.459   0.996   0.603   1.000   1.000   1.000   1.000
10    1.000   0.603   0.999   0.735   1.000   1.000   1.000   1.000
12    1.000   0.731   1.000   0.846   1.000   1.000   1.000   1.000
14    1.000   0.845   1.000   0.931a  1.000   1.000   1.000   1.000
16    1.000   0.927a  1.000   0.975   1.000   1.000   1.000   1.000
18    1.000   0.977   1.000   0.993   1.000   1.000   1.000   1.000
20    1.000   0.997   1.000   0.999   1.000   1.000   1.000   1.000
22    1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000

Chocolate
k     Bc      Mca     Mcf     Sw-choc Rb      Ha      Sp
2     0.806   0.836   0.948a  0.788   0.844   0.818   0.802
4     0.995a  1.000a  1.000   0.953a  0.999a  0.875   0.998a
6     1.000   1.000   1.000   1.000   1.000   1.000a  1.000
8     1.000   1.000   1.000   1.000   1.000   1.000   1.000
10    1.000   1.000   1.000   1.000   1.000   1.000   1.000
12    1.000   1.000   1.000   1.000   1.000   1.000   1.000
14    1.000   1.000   1.000   1.000   1.000   1.000   1.000
16    1.000   1.000   1.000   1.000   1.000   1.000   1.000
18    1.000   1.000   1.000   1.000   1.000   1.000   1.000
20    1.000   1.000   1.000   1.000   1.000   1.000   1.000
22    1.000   1.000   1.000   1.000   1.000   1.000   1.000

a Smallest panel size meeting criterion IIa.

one); concordance also requires the same sample rank order. The recommended number of judges differed between the two food matrices (yogurt and chocolate). For the chocolate samples, the recommended number of judges varied from 8 to 16, depending on the sensory attribute. For the yogurt samples, on the other hand, the recommendation varied from 2 to 12 judges.
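The re-sampling logic behind this criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `concordance_rate` is hypothetical, the panel data are synthetic (means taken from the Ha row of Table 6, noise level invented), and the LSD paired comparison is replaced by a simpler check that the sub-panel ranks the samples in the same order as the full panel.

```python
import numpy as np


def concordance_rate(scores, k, n_subgroups=10_000, seed=0):
    """Share of re-sampled sub-panels that rank the samples like the full panel.

    scores: array (n_judges, n_samples) with one attribute's intensity scores.
    k: number of judges drawn, with replacement, for each sub-panel.
    """
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    full_order = np.argsort(scores.mean(axis=0), kind="stable")
    # One row of judge indices per sub-group; repeated judges are allowed.
    idx = rng.integers(0, scores.shape[0], size=(n_subgroups, k))
    sub_means = scores[idx].mean(axis=1)  # shape (n_subgroups, n_samples)
    sub_order = np.argsort(sub_means, axis=1, kind="stable")
    # Simplified stand-in for the LSD criterion: identical rank order only.
    return float(np.mean(np.all(sub_order == full_order, axis=1)))


# Synthetic 26-judge panel: means from the Table 6 "Ha" row, noise is invented.
rng = np.random.default_rng(1)
panel = np.array([1.5, 2.5, 7.8, 6.1]) + rng.normal(0.0, 1.5, size=(26, 4))
for k in (2, 8, 16):
    print(k, concordance_rate(panel, k))
```

A panel size k would then be accepted for the attribute once the returned share reaches 0.90, mirroring the 90% rule stated above.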

3.5. Criterion IV: similarities of the sensory maps

In comparing the sensory maps obtained for each of the sub-groups composed of k judges with the map of the full panel, it was found that a panel of only 2 judges yields a spatial configuration very similar to the configuration

Table 9
Percentage of sub-groups (expressed as proportions) that met the criterion of the stability of interaction effect, by panel size (k).

Yogurt
k     Pc      Sw-yog  Sf      Cf      Cr      Vis     Fr      Ft
2     0.728   0.723   0.729   0.725   0.724   0.781   0.721   0.749
4     0.767   0.763   0.717   0.704   0.791   0.804   0.791   0.768
6     0.807   0.818   0.784   0.738   0.800   0.770   0.854   0.758
8     0.845   0.859   0.831   0.803   0.818   0.708   0.899   0.774
10    0.889   0.902a  0.871   0.851   0.863   0.716   0.943a  0.829
12    0.927a  0.945   0.906a  0.891   0.931a  0.808   0.972   0.883
14    0.961   0.973   0.947   0.927a  0.979   0.912a  0.989   0.932a
16    0.982   0.993   0.977   0.968   0.996   0.984   0.997   0.979
18    0.999   1.000   0.995   0.995   0.999   0.999   0.999   0.999
20    1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
22    1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000

Chocolate
k     Bc      Mca     Mcf     Sw-choc Rb      Ha      Sp
2     0.107   0.101   0.086   0.153   0.113   0.149   0.161
4     0.037   0.043   0.016   0.089   0.040   0.134   0.126
6     0.053   0.019   0.015   0.121   0.055   0.260   0.164
8     0.102   0.063   0.36    0.229   0.116   0.422   0.242
10    0.194   0.111   0.102   0.339   0.249   0.547   0.339
12    0.334   0.272   0.271   0.449   0.476   0.546   0.567
14    0.597   0.645   0.653   0.686   0.953a  0.598   0.820
16    0.909a  0.965a  0.975a  0.943a  0.974   0.973a  0.988a
18    1.000   1.000   1.000   1.000   1.000   1.000   1.000
20    1.000   1.000   1.000   1.000   1.000   1.000   1.000
22    1.000   1.000   1.000   1.000   1.000   1.000   1.000

a Smallest panel size meeting criterion IIb.

Table 10
Percentage of sub-groups (expressed as proportions) that met the criterion of concordance rate in product pair comparison, by panel size (k).

Yogurt
k     Pc      Sw-yog  Sf      Cf      Cr      Vis     Fr      Ft
2     0.277   0.256   0.856   0.706   0.891   0.163   0.286   0.987a
4     0.482   0.378a  0.991a  0.878   0.985a  0.307   0.497   1.000
6     0.627   0.546   0.999   0.966a  1.000   0.598   0.699   1.000
8     0.782   0.703   1.000   0.995   1.000   0.749   0.851   1.000
10    0.895   0.827   1.000   0.999   1.000   0.859   0.946a  1.000
12    0.967a  0.910   1.000   1.000   1.000   0.936a  0.989   1.000
14    0.998   0.965   1.000   1.000   1.000   0.982   0.999   1.000
16    0.999   0.988   1.000   1.000   1.000   0.998   1.000   1.000
18    1.000   0.998   1.000   1.000   1.000   1.000   1.000   1.000
20    1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
22    1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000

Chocolate
k     Bc      Mca     Mcf     Sw-choc Rb      Ha      Sp
2     0.560   0.279   0.345   0.288   0.207   0.107   0.132
4     0.674   0.408   0.479   0.357   0.222   0.231   0.199
6     0.798   0.591   0.619   0.439   0.323   0.381   0.314
8     0.911a  0.697   0.751   0.549   0.482   0.617   0.421
10    0.975   0.809   0.866   0.654   0.665   0.801   0.599
12    0.998   0.938a  0.947a  0.770   0.810   0.907a  0.706
14    0.999   0.992   0.987   0.884   0.922a  0.967   0.829
16    1.000   1.000   0.998   0.913a  0.980   0.992   0.935a
18    1.000   1.000   1.000   0.971   0.998   0.999   0.999
20    1.000   1.000   1.000   1.000   1.000   1.000   1.000
22    1.000   1.000   1.000   1.000   1.000   1.000   1.000

a Smallest panel size meeting criterion III.

Table 11
Percentage of sub-groups (expressed as proportions) that met the similarity criterion of the sensory maps, by panel size (k).

k     Yogurt  Chocolate
2     0.925a  0.914a
4     0.959   0.937
6     0.992   0.982
8     0.999   1.000
10    1.000   1.000
12    1.000   1.000
14    1.000   1.000
16    1.000   1.000
18    1.000   1.000
20    1.000   1.000
22    1.000   1.000

a Smallest panel size meeting criterion IV.

obtained by the full panel, presenting an RV coefficient greater than 0.90 (Table 11). In the GPA, the individual configurations (panel with k judges and full panel) are subjected to three types of transformation (scaling, rotation and translation) to obtain the consensus configuration. After the necessary transformations are performed, the distance between the individual configurations and the consensus is measured to calculate the RV coefficient (Dijksterhuis, 1996). Therefore, only large deviations between these configurations are usually detected by this coefficient. This highlights the importance of evaluating more than one criterion for determining the optimal number of judges, since a single measure may result in the loss of important information on data variation.
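The insensitivity of the RV coefficient to scaling, rotation and translation can be illustrated with a short Python sketch. It uses the general matrix-correlation definition of Robert and Escoufier (1976), computed directly on two centered configurations; this is an assumption made for illustration and may differ from the GPA-based computation used in the study. The function name `rv_coefficient` and the toy coordinates are hypothetical.

```python
import numpy as np


def rv_coefficient(X, Y):
    """RV coefficient between two configurations sharing the same rows (samples)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Column-centering removes any translation between the configurations.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Sx = X @ X.T  # sample-by-sample cross-product matrices
    Sy = Y @ Y.T
    return float(np.trace(Sx @ Sy) / np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy)))


# Toy map of four samples on two components (invented coordinates).
full_map = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 3.0]])
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
# Same map after scaling, rotation and translation.
transformed = 2.5 * full_map @ rotation + 4.0
# Map in which one sample has genuinely moved.
distorted = full_map.copy()
distorted[0] = [5.0, -3.0]

print(rv_coefficient(full_map, transformed))
print(rv_coefficient(full_map, distorted))
```

The first value is 1 up to rounding error, because the transformed map differs from the original only by the three operations the coefficient ignores; the second is clearly lower because the relative positions of the samples changed.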

4. Discussion

The recommended number of judges in the ODP varied according to the criterion considered: to obtain an estimate of the experimental error similar to that of the conventional


method, 16 judges were needed. In contrast, only 2 judges were sufficient to obtain a sensory map similar to that of the full panel. Multivariate techniques used to obtain sensory maps measure the similarity between assessments made by two teams. In this process, the configurations of the samples are compared to verify the similarity between the sensory maps, so that not only the magnitudes of the assigned scores are evaluated (Dijksterhuis, 1996). Similar maps indicate that the judges evaluated the samples in a consensual manner, with the different judges assigning similar sensory profiles to each product. Therefore, sub-groups with only two judges presented the same configuration as the full panel, indicating that the judges showed consensus in their evaluations. Another limitation of the GPA technique in this study is the low number of samples evaluated.

On the other hand, in criterion I the decision rule was not established in relation to the complete group, but with respect to a global cut-off calculated across different food matrices and sensory attributes. Thus, the same cut-off point was established for all attributes, regardless of the food, which may have made the criterion more difficult to meet. According to King et al. (1995), the estimated random variance is influenced by the food matrix under evaluation. Therefore, establishing a global limit (cut-off point) for this variation is of interest, and it was more accurate for determining the number of judges in the ODP panel.

For the size of interaction criterion, panels with only 6 judges met the criterion for most attributes. The "cream flavor" and "sweetness-yogurt" attributes required 14 and 16 judges, respectively; these attributes showed the highest interaction variation in the full panel (Table 4). The number of judges required for the stability of interaction criterion varied from 10 to 16. The stability of the interaction effect was more accurate than its size for determining the number of judges.

For the Least Significant Difference (LSD) test criterion, the recommended number of judges varied from 2 to 16. The attributes "hardness" and "spreadability" of the chocolate samples, which showed the highest sample discrimination (lowest q), required a higher number of judges (N = 16) to meet the criterion. Conversely, the attributes with the lowest sample discrimination (highest q) required fewer judges: "farinaceous texture" (N = 2), "strawberry flavor" (N = 4) and "cream flavor" (N = 6).

In this study, the different evaluation criteria, food matrices and attribute descriptors led to great variation in the optimal number of judges determined for the ODP. This behavior explains the discrepancy in the number of judges recommended in the literature for conventional descriptive methods, which ranges from two (Pagès & Périnel, 2003) to twenty panelists (King et al., 1995). The difference in the recommended number of judges across the criteria evaluated in this study underscores the importance of evaluating more than one criterion for determining the optimal number of judges. Observing a single measure, for example the RV coefficient, may result in the loss of information on the discrimination of the samples (criterion III) or on the interaction effect (criterion II) and, more importantly, on the high residual variation in the experiment (criterion I).

In studies conducted with the ODP technique (Silva et al., 2012, 2013), fourteen and fifteen judges were used, respectively. In these studies, great similarity was verified between the spatial configurations obtained by the ODP and the Conventional Profile in relation to the principal components of the PCA. This behavior is consistent with the data obtained in this study, since with fourteen judges 100% of the simulated sub-groups showed RV coefficients greater than 0.90.
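How the recommendation changes with the criterion can be made concrete with a small Python sketch that reads the smallest qualifying panel size from the tabulated proportions. The values are the Sw-choc columns of Tables 7 to 10; the helper name `smallest_panel_size` is hypothetical, and applying the 0.90 cut-off to every criterion is an assumption based on the 90% rule stated for criterion III and on the markers in the tables.

```python
PANEL_SIZES = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]

# Proportion of sub-groups meeting each criterion for Sw-choc (Tables 7-10).
SW_CHOC = {
    "I": [0.758, 0.741, 0.746, 0.778, 0.816, 0.860, 0.895, 0.944, 0.986, 1.000, 1.000],
    "IIa": [0.788, 0.953, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000],
    "IIb": [0.153, 0.089, 0.121, 0.229, 0.339, 0.449, 0.686, 0.943, 1.000, 1.000, 1.000],
    "III": [0.288, 0.357, 0.439, 0.549, 0.654, 0.770, 0.884, 0.913, 0.971, 1.000, 1.000],
}


def smallest_panel_size(proportions, sizes=PANEL_SIZES, threshold=0.90):
    """Return the first panel size whose proportion reaches the threshold."""
    for k, p in zip(sizes, proportions):
        if p >= threshold:
            return k
    return None  # no tested panel size met the criterion


per_criterion = {name: smallest_panel_size(p) for name, p in SW_CHOC.items()}
print(per_criterion)                # {'I': 16, 'IIa': 4, 'IIb': 16, 'III': 16}
print(max(per_criterion.values()))  # 16
```

Taking the maximum over the criteria gives the panel size that satisfies all of them at once for this attribute, which is why judging by a single criterion such as IIa alone would suggest a much smaller panel.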

In previous studies of the ODP, a discriminating power very similar to that of the Conventional Profile was verified from the F-ratios, the significance of the formulation effect and the power of the test (1 − β). The ODP proved to be a sensory descriptive technique very similar to the Conventional Profile (CP). Therefore, it is logical that the number of judges determined for this method resembles the recommendations in the literature for the CP, especially in relation to the similarity of the sensory maps and the discrimination of products in the F-test.

In the ODP, the magnitude of the experimental error proved to be the more accurate criterion for determining the optimal number of judges. Because the ODP requires only a low level of training, evaluating this criterion is extremely important: since the judges are not extensively trained, a higher residual random variation can be observed. Thus, for a sensory characterization experiment using the ODP technique to meet the magnitude of experimental error stipulated in this study, panels with at least sixteen judges are recommended. With this minimum panel size, the random variation of the experiment is within the range expected for descriptive studies on foods, with 90% probability.

5. Conclusions

The criterion for the magnitude of the experimental error estimate proved to be the most robust measure for determining the number of judges required for the ODP technique. Because this descriptive technique requires a low level of training of the judges, evaluating this criterion is extremely important, since a larger residual random variation can usually be observed. The use of a global limit (cut-off point) for this criterion was also useful, since the two food matrices evaluated led to different recommendations for the number of judges.
To meet the criteria for the magnitude of the experimental error, the interaction between samples and judges effect and the concordance rate in paired comparisons, sixteen judges should make up the sensory panel.

Acknowledgements

The authors would like to acknowledge the CNPq and Fapemig for their financial support.

References

Brandt, M. A., Skinner, E. Z., & Coleman, J. A. (1963). Texture profile method. Journal of Food Science, 28, 404–409.
Cairncross, S. E., & Sjostrom, L. B. (1950). Flavour profiles: A new approach to flavor problems. Food Technology, 4, 308–311.
Cardello, H., & Faria, J. B. (1998). Análise descritiva quantitativa da aguardente de cana durante o envelhecimento em tonel de carvalho (Quercus alba L.). Ciência e Tecnologia de Alimentos, 18(2), 169–175.
Cardello, H., & Faria, J. B. (2000). Sensory profile and physicochemical characteristics of commercial Brazilian sugar cane spirits, both aged and non-aged. Brazilian Journal of Food Technology, 3, 31–40.
R Development Core Team (2012). R: A language and environment for statistical computing. ISBN 3-900051-07-0. Vienna, Austria: R Foundation for Statistical Computing.
Dijksterhuis, G. (1996). Procrustes analysis in sensory research. In T. Næs & E. Risvik (Eds.), Multivariate analysis of data in sensory science. Elsevier Science B.V.
Gacula, M., & Rutenbeck, S. (2006). Sample size in consumer test and descriptive analysis. Journal of Sensory Studies, 21, 129–145.
Heymann, H., Machado, B., Torri, L., & Robinson, A. L. (2012). How many judges should one use for sensory descriptive analysis? Journal of Sensory Studies, 27, 111–122.
Hough, G., Wakeling, I., Mucci, A., Chambers, E., Gallardo, I. M., & Alves, L. R. (2006). Number of consumers necessary for sensory acceptability tests. Food Quality and Preference, 17, 522–526.
King, B. M., Arents, P., & Moreau, N. (1995). Cost/efficiency evaluation of descriptive analysis panels – I. Panel size. Journal of Sensory Studies, 6, 245–261.

Meilgaard, M. C., Civille, G. V., & Carr, B. T. (2006). Sensory evaluation techniques (4th ed.). Boca Raton: CRC Press.
Monrozier, R., & Danzart, M. (2001). A quality measurement for sensory profile analysis: The contribution of extended cross-validation and resampling techniques. Food Quality and Preference, 12(5–7), 393–406.
Pagès, J., & Périnel, E. (2003). Panel performance and number of evaluations in a descriptive sensory study. Journal of Sensory Studies, 19, 273–291.
Pineau, N. (2006). Performance on sensory studies. A data base approach (Ph.D. thesis).
Richter, V. B., Almeida, T. C. A., Prudencio, S. H., & Benassi, M. T. (2010). Proposing a ranking descriptive sensory method. Food Quality and Preference, 21(6), 611–620.
Robert, P., & Escoufier, Y. (1976). A unifying tool for linear multivariate statistical methods: The 'RV' coefficient. Applied Statistics, 25(3), 257–265.
Rocha, F. L., Minim, V. P. R., Della Lucia, F. D., Minim, L. A., & Coimbra, J. S. R. (2003). Avaliação da influência dos milhos QPM nas características sensoriais de bolo. Ciência e Tecnologia de Alimentos, 23(2), 129–134.
Silva, R. C. S. N., Minim, V. P. R., Carneiro, J. D. S., Nascimento, M., Della Lucia, S. M., & Minim, L. A. (2013). Quantitative sensory description using the Optimized Descriptive Profile: Comparison with conventional and alternative methods for evaluation of chocolate. Food Quality and Preference, 30, 169–179.
Silva, R. C. S. N., Minim, V. P. R., Silva, A. N., & Minim, L. A. (2014). Number of judges necessary for descriptive sensory tests. Food Quality and Preference, 31, 22–27.
Silva, R. C. S. N., Minim, V. P. R., Simiqueli, A. A., Moraes, L. E. S., Gomide, A. I., & Minim, L. A. (2012). Optimized Descriptive Profile: A rapid methodology for sensory description. Food Quality and Preference, 24, 190–200.
Stone, H., & Sidel, J. L. (1985). Sensory evaluation practices (1st ed.). New York: Academic.
Stone, H., & Sidel, J. L. (2004). Sensory evaluation practices (3rd ed.). New York: Academic.