Accepted Manuscript

Number of judges necessary for descriptive sensory tests

Rita de Cássia dos Santos Navarro da Silva, Valéria Paula Rodrigues Minim, Alexandre Navarro da Silva, Luis Antônio Minim

PII: S0950-3293(13)00122-5
DOI: http://dx.doi.org/10.1016/j.foodqual.2013.07.010
Reference: FQAP 2666
To appear in: Food Quality and Preference
Received Date: 27 May 2013
Revised Date: 18 July 2013
Accepted Date: 20 July 2013
Please cite this article as: Silva, R.d.C., Minim, V.P.R., Silva, A.N.d., Minim, L.A., Number of judges necessary for descriptive sensory tests, Food Quality and Preference (2013), doi: http://dx.doi.org/10.1016/j.foodqual.2013.07.010
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Number of judges necessary for descriptive sensory tests
Rita de Cássia dos Santos Navarro da Silva1, Valéria Paula Rodrigues Minim1, Alexandre Navarro da Silva2, Luis Antônio Minim1

1 Departamento de Tecnologia de Alimentos, Universidade Federal de Viçosa (UFV), Zip code: 36570-000, Viçosa, Minas Gerais, Brasil.
2 Departamento de Engenharia de Produção e Mecânica, Universidade Federal de Viçosa (UFV), Zip code: 36570-000, Viçosa, Minas Gerais, Brasil.

E-mail addresses: [email protected] (Silva, R.C.S.N.), [email protected] (Minim, V.P.R.), [email protected] (Silva, A.N.), [email protected] (Minim, L.A.).
ABSTRACT
To determine the number of judges needed in descriptive tests, four parameters are necessary: the probability of a type I error (α), the probability of a type II error (β), the difference between means to be detected in the experiment (d') and the standard deviation of the experimental error (s). The probabilities of the experimental errors and the difference one desires to detect between means are stipulated by the researcher. Only the estimate of the experimental error cannot be set in advance; it must be obtained experimentally or from similar research previously performed. Because the most common approach to data analysis in descriptive sensory analysis is the analysis of variance, the estimated standard deviation of the experimental error is given by the root mean square error (RMSE). Accordingly, 574 RMSE values were collected from previously published studies. The data were fitted to a Weibull probability distribution (1.8081, 0.11419), and five percentiles of this distribution were considered in the calculations. The number of evaluations necessary was determined using the "sample size and power analysis" procedure of the JMP/SAS software. Three probability levels were defined for type I and type II errors, four levels of mean difference to be detected in the experiment, and five percentiles of the RMSE probability distribution. The required numbers of evaluations in descriptive tests were calculated for these different experimental conditions, totaling 180 scenarios. Considering the median of the experimental error, an alpha (type I error) of 1%, a beta (type II error) of 5% and a difference between means of 10% of the sensory scale, 33 evaluations are needed in the descriptive test. Further considering that each judge evaluates the samples in triplicate, 11 judges are necessary for this specific set of parameters. Other scenarios are also discussed in the paper.

Keywords: sample size; power analysis; panel size; RMSE.
1. Introduction

The descriptive sensory analysis of foods consists of the assessment of sensory characteristics by a team of judges, who identify and quantify the intensity of the sensory stimuli present in food using the five human senses (sight, smell, hearing, touch and taste) (Murray, Delahunty and Baxter, 2001). Descriptive assessment is a valuable tool in the various stages of food processing: development of new products, quality control, storage and shelf-life studies (Stone and Sidel, 2004; Meilgaard et al., 2006).

Traditional descriptive evaluation techniques require a team of highly trained judges, and the large number of judges required may be an obstacle to the application of this valuable tool in industry (Heymann et al., 2012). A recommended ideal number of judges for a panel is not clear from the literature. Different recommendations are found depending on the technique used: for example, six judges for the Flavor Profile (Cairncross & Sjostrom, 1950); ten judges for the Texture Profile (Brandt, Skinner, & Coleman, 1963); and ten to twelve judges for Quantitative Descriptive Analysis (Stone & Sidel, 1985). However, the criteria for determining the number of judges needed are not given.

On the other hand, panels of different sizes are found in the generic techniques called "Conventional Profile" or "Descriptive Analysis". Published studies using generic techniques report six judges (Lee and Chambers, 2010; Warmund et al., 2011), seven judges (Guàrdia et al., 2010; Chueamchaitrakum et al., 2011), eight judges (Perrin & Pagès, 2009; Anyango et al., 2011), nine judges (Westad et al., 2003; Campo et al., 2010; Tesfaye et al., 2010; Silva et al., 2012), ten judges (Leighton et al., 2010; Speziale et al., 2010; Silva et al., 2013), eleven judges (Bitnes et al., 2009; Plaehn, 2009; Moussaoui and Varela, 2010; Parr et al., 2010), twelve judges (Sinesio et al., 2009; Bitnes et al., 2009; Biasoto et al., 2010), fifteen judges (Garcia-Carpintero et al., 2011), seventeen judges (Guinard et al., 1999) and twenty judges (Delgado and Guinard, 2011). In general, the lower limit is six judges and the upper limit is twenty judges.
Since the cost associated with sensory evaluation increases with the number of participating judges, it is important to determine the optimal number of evaluators necessary for sensory tests. According to Heymann et al. (2012), training a smaller number of judges obviously requires less time, money and effort, but it may result in "false savings" due to the possibility of obtaining "poor" data. Thus, the challenge is to determine the optimal number of judges needed in descriptive assessments: one that reduces the size of the team without losing information on the sensory profile and description of the products, while still permitting powerful statistical testing.
10 11 12
2. How to determine the ideal number of judges? Calculating the number of judges for sensory testing has been little explored in
13
literature. Some recent studies have been conducted for affective tests, in which the
14
ideal number of consumers was determined for sensory acceptance tests. In these
15
studies, the authors calculated the size of the sample using data obtained experimentally
16
(Gacula and Rutenbeck, 2006; Mammasse and Schlich, 2013), with data obtained from
17
a literature review (Hough et al., 2006) and by survival analysis (Hough et al., 2007;
18
Libertino, Osornio, Hough, 2011). For descriptive tests, re-sampling techniques with
19
experimental data were used by King et al. (1995), Pages and Perinel (2003), Gacula
20
and Rutenbeck (2006) and Heymann et al. (2013).
21
In calculating the number of judges, four parameters must be known, using the
22
concept “sample size and power of analysis”, described in Kraemer and Thiemann
23
(1987), Montgomery (2001) and Walpole et al. (2011), which are: (i) level α –
24
probability of type I error, (ii) level β – probability of type II error, also expressed as
25
power of the test (1 – β), (iii) d’ – difference in average which is sought in the
26
experiment, and (iv) s - standard deviation of the experimental error. The probabilities
27
of experimental errors (α and β) and the difference that one desires to detect between
28
means shall be stipulated by the researcher. Therefore, only the experimental error
29
cannot be previously obtained, which must be obtained experimentally or by means of
30
similar operations previously performed.
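The way these four parameters interact can be illustrated with the standard normal-approximation formula for the number of observations per treatment when comparing two means (see, e.g., Montgomery, 2001). This is only a simplified sketch of the calculation; software such as JMP/SAS works with exact noncentral t distributions:

```latex
n \approx 2\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\left(\frac{s}{d'}\right)^{2}
```

where the z terms are standard normal quantiles, and s and d' are expressed in the same scale units, so the required n grows with the square of the ratio s/d'.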
3. Identifying the parameters

Knowledge of the definition and interpretation of the four parameters that must be known to calculate the number of judges is of utmost importance in determining the required number of evaluations in descriptive tests. To establish the criteria and method for obtaining these parameters, a brief explanation of these concepts is presented below.
3.1. Probability of Type I and Type II errors

The decision errors denominated Type I and Type II are associated with the hypotheses of the statistical test used to verify the existence of significant differences between treatment means. The null hypothesis (H0) represents equality between the means and is tested against the alternative hypothesis (Ha) that at least one mean differs from the others (Montgomery, 2001). When samples are used and decisions are made based on frequency distributions, there is a probability of committing decision errors. Thus, when a statistical test detects that the treatment means are different, there is a probability that they are in fact equal in the population; this characterizes the Type I decision error, with probability of occurrence α. Similarly, a conclusion that the treatment means are equal is also subject to a decision error, termed the Type II error, with probability of occurrence β. Table 1 illustrates the decision errors associated with statistical tests.

The probabilities of committing Type I and Type II errors should be established prior to conducting the experiment (Stone & Sidel, 2004). These probabilities are also used in the calculations of test power and sample size (Montgomery, 2001; Walpole et al., 2011), according to the research objective.
3.2. Difference between means (d')

The value of d' determines the magnitude of the difference that must exist between treatment means for significance to be detected between them, for given values of α and β (Hough et al., 2006). This magnitude is given as a percentage of the scale length. Thus, if a 9 cm scale is used and a d' value of 0.02 is stipulated, the treatment means must differ by at least 0.18 cm (2% of 9 cm) to be considered different from each other (Figure 1a). For instance, with d' equal to 2% on a 9 cm scale (i.e., a minimum difference of 0.18 cm), if product A has a mean of 6.0 and product B has a mean of 6.3, A and B are considered significantly different, since the difference between them is 0.3 cm (greater than the 0.18 cm defined).
To avoid misinterpretation of the parameter d' used in this work relative to the value of delta from the Thurstonian model, that theory will be briefly addressed by way of comparison. In difference sensory tests, the Thurstonian model is used to estimate the distance (δ) between the intensity means (μX and μY) of the stimuli of two products, given by the number of standard deviations by which the two distributions are separated (Thurstone, 1927) (Figure 1b). The value of d' in that case is the estimate of the parameter δ. Thurstonian theory assumes that the perceived intensity of the sensory stimuli follows a normal distribution and that both products (X and Y) exhibit the same standard deviation (Ennis, 1993; Ishii et al., 2007; Jesionka et al., 2013).

Thus, the value of d' in the Thurstonian model is a measure of the number of standard deviations, which strictly depends on the distribution and variance of the perception of sensory stimuli. In contrast, the value of d' assumed in this study refers to a percentage of the scale, i.e., of the sensory score. In this case no assumption about the variance is required.

A value of d' equal to 1 in Thurstonian theory (comparison tests) represents a distance between the means equal to one standard deviation, and therefore the two stimuli may be confused (Meilgaard, Civille & Carr, 2006). On the other hand, in descriptive tests a d' value equal to 1 indicates that the means must differ by 100% of the scale length for a significant difference to be detected; that is, one treatment is anchored at the "weak" extremity of the scale and the other at the "strong" extremity, making discrimination straightforward. The concept of the value d' in this study thus differs from that of Thurstonian theory and follows the description proposed by Hough et al. (2006).
3.3. Estimate of the standard deviation of the experimental error (Root Mean Square Error - RMSE)

In descriptive sensory analysis the most common approach to data analysis is the analysis of variance (ANOVA), with the estimated standard deviation of the experimental error obtained as the root mean square error (RMSE). The ANOVA model adopted in this study was the model with only one source of variation (Attribute = mean + Product + error).
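To make the link between this one-way model and the RMSE concrete, the sketch below computes the RMSE as the square root of the error mean square for a small invented data set (the scores and product labels are hypothetical, not taken from the paper):

```python
import math

# Hypothetical intensity scores: 3 products rated on a 9 cm scale
scores = {
    "A": [6.1, 5.8, 6.4, 6.0],
    "B": [6.5, 6.9, 6.3, 6.7],
    "C": [4.2, 4.6, 4.0, 4.4],
}

def rmse_one_way(groups):
    """RMSE from a one-way ANOVA (Attribute = mean + Product + error):
    the square root of the within-group (error) mean square."""
    sse = 0.0       # error sum of squares
    n_total = 0
    for values in groups.values():
        mean = sum(values) / len(values)
        sse += sum((v - mean) ** 2 for v in values)
        n_total += len(values)
    df_error = n_total - len(groups)   # N minus number of products
    return math.sqrt(sse / df_error)

rmse = rmse_one_way(scores)   # ≈ 0.2555 on the 9 cm scale
```

Dividing this value by the scale length would give the standardized RMSEL used later in the paper.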
Calculation of the RMSE depends on the experimental data, and therefore it cannot be obtained prior to the tests. Because the required number of evaluations must be determined before conducting the tests, an estimate of this value is needed. This estimate can be obtained from a literature review of similar work (same type of product, same methodology) or from the researcher's previous experience with tests performed with the same team or on similar products. Alternatively, preliminary individual tests of judge performance may also be useful in obtaining an expected value for the RMSE.

Determination of an RMSE value for subsequent calculation of the number of evaluations is subject to a decision error, so a safety margin is recommended, which may even account for dropouts. In any case, it is recommended to always use a greater RMSE value than expected, which allows the researcher to work with a margin of safety in determining the required number of evaluators.

To obtain an estimate of the standard deviation of experimental errors, RMSE values were collected from previous sensory characterization studies performed using generic methodologies (Conventional Profile or Descriptive Analysis). Thus, 574 RMSE values were collected from 34 previous studies published in the journals Food Quality and Preference and Journal of Sensory Studies between 1993 and 2012. Table 2 presents the root mean square errors obtained from different food descriptive analysis studies, covering different countries, foods, unstructured scale sizes and numbers of judges. The demographics of the judges and the types of food products were sufficiently varied to cover different situations involving descriptive tests.

Because the data were collected from studies using generic methods, scales of different sizes may have been used. To standardize values, the root mean square error of each measurement was divided by the length of the scale, giving the RMSEL (Root Mean Square Error Length), as recommended by Hough et al. (2006). Thus, if an RMSE of 1.2 was obtained using a 15 cm scale, the RMSEL is 1.2/15 = 0.08. In the analysis of variance, if a data set is divided by a number, the resulting root mean square error is also divided by the same number; therefore, dividing the RMSE by the size of the scale is equivalent to having previously standardized all rating scales to a range of 0-1.
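This standardization can be expressed as a one-line helper (a minimal sketch; the function name is ours):

```python
def rmsel(rmse: float, scale_length: float) -> float:
    """Standardize an RMSE to a 0-1 scale by dividing by the scale length."""
    return rmse / scale_length

# Example from the text: an RMSE of 1.2 obtained on a 15 cm scale
value = rmsel(1.2, 15.0)   # 0.08
```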
Hough et al. (2006) calculated the number of consumers necessary for acceptance tests using data collected from the literature. In that study, the authors assumed that the RMSEL values presented a symmetric distribution and therefore performed sample size calculations considering the mean and standard deviation of the RMSEL values obtained from the literature. Observing the frequency of occurrence of RMSEL in previous studies allows a probabilistic model representing these data to be inferred, thereby providing greater insight into this parameter. Fitting a known probability distribution may allow a more realistic calculation of the number of panelists, taking into account the probability of experimentally obtaining an RMSEL value of a given magnitude.

The present study proposes calculating the number of judges for descriptive tests by means of the "sample size and power of analysis" procedure, using the probability distribution of the experimental error estimates obtained from previously published studies.
4. Calculation of the number of judges in descriptive tests

With the RMSEL values obtained from the literature (Table 2), nine outliers were identified using the box-plot technique, considering the upper limit as three times the interquartile distance above the third quartile. There were no lower outliers, since the distribution of values presented a positive skewness of 0.449 and only positive values. A probability distribution was fitted to the data using the Kolmogorov-Smirnov goodness-of-fit test (p > 0.10), resulting in the Weibull distribution (1.8081, 0.11419) represented in Figure 2.

In calculating the number of judges, five percentiles of the probability distribution were considered (10%, 25%, 50%, 75% and 90%). Thus, the RMSEL values used were: 0.0329, 0.0573, 0.0932, 0.1368 and 0.1811.
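These percentiles follow directly from the Weibull quantile function. The sketch below (pure Python, with the shape and scale parameters as reported above) reproduces the five RMSEL values:

```python
import math

# Two-parameter Weibull reported in the text: shape = 1.8081, scale = 0.11419
SHAPE, SCALE = 1.8081, 0.11419

def weibull_quantile(p: float, shape: float = SHAPE, scale: float = SCALE) -> float:
    """Inverse CDF of the Weibull distribution: x_p = scale * (-ln(1-p))^(1/shape)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

# 10th, 25th, 50th, 75th and 90th percentiles of the RMSEL distribution
percentiles = [weibull_quantile(p) for p in (0.10, 0.25, 0.50, 0.75, 0.90)]
# rounds to 0.0329, 0.0573, 0.0932, 0.1368, 0.1811 as in the text
```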
Determination of the optimal number of judges in descriptive tests was performed using the "sample size and power analysis" procedure of the JMP/SAS software. The number of evaluations necessary was determined considering the five RMSEL values described above, three probability levels of type I error (α = 0.10, 0.05, 0.01), three probability levels of type II error (β = 0.20, 0.10, 0.05), and four levels of difference to be detected between the experimental means (d' = 0.20, 0.10, 0.05, 0.02).
Tables 3 to 6 show the number of evaluations (NA) needed in descriptive tests for the four levels of d': d' = 0.20 (Table 3), d' = 0.10 (Table 4), d' = 0.05 (Table 5) and d' = 0.02 (Table 6). Each component of the calculation contributes to the number of evaluations, but variation in some components has drastically different effects. The greatest effects on the number of evaluations were observed for the difference sought in the experiment (d') and the standard deviation of the experimental error (RMSEL). The number of evaluations necessary in descriptive tests increases with the RMSEL value and decreases with increasing values of alpha, beta and d'.

The recommended number of evaluations in descriptive tests varied considerably depending on the parameters considered, ranging from 2 to 2932 evaluations. In either extreme, it is preferable to work with panels that are neither extremely small nor extremely large. A recommendation of an extremely high number of evaluations suggests that the experimental conditions of those scenarios are not well defined, being associated with high probabilities of decision errors and high random variability (RMSE), and is therefore not advisable. Regarding very small panels, previous studies determining the number of judges in descriptive tests by computer simulation with experimental data also found that few judges may be sufficient for descriptive assessments, verifying the need for only two judges (Pages & Perinel, 2003), five judges (Gacula & Rutenbeck, 2006) or eight judges (Heymann et al., 2013).

It is important to note that descriptive tests yield several RMSE measurements, not a single one. The largest expected value should thus be considered when determining the optimal number of evaluations. The number of evaluations is therefore determined by the most critical attribute, i.e., the one with the highest random variance. The number of judges on the team should be large enough to allow the most critical attribute to be correctly evaluated.

Because Tables 3 to 6 give the number of evaluations needed (NA), the number of replications (r) of the assessments made by each judge must also be considered. If each judge evaluates each sample only once (r = 1), the number of judges (n) equals the number of evaluations (NA). However, if each judge evaluates each sample more than once (r > 1), as is common in descriptive sensory tests, the number of judges is given by the ratio between the number of evaluations needed and the number of replications (n = NA/r).
For example, suppose the criteria stipulated by the researcher for the experiment are α = 0.01, β = 0.05, d' = 0.10 and RMSEL = 0.0932 (the median); the number of evaluations required is then 33. Considering that each judge performs three replications (r = 3) of the sensory evaluation, the ideal number of judges is 33/3 = 11 judges. It is also important that the replications be performed with a sufficient time interval so that the judge does not remember the previous test.
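This worked example can be checked with a textbook normal-approximation power calculation, sketched below using only the Python standard library. The paper itself used the JMP/SAS procedure, which relies on noncentral t distributions, so other scenarios may differ slightly from this approximation; the correction term is the usual z²/4 adjustment for t-tests:

```python
from math import ceil
from statistics import NormalDist

def evaluations_needed(alpha: float, beta: float, d: float, rmsel: float) -> int:
    """Approximate evaluations per product for detecting a difference d
    between two means: n = 2*(z_{1-a/2} + z_{1-b})^2 * (s/d)^2 + z_{1-a/2}^2/4."""
    z = NormalDist()
    za = z.inv_cdf(1.0 - alpha / 2.0)
    zb = z.inv_cdf(1.0 - beta)
    n = 2.0 * (za + zb) ** 2 * (rmsel / d) ** 2 + za ** 2 / 4.0
    return ceil(n)

def judges_needed(n_evaluations: int, replications: int) -> int:
    """Number of judges n = NA / r, rounded up."""
    return ceil(n_evaluations / replications)

# Scenario from the text: alpha = 0.01, beta = 0.05, d' = 0.10, RMSEL = 0.0932
na = evaluations_needed(0.01, 0.05, 0.10, 0.0932)
n_judges = judges_needed(na, 3)
```

For this scenario the approximation yields the same 33 evaluations and 11 judges reported in the text.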
The range of values reported in Tables 3 to 6 is limited to what the authors consider necessary for most practical applications. For example, for a highly trained team it may be necessary to consider a d' value of less than 0.02. In this case, the researcher may use a statistical package to calculate the number of evaluations (NA) needed for the descriptive test. Likewise, the RMSEL values can be altered according to the probability of obtaining the experimental variance considered by the investigator. Here, fitting a probability distribution to values from the literature proves very useful, because the researcher can obtain other RMSEL values from the Weibull distribution (1.8081, 0.11419). To determine the number of evaluations needed under other experimental conditions, the "sample size and power analysis" procedure of the JMP/SAS software may be used.
5. Conclusion

Collecting experimental error estimates from published studies on the sensory description of food allows the number of judges needed for descriptive evaluations to be calculated. Determining the ideal number of judges in sensory testing is extremely important because the smaller the team, the lower the costs of performing the sensory tests. On the other hand, this calculation should be done carefully so that information on the sensory profile of the product is not obscured and robust, powerful statistical tests can still be applied.
Acknowledgements

The authors would like to acknowledge CNPq and Fapemig for their financial support.
References
Anyango, J.O., De Kock, H.L., & Taylor, J.R.N. (2011). Evaluation of the functional quality of cowpea-fortified traditional African sorghum foods using instrumental and descriptive sensory analysis. LWT – Food Science and Technology, 44, 2126-2133.

Biasoto, A.C.T., Catharino, R.R., Sanvido, G.B., Eberlin, M.N., Aparecida, M., & Da Silva, A.P. (2010). Flavour characterization of red wines by descriptive analysis and ESI mass spectrometry. Food Quality and Preference, 21, 755-762.

Brandt, M.A., Skinner, E.Z., & Coleman, J.A. (1963). Texture profile method. Journal of Food Science, 28, 404–409.

Cairncross, S.E., & Sjostrom, L.B. (1950). Flavor profiles: a new approach to flavor problems. Food Technology, 4, 308–311.

Campo, E., Ballester, J., Langlois, J., Dacremont, C., & Valentin, D. (2010). Comparison of conventional descriptive analysis and a citation frequency-based descriptive method for odor profiling: An application to Burgundy Pinot noir wines. Food Quality and Preference, 21, 755-762.

Chueamchaitrakun, P., Chompreeda, P., Haruthaithanasan, V., Suwonsichon, T., Kaswmsamran, S., & Prinyawiwatkui, W. (2011). Sensory descriptive and texture profile analyses of butter cakes made from composite rice flours. International Journal of Food Science and Technology, 45, 2358-2365.

Delgado, C., & Guinard, J.X. (2011). How do consumer hedonic ratings for extra virgin olive oil relate to quality ratings by experts and descriptive analysis ratings? Food Quality and Preference, 22, 213-322.

Ennis, D. (1993). The power of sensory discrimination methods. Journal of Sensory Studies, 8(4), 353–370.

Gacula, M., & Rutenbeck, S. (2006). Sample size in consumer test and descriptive analysis. Journal of Sensory Studies, 21, 129-145.
Garcia-Carpintero, E.G., Gomez-Gallego, M.A., Sanchez-Palomo, E., & Gonzales Viñas, M.A. (2011). Sensory descriptive analysis of Bobal red wines treated with oak chips at different stages of winemaking. Australian Journal of Grape and Wine Research, 17, 368-377.

Heymann, H., Machado, B., Torri, L., & Robinson, A.L. (2012). How many judges should one use for sensory descriptive analysis? Journal of Sensory Studies, 27, 111-122.

Hough, G., Wakeling, I., Mucci, A., Chambers, E., Gallardo, I.M., & Alves, L.R. (2006). Number of consumers necessary for sensory acceptability tests. Food Quality and Preference, 17, 522-526.

Hooge, S., & Chambers, D. (2010). A comparison of food basic taste modalities, using a descriptive analysis technique, for varying levels of sodium and KCl in two model soup systems. Journal of Sensory Studies, 25, 521-535.

Ishii, R., Kawaguchi, H., O'Mahony, M., & Rousseau, B. (2007). Relating consumer and trained panels' discriminative sensitivities using vanilla flavored ice cream as a medium. Food Quality and Preference, 18, 89-96.

Jesionka, V., et al. (2013). Transitioning from proportion of discriminators to a more meaningful measure of sensory difference. Food Quality and Preference, http://dx.doi.org/10.1016/j.foodqual.2013.04.007

King, B.M., Arents, P., & Moreau, N. (1995). Cost/efficiency evaluation of descriptive analysis panels – I. Panel size. Journal of Sensory Studies, 6, 245-261.

Kraemer, H.C., & Thiemann, S. (1987). How many subjects? Statistical power analysis in research. Newbury Park: Sage Publications.

Lee, J., & Chambers, D.H. (2010). Descriptive analysis and US consumer acceptability of 6 green tea samples from China, Japan, and Korea. Journal of Food Science, 75, S141-S147.
Leighton, C.S., Schonefeldt, H.C., & Kruger, R. (2010). Quantitative descriptive sensory analysis of five different cultivars of sweet potato to determine sensory and textural profiles. Journal of Sensory Studies, 25, 2-18.

Libertino, L.M., Osornio, M.M.L., & Hough, G. (2011). Number of consumers necessary for survival analysis estimations based on each consumer evaluating a single sample. Food Quality and Preference, 22, 24-30.

Mammasse, N., & Schlich, P. (2012). Adequate number of consumers in a liking test. Insights from resampling in seven studies. Food Quality and Preference, doi: 10.1016/j.foodqual.2012.01.009

Meilgaard, M.C., Civille, G.V., & Carr, B.T. (2006). Sensory Evaluation Techniques (4th ed.). Boca Raton: CRC Press.

Montgomery, D.C. (2001). Design and Analysis of Experiments (5th ed.). New York: John Wiley and Sons.

Moussaoui, K.A., & Varela, P. (2010). Exploring consumer product profiling techniques and their linkage to a quantitative descriptive analysis. Food Quality and Preference, 21, 1088-1099.

Murray, J.M., Delahunty, C.M., & Baxter, I.A. (2001). Descriptive sensory analysis: past, present and future. Food Research International, 34, 461-471.

Pages, J., & Périnel, E. (2003). Panel performance and number of evaluations in a descriptive sensory study. Journal of Sensory Studies, 19, 273-291.

Silva, R.C.S.N., Minim, V.P.R., Simiqueli, A.A., Moraes, L.E.S., Gomide, A.I., & Minim, L.A. (2012). Optimized Descriptive Profile: a rapid methodology for sensory description. Food Quality and Preference, 24, 190-200.
Silva, R.C.S.N., Minim, V.P.R., Carneiro, J.D.S., Nascimento, M., Della Lucia, S.M., & Minim, L.A. (2013). Quantitative sensory description using the Optimized Descriptive Profile: comparison with conventional and alternative methods for evaluation of chocolate. Food Quality and Preference, 30, 169-179.

Speziale, M., Vásquez-Araujo, L., Mincione, A., & Carbonell-Barrachina, A.A. (2010). Volatile composition and descriptive sensory analysis of Italian vanilla torrone. International Journal of Food Science and Technology, 45, 1586-1593.

Stone, H., & Sidel, J.L. (1985). Sensory Evaluation Practices (1st ed.). New York: Academic Press.

Stone, H., & Sidel, J.L. (2004). Sensory Evaluation Practices (3rd ed.). New York: Academic Press.

Tesfaye, W., Morales, M.I., Callejon, R.M., Gonzales, A.G., Garcia-Parrila, M.C., & Troncoso, A.M. (2010). Descriptive sensory analysis of wine vinegar: Tasting procedure and reliability of new attributes. Journal of Sensory Studies, 25, 216-230.

Thurstone, L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273–286.

Walpole, R.H., Myers, R.H., Myers, S.L., & Ye, K. (2011). Probability & Statistics for Engineers & Scientists (9th ed.). New York: Pearson.

Warmund, M.R., Elmore, J.R., Adhikari, K., & McGraw, S. (2011). Descriptive sensory analysis and free sugar contents of chestnut cultivars grown in North America. Journal of the Science of Food and Agriculture, 91, 1940-1945.
Table captions

Table 1 – Decision errors: Type I and Type II.

Table 2 – Values of the Root Mean Square Error (RMSE) encountered in the literature, considering different countries, types of food, sensory attributes and numbers of judges.

Table 3 – Number of evaluations needed for descriptive sensory testing, considering the probabilities of Type I (α) and Type II (β) errors and the root mean square error length (RMSEL), for d' = 0.20.

Table 4 – Number of evaluations needed for descriptive sensory testing, considering the probabilities of Type I (α) and Type II (β) errors and the root mean square error length (RMSEL), for d' = 0.10.

Table 5 – Number of evaluations needed for descriptive sensory testing, considering the probabilities of Type I (α) and Type II (β) errors and the root mean square error length (RMSEL), for d' = 0.05.

Table 6 – Number of evaluations needed for descriptive sensory testing, considering the probabilities of Type I (α) and Type II (β) errors and the root mean square error length (RMSEL), for d' = 0.02.

Figure captions

Figure 1 – Representation of d' in descriptive tests (a) and in the Thurstonian model (b).

Figure 2 – Histogram of the RMSEL data from the literature and the Weibull probability distribution fitted to the data.
Table 1

                        Decision
Truth        Reject H0                      Do not reject H0
H0 true      Type I error (probability α)   Correct decision
H0 false     Correct decision               Type II error (probability β)
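The two error probabilities in Table 1 can be checked empirically: simulate many tests under a true H0 and under a true difference, and count the wrong decisions. A minimal, self-contained sketch (the z-test statistic, effect size 0.5 and sample size 30 are illustrative choices, not taken from the paper):

```python
import random
from statistics import NormalDist

def z_test_rejects(sample, mu0=0.0, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0, treating sigma as known (= 1)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) * n ** 0.5
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

random.seed(42)
n, sims = 30, 5000

# Type I error: H0 is true (mu = 0); rejecting it is the wrong decision.
type1 = sum(z_test_rejects([random.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(sims)) / sims

# Type II error: H0 is false (mu = 0.5); failing to reject is the wrong decision.
type2 = sum(not z_test_rejects([random.gauss(0.5, 1.0) for _ in range(n)])
            for _ in range(sims)) / sims

print(f"estimated alpha: {type1:.3f} (nominal 0.05)")
print(f"estimated beta:  {type2:.3f}")
```

The estimated α should hover near the nominal 0.05, while β depends on the true effect size and sample size, which is exactly the trade-off the sample-size tables of the paper quantify.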
Table 2

Country   Products                                           Number of   Number of       RMSE              RMSEL
                                                             judges      measurements    (m ± sd)          (m ± sd)
Norway    ice cream, juice, water solutions,                 9 - 12      256             1.1739 ± 0.8053   0.0784 ± 0.0537
          tomato soup, cold tea
USA       beer                                               11 - 17     12              1.4123 ± 0.4160   0.0942 ± 0.0277
France    coffee, drinks, wine                               8 - 11      54              1.4507 ± 0.5864   0.1220 ± 0.0591
Spain     tender ham, water solutions                        7 - 11      12              0.9607 ± 0.1760   0.0640 ± 0.0117
Italy     cheese, chocolate                                  10 - 12     60              0.7475 ± 0.2280   0.0831 ± 0.0253
Brazil    fish, juice, yogurt, taro, cachaça, flan, salami   6 - 18      180             1.5393 ± 0.6655   0.1497 ± 0.0588

RMSE: Root Mean Square Error; RMSEL: Root Mean Square Error Length; m: mean; sd: standard deviation.
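The RMSE and RMSEL columns of Table 2 are consistent with RMSEL being the RMSE divided by the length of the rating scale used in each study (for instance, the Norway row implies a 15 cm scale and the Italy row a 9 cm scale), which makes studies on different scales comparable. A one-line sketch under that assumed interpretation:

```python
def rmsel(rmse: float, scale_length: float) -> float:
    """Normalize an RMSE by the length of the rating scale (assumed
    interpretation of RMSEL), so that different scales become comparable."""
    return rmse / scale_length

# The Norway row of Table 2 is consistent with a 15 cm scale:
print(round(rmsel(1.1739, 15.0), 4))  # → 0.0783 (Table 2 lists 0.0784)
```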
Table 3
α
RMSEL 0.0329 0.0573 0.0932 0.1368 0.1811
β = 0.20 0.10
2
3
4
7
11
0.05
2
3
5
9
14
0.01
3
4
7
13
21
0.10
2
3
5
9
15
0.05
3
4
6
8
18
0.01
3
5
9
16
27
0.10
2
3
6
11
19
0.05
3
4
7
14
23
0.01
4
5
10
19
31
β = 0.10
β = 0.05
Table 4 (d' = 0.10)

                    RMSEL
β       α      0.0329   0.0573   0.0932   0.1368   0.1811
0.20    0.10      3        5       12       24       42
0.20    0.05      4        7       15       32       53
0.20    0.01      5       10       23       46       79
0.10    0.10      3        7       16       33       57
0.10    0.05      4        8       20       41       70
0.10    0.01      6       12       28       58      100
0.05    0.10      4        8       20       42       72
0.05    0.05      5       10       24       50       87
0.05    0.01      6       14       33       69      119
Table 5 (d' = 0.05)

                    RMSEL
β       α      0.0329   0.0573   0.0932   0.1368   0.1811
0.20    0.10      7       17       44       94      163
0.20    0.05      8       22       56      118      207
0.20    0.01     12       33       83      117      308
0.10    0.10      9       24       61      129      226
0.10    0.05     11       29       74      159      277
0.10    0.01     15       41      106      225      393
0.05    0.10     11       30       76      163      285
0.05    0.05     13       36       92      192      342
0.05    0.01     18       49      126      268      470
Table 6 (d' = 0.02)

                    RMSEL
β       α      0.0329   0.0573   0.0932   0.1368   0.1811
0.20    0.10     35      103      270      580     1015
0.20    0.05     44      130      342      736     1289
0.20    0.01     65      194      509     1095     1917
0.10    0.10     48      142      373      802     1406
0.10    0.05     58      174      458      985     1725
0.10    0.01     83      246      648     1395     2442
0.05    0.10     60      179      471     1014     1775
0.05    0.05     75      215      566     1217     2132
0.05    0.01     99      295      776     1669     2932
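The entries in Tables 3-6 are numerically consistent with the classical two-sided, two-sample normal-approximation sample-size formula, n = 2(z_{1-α/2} + z_{1-β})² (RMSEL/d')². A minimal sketch under that inferred formula (the function name and ceiling rounding are our choices; the published values are occasionally one or two evaluations larger, presumably from a t-distribution correction):

```python
import math
from statistics import NormalDist

def n_evaluations(alpha: float, beta: float, rmsel: float, d: float) -> int:
    """Two-sided, two-sample normal-approximation sample size:
        n = 2 * (z_{1-alpha/2} + z_{1-beta})**2 * (rmsel / d)**2
    rounded up to the next whole evaluation."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(1 - beta)
    return math.ceil(2 * (z_a + z_b) ** 2 * (rmsel / d) ** 2)

# Spot checks (alpha = 0.10, beta = 0.20, RMSEL = 0.1811):
print(n_evaluations(0.10, 0.20, 0.1811, 0.20))  # → 11   (Table 3 lists 11)
print(n_evaluations(0.10, 0.20, 0.1811, 0.02))  # → 1014 (Table 6 lists 1015)
```

Note how the required number of evaluations grows with the square of RMSEL/d': halving the detectable difference d' roughly quadruples n, which is why the d' = 0.02 table reaches thousands of evaluations.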
Figure 1
Figure 2
Highlights

The ideal number of judges for descriptive evaluations was calculated by the "Power Analysis and Sample Size" procedure.
Previous research on sensory description was utilized.
The RMSE values from 574 previous studies were collected.
Three different levels of Type I and Type II errors were considered.
A total of 180 descriptive test scenarios were simulated.