Number of judges necessary for descriptive sensory tests

Accepted Manuscript

Rita de Cássia dos Santos Navarro da Silva, Valéria Paula Rodrigues Minim, Alexandre Navarro da Silva, Luis Antônio Minim

PII: S0950-3293(13)00122-5
DOI: http://dx.doi.org/10.1016/j.foodqual.2013.07.010
Reference: FQAP 2666

To appear in: Food Quality and Preference

Received Date: 27 May 2013
Revised Date: 18 July 2013
Accepted Date: 20 July 2013

Please cite this article as: Silva, R.d.C., Minim, V.P.R., Silva, A.N.d., Minim, L.A., Number of judges necessary for descriptive sensory tests, Food Quality and Preference (2013), doi: http://dx.doi.org/10.1016/j.foodqual.2013.07.010

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Number of judges necessary for descriptive sensory tests

Rita de Cássia dos Santos Navarro da Silva¹, Valéria Paula Rodrigues Minim¹, Alexandre Navarro da Silva², Luis Antônio Minim¹

¹ Departamento de Tecnologia de Alimentos, Universidade Federal de Viçosa (UFV), Zip code: 36570-000, Viçosa, Minas Gerais, Brasil.
² Departamento de Engenharia de Produção e Mecânica, Universidade Federal de Viçosa (UFV), Zip code: 36570-000, Viçosa, Minas Gerais, Brasil.

E-mail addresses: [email protected] (SILVA, R.C.S.N.), [email protected] (MINIM, V.P.R.), [email protected] (SILVA, A.N.), [email protected] (MINIM, L.A.)

ABSTRACT

To determine the number of judges needed in descriptive tests, four parameters are necessary: the probability of Type I error (α), the probability of Type II error (β), the difference between means to be detected in the experiment (d') and the standard deviation of the experimental error (s). The probabilities of the experimental errors and the difference one desires to detect between means are stipulated by the researcher; only the estimate of the experimental error cannot be fixed in advance, and it must be obtained experimentally or from similar research previously performed. Because the most common approach to data analysis in descriptive sensory analysis is the analysis of variance, the standard deviation of the experimental error is estimated by the root mean square error (RMSE). Accordingly, 574 RMSE values were obtained from previously published studies. The collected data were fitted to a Weibull probability distribution (1.8081, 0.11419), and five percentiles of this distribution were considered in the calculations. The number of evaluations necessary was determined using the "sample size and power analysis" procedure of the JMP/SAS software. Three probability levels were defined for Type I and Type II errors, four levels of mean difference to be detected in the experiment, and five percentiles of the RMSE distribution, so the required numbers of evaluations were calculated for a total of 180 scenarios. Considering the median of the experimental error, an alpha (Type I error) of 1%, a beta (Type II error) of 5% and a difference between means of 10% of the sensory scale, 33 evaluations are needed in the descriptive tests. Further considering that each judge evaluates the samples in triplicate, 11 judges are necessary for this specific set of parameters. Other scenarios are also discussed in the paper.

Key-words: sample size; power analysis; panel size; RMSE.

1. Introduction

The descriptive sensory analysis of foods consists of the assessment of sensory characteristics by a team of judges, who identify and quantify the intensity of sensory stimuli present in food using the five human senses (sight, smell, hearing, touch and taste) (Murray, Delahunty and Baxter, 2001). Descriptive assessment is a valuable tool in the various stages of food processing: development of new products, quality control, storage and shelf-life (Stone and Sidel, 2004; Meilgaard et al., 2006).

Traditional descriptive evaluation techniques require a team of judges with a high degree of training, and the large number of judges may be an obstacle to the application of this valuable tool in industry (Heymann et al., 2012). A recommended ideal number of judges to make up a team is not clear from the literature. Different recommendations are found depending on the technique used, for example: six judges for the Flavor Profile (Cairncross & Sjostrom, 1950); ten judges for the Texture Profile (Brandt, Skinner, & Coleman, 1963); and ten to twelve judges for Quantitative Descriptive Analysis (Stone & Sidel, 1985). However, criteria for determining the number of judges needed are not given.

On the other hand, teams of different sizes are found in the generic techniques called "Conventional Profile" or "Descriptive Analysis". Published studies using generic techniques report six judges (Lee and Chambers, 2010; Warmund et al., 2011), seven judges (Guàrdia et al., 2010; Chueamchaitrakun et al., 2011), eight judges (Perrin & Pagès, 2009; Anyango et al., 2011), nine judges (Westad et al., 2003; Campo et al., 2010; Tesfaye et al., 2010; Silva et al., 2012), ten judges (Leighton et al., 2010; Speziale et al., 2010; Silva et al., 2013), eleven judges (Bitnes et al., 2009; Plaehn, 2009; Moussaoui and Varela, 2010; Parr et al., 2010), twelve judges (Sinesio et al., 2009; Bitnes et al., 2009; Biasoto et al., 2010), fifteen judges (Garcia-Carpintero et al., 2011), seventeen judges (Guinard et al., 1999) and twenty judges (Delgado and Guinard, 2011). In general, the lower limit is six judges and the upper limit is twenty judges.

Since the cost associated with sensory evaluation increases with the number of judges participating, it is important to determine the optimal number of evaluators necessary for sensory tests. According to Heymann et al. (2012), training a smaller number of judges obviously requires less time, money and effort, but this may result in "false savings" due to the possibility of obtaining "poor" data. Thus, the challenge is to determine the optimal number of judges for descriptive assessments: a number that allows the team size to be reduced without loss of information on the sensory profile and description of the products, while still permitting powerful statistical testing.

2. How to determine the ideal number of judges?

Calculating the number of judges for sensory testing has been little explored in the literature. Some recent studies have determined the ideal number of consumers for sensory acceptance tests. In these studies, the authors calculated the sample size using data obtained experimentally (Gacula and Rutenbeck, 2006; Mammasse and Schlich, 2013), data obtained from a literature review (Hough et al., 2006) or survival analysis (Hough et al., 2007; Libertino, Osornio, Hough, 2011). For descriptive tests, re-sampling techniques with experimental data were used by King et al. (1995), Pages and Perinel (2003), Gacula and Rutenbeck (2006) and Heymann et al. (2013).

In calculating the number of judges using the concept of "sample size and power of analysis", described in Kraemer and Thiemann (1987), Montgomery (2001) and Walpole et al. (2011), four parameters must be known: (i) level α – the probability of Type I error; (ii) level β – the probability of Type II error, the complement of which is the power of the test (1 – β); (iii) d' – the difference between means to be detected in the experiment; and (iv) s – the standard deviation of the experimental error. The probabilities of experimental errors (α and β) and the difference one desires to detect between means are stipulated by the researcher. Therefore, only the experimental error cannot be fixed in advance; it must be obtained experimentally or from similar studies previously performed.

3. Identifying the parameters

Knowledge of the definition and interpretation of the four parameters which must be known to calculate the number of judges is of utmost importance in determining the required number of evaluations in descriptive tests. In order to establish the criteria and method for obtaining these parameters, a brief explanation of these concepts will be presented.

4 5 6

3.1. Probability of Type I and Type II error The decision errors denominated Type I and Type II are associated with the

7

hypotheses of the statistical test used to verify the existence of significant differences

8

between treatment means. The null hypothesis (H0) represents equality between the

9

means, which is tested against the alternative hypothesis (Ha) that opposes the decision

10

that at least one mean differs from the others (Montgmonery, 2001). When samples are

11

used and decisions are made based on frequency distributions, there is the probability of

12

committing errors in the decisions. Thus, when detected in a statistical test that the

13

treatment means are different, there is a probability that they are equal considering the

14

population, which characterizes the Type I decision error, with a probability of

15

occurrence α. Similarly, when detecting that the treatment means are equal, this

16

conclusion is also subject to a decision error, which is termed as Type II error with

17

probability of occurrence β. Table 1 illustrates the decision errors associated with the

18

statistical tests.

19

The probabilities of committing Type I and Type II errors should be established

20

prior to conducting the experiment (Stone & Sidel, 2004). These probabilities are also

21

used in the calculations of test power and sample size (Montgomery, 2001; Walpole et

22

al., 2011), according to the research objective.

3.2. Difference between means (d')

The value of d' determines the magnitude of the difference that must exist between treatment means for significance to be detected between them, for given values of α and β (Hough et al., 2006). This magnitude is given as a percentage of the scale size. Thus, if a 9 cm scale is used and a d' value of 0.02 is stipulated, the treatment means must differ by at least 0.18 cm (2% of 9 cm) to be considered different from each other (Figure 1a). For instance, with d' of 2% on a 9 cm scale (i.e., a minimum difference of 0.18 cm), if product A has a mean of 6.0 and product B has a mean of 6.3, then A and B are significantly different, since the difference between them is 0.3 cm (> the 0.18 cm defined).
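This decision rule is simple enough to state as a few lines of code. The sketch below is only an illustration of the arithmetic; the function names are ours, not from the paper:

```python
def min_difference(scale_length_cm: float, d_prime: float) -> float:
    """Minimum difference between treatment means, in scale units,
    implied by a d' given as a fraction of the scale length."""
    return d_prime * scale_length_cm


def detectably_different(mean_a: float, mean_b: float,
                         scale_length_cm: float, d_prime: float) -> bool:
    """True when the observed difference reaches the d' threshold."""
    return abs(mean_a - mean_b) >= min_difference(scale_length_cm, d_prime)


# Worked example from the text: a 9 cm scale with d' = 0.02 gives a
# 0.18 cm threshold, so means of 6.0 and 6.3 (a 0.3 cm difference)
# are declared different.
print(round(min_difference(9, 0.02), 2))        # 0.18
print(detectably_different(6.0, 6.3, 9, 0.02))  # True
```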

To avoid misinterpretation regarding the parameter d' used in this work and the value of delta from the Thurstonian model, this theory will be briefly addressed by way of comparison. In difference sensory tests, the Thurstonian model is used to estimate the distance (δ) between the intensity means (μX and μY) of the stimuli of two products, given by the number of standard deviations by which the two distributions are separated (Thurstone, 1927), Figure 1b. The value of d' in that case is the estimate of the parameter δ. Thurstonian theory assumes that the perceived intensity of the sensory stimuli follows a normal distribution and that both products (X and Y) exhibit the same standard deviation (Ennis, 1993; Ishii et al., 2007; Jesionka et al., 2013).

Thus, the d' of the Thurstonian model is a measure of the number of standard deviations, which strictly depends on the distribution and variance of the perception of sensory stimuli. In contrast, the d' assumed in this study refers to a percentage of the scale, i.e., of the sensory score; in this case no assumption about the variance is required.

A d' equal to 1 in Thurstonian theory (comparison tests) represents a distance between the means equal to one standard deviation, in which case the two stimuli may be confused (Meilgaard, Civille & Carr, 2006). In descriptive tests, on the other hand, a d' equal to 1 indicates that the means must differ by 100% of the scale size for a significant difference to be detected: one treatment would be anchored at the "weak" extremity of the scale and the other at the "strong" extremity, making discrimination easy.

The concept of d' in this study therefore differs from that of Thurstonian theory and follows the description proposed by Hough et al. (2006).

3.3. Estimation of the standard deviation of the experimental error (Root Mean Square Error – RMSE)

In descriptive sensory analysis, the most common approach to data analysis is the analysis of variance (ANOVA), with the standard deviation of the experimental error estimated by the root mean square error (RMSE). The ANOVA model adopted in this study has only one source of variation (Attribute = mean + Product + error).

Calculation of the RMSE depends on the experimental data, and therefore it cannot be obtained prior to the tests. Because the required number of evaluations must be determined before conducting the tests, an estimate of this value is required. This estimate can be obtained from literature studies of similar work (same type of product, same methodology) or from the researcher's previous experience with tests performed with the same team or on similar products. Alternatively, preliminary individual evaluations of judge performance may also be useful in obtaining an expected value for the RMSE.

Determining an RMSE value for the subsequent calculation of the number of evaluations is subject to a decision error, so a safety margin is recommended, which may also account for dropouts. In any case, it is recommended to always use a greater RMSE value than expected, which gives the researcher a margin of safety in determining the required number of evaluators.

To obtain an estimate of the standard deviation of the experimental error, RMSE values were collected from previous sensory characterization studies performed using generic methodologies (Conventional Profile or Descriptive Analysis). In total, 574 RMSE values were collected from 34 studies published in the journals Food Quality and Preference and Journal of Sensory Studies between 1993 and 2012. Table 2 presents the root mean square errors obtained from these descriptive food analysis studies, covering different countries, foods, unstructured scale sizes and numbers of judges on the team. The demographics of the judges and the types of food products were sufficiently varied to cover different situations involving descriptive tests.

Because the data were collected from studies using generic methods, scales of different sizes may have been used. To standardize the values, the root mean square error of each measurement was divided by the length of the scale, yielding the RMSEL (Root Mean Square Error Length), as recommended by Hough et al. (2006). Thus, if an RMSE of 1.2 was obtained on a 15 cm scale, the RMSEL is 1.2/15 = 0.08. In an analysis of variance, if the data set is divided by a number, the root mean square error is divided by the same number; dividing the RMSE by the scale size is therefore equivalent to having standardized the length of all rating scales to a range of 0–1.
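The standardization reduces to a single division; a minimal sketch (the function name `rmsel` is ours):

```python
def rmsel(rmse: float, scale_length: float) -> float:
    """Standardize an RMSE by the scale length, putting all
    studies on a common 0-1 scale."""
    return rmse / scale_length


# Example from the text: an RMSE of 1.2 on a 15 cm scale.
print(round(rmsel(1.2, 15.0), 2))  # 0.08
```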

Hough et al. (2006) calculated the number of consumers necessary for acceptance tests using data collected from the literature. In that study, the authors assumed that the RMSEL values were symmetrically distributed and therefore performed sample size calculations using the mean and standard deviation of the RMSEL values obtained from the literature. Observing the frequency of occurrence of RMSEL values in previous studies allows a probabilistic model representing these data to be inferred, and thereby gives greater insight into this parameter. Fitting a known probability distribution allows a more realistic calculation of the number of panelists, taking into account the probability of experimentally obtaining an RMSEL value of a given magnitude.

The present study proposes calculating the number of judges for descriptive tests by means of the "sample size and power of analysis" procedure, using the probability distribution of the experimental error estimates obtained from previously published studies.

4. Calculation of the number of judges in descriptive tests

With the RMSEL values obtained from the literature (Table 2), nine outliers were identified using the box-plot technique, considering the upper limit as three times the interquartile distance above the third quartile. There were no lower outliers, since the distribution of values presented a positive skewness of 0.449 and only positive values. A probability distribution was fitted to the data and verified with the Kolmogorov–Smirnov goodness-of-fit test (p > 0.10); the data were adjusted to the Weibull distribution (1.8081, 0.11419), represented in Figure 2.

In calculating the number of judges, five percentiles of the probability distribution were considered (10%, 25%, 50%, 75% and 90%). Thus, the RMSEL values used were 0.0329, 0.0573, 0.0932, 0.1368 and 0.1811.
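These percentiles follow directly from the Weibull quantile function q(p) = λ(−ln(1 − p))^(1/k), with shape k = 1.8081 and scale λ = 0.11419. The sketch below uses only the Python standard library (scipy.stats.weibull_min.ppf would give the same numbers); the function name is ours:

```python
import math

SHAPE, SCALE = 1.8081, 0.11419  # Weibull fit to the 574 RMSEL values


def weibull_quantile(p: float, shape: float = SHAPE, scale: float = SCALE) -> float:
    """Inverse CDF of the two-parameter Weibull distribution."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"{p:.0%} percentile: RMSEL = {weibull_quantile(p):.4f}")
# Prints 0.0329, 0.0573, 0.0932, 0.1368 and 0.1811 -- the five values above.
```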

The optimal number of judges in descriptive tests was determined using the "sample size and power analysis" procedure of the JMP/SAS software. The number of evaluations necessary was determined considering the five RMSEL values described above, three probability levels of Type I error (α = 0.10, 0.05, 0.01), three probability levels of Type II error (β = 0.20, 0.10, 0.05), and four levels of difference to be detected between the experimental means (d' = 0.20, 0.10, 0.05, 0.02).

Tables 3 to 6 show the number of evaluations (NA) needed in descriptive tests for the four levels of d': d' = 0.20 (Table 3), d' = 0.10 (Table 4), d' = 0.05 (Table 5) and d' = 0.02 (Table 6). Each component of the calculation contributes to the number of evaluations, but varying some components has drastically different effects. The greatest effects on the number of evaluations were observed for the difference sought in the experiment (d') and the standard deviation of the experimental error (RMSEL). The number of evaluations necessary in descriptive tests increases with the RMSEL value and decreases with the values of alpha, beta and d'.

The recommended number of evaluations varied widely depending on the parameters considered, ranging from 2 to 2932. In practice it is ideal to work with panels that are neither extremely small nor extremely large. The recommendation of an extremely high number of evaluations suggests that the experimental conditions of those scenarios are not well defined, being associated with high probabilities of decision error and high random variability (RMSE), and such scenarios are therefore not recommended. Regarding very small panels, previous studies determining the number of judges in descriptive tests by computer simulation with experimental data also found that few judges may be sufficient for descriptive assessments, with as few as two judges (Pages & Perinel, 2003), five judges (Gacula & Rutenbeck, 2006) or eight judges (Heymann et al., 2013).

It is important to note that descriptive tests yield several RMSE measurements, not a single one. The largest expected value should thus be considered when determining the optimal number of assessments. The number of evaluations will therefore be determined by the most critical attribute, i.e., the one with the highest random variance. The number of judges on the team should be large enough for the most critical attribute to be correctly evaluated.

Because the tables (Tables 3 to 6) give the number of evaluations needed (NA), the number of replications (r) performed by each judge must also be considered. If each judge evaluates each sample only once (r = 1), the number of judges (n) is equal to the number of evaluations (NA). However, if each judge evaluates each sample more than once (r > 1), as is common in descriptive sensory tests, the number of judges is given by the ratio between the number of evaluations needed and the number of replications (n = NA/r).

For example, suppose the researcher stipulates α = 0.01, β = 0.05, d' = 0.10 and RMSEL = 0.0932 (the median); the number of evaluations required is then 33. Considering that each judge performs three replications (r = 3) of the sensory evaluation, the ideal number of judges is 33/3 = 11. It is also important that the replications be separated by a sufficient time interval so that the judge does not remember the previous test.
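The paper obtains NA from the JMP/SAS power procedure. The same order of magnitude can be recovered with the textbook normal-approximation formula for a two-sided, two-sample comparison of means, NA = 2((z₁₋α/₂ + z₁₋β)·s/d')². The sketch below is only an approximation under that assumption: it uses normal rather than noncentral-t quantiles, so it lands slightly below the exact value (31 versus the 33 reported), and the function name is ours:

```python
import math
from statistics import NormalDist


def evaluations_needed(alpha: float, beta: float,
                       d_prime: float, rmsel: float) -> int:
    """Normal-approximation number of evaluations per product for a
    two-sided, two-sample comparison of means; the exact t-based value
    is a couple of evaluations higher."""
    z = NormalDist()
    z_alpha2 = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return math.ceil(2 * ((z_alpha2 + z_beta) * rmsel / d_prime) ** 2)


# Scenario from the text: alpha = 0.01, beta = 0.05, d' = 0.10,
# RMSEL = 0.0932 (the median of the Weibull fit).
na = evaluations_needed(0.01, 0.05, 0.10, 0.0932)
print(na)                 # 31 by this approximation (JMP/SAS reports 33)
print(math.ceil(na / 3))  # 11 judges when each evaluates in triplicate (r = 3)
```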

The range of values reported in Tables 3 to 6 is limited to what the authors consider necessary for most practical applications. For example, for a highly trained team it may be necessary to consider a d' value of less than 0.02; in this case, the researcher may use a statistical package to calculate the number of evaluations (NA) needed for the descriptive test. Likewise, the RMSEL values can be altered according to the probability of obtaining a given experimental variance considered by the investigator. Here, the fitted probability distribution of the literature values proves very useful, because the researcher can obtain other RMSEL values from the Weibull distribution (1.8081, 0.11419). To determine the number of evaluations needed under other experimental conditions, the "sample size and power analysis" procedure of the JMP/SAS software may be used.

5. Conclusion

The collection of experimental error estimates from published studies on the sensory description of food allows the number of judges needed for descriptive evaluations to be calculated. Determining the ideal number of judges in sensory testing is extremely important because the smaller the team, the lower the cost of performing the sensory tests. On the other hand, this calculation should be done carefully so that information on the sensory profile of the product is not obscured and so that robust and powerful statistical tests can still be applied.

Acknowledgements

The authors would like to acknowledge CNPq and Fapemig for their financial support.

References

9

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65

1

Anyango, J.O., De Kock, H.L., & Taylor, J.R.N. (2011). Evaluation of the fuctional

2

quality of cowpea-fortified traditional African sorghum foods using istrumental and

3

descriptive sensory analysis. LWT – Food Science and Tecnhonology, 44, 2126-2133.

4 5

Biasoto, A.C.T., Catharino, R.R., Sanvido, G.B., Eberlin, M.N., Aparecida, M., & Da

6

Silva, A.P. (2010). Flavour characterization of red wines by descriptive analysis and

7

ESI mass spectrometry. Food Quality and Preference, 21, 755-762.

8 9 10

Brandt, M. A., Skinner, E. Z., & Coleman, J. A. (1963). Texture profile method. Journal of Food Science, 28, 404–409.

11 12

Cairncross, S. E., & Sjostrom, L. B. (1950). Flavour profiles: a new approach to flavor

13

problems. Food Technology, 4, 308–311.

14 15

Campo, E., Ballester, J., Langlois, J., Dacremont, C., & Valentin, D. (2010).

16

Comparison of conventional descriptive analysis and a citation frequency-based

17

descriptive method for odor profiling: An application to Burgundy Pinot noir wines.

18

Food Quality and Preference, 21, 755-762.

19 20

Chueamchaitrakun, P., Chompreeda, P., Haruthaithanasan, V., Suwonsichon, T.,

21

Kaswmsamran, S., & Prinyawiwatkui, W. (2011). Sensory descriptive and texture

22

profile analyses of butter cakes made from composite rice flours. International of

23

Journal of Food Science and Techology, 45, 2358-2365.

24 25

Delgado, C., & Guinard, J.X. (2011). How do consumer hedonic ratings for extra olive

26

oil relate to quality ratings by experts and descriptive analysis ratings? Food Quality

27

and Preference, 22, 213-322.

28 29

Ennis, D. (1993). The power of sensory discrimination methods. Journal of Sensory

30

Studies, 8(4), 353–370.

31 32

Gacula, M., & Rutenbeck, S. (2006). Sample size in consumer test and descriptive

33

analysis. Journal of Sensory Studies, 21, 129-145.

34 10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65

1

Garcia-Carpintero, E.G., Gomez-Gallego, M.A., Sanchez-Palomo, E., & Gonzales

2

Viñas, M.A. (2011). Sensory descriptive analysis of Bobal red wines treated with oak

3

chips at different stages of winemaking. Australian Journal of Grape Wine Research,

4

17, 368-377.

5 6

Heymann, H., Machado, B., Torri, L, & Robinson, A.L. (2012). How many judges

7

should one use for sensory descriptive analysis? Journal of Sensory Studies, 27, 111-

8

122.

9 10

Hough, G., Wakeling, I., Mucci, A., Chambers, E., Gallardo, I.M., & Alves, L.R.

11

(2006). Number of consumers necessary for sensory acceptability tests. Food Quality

12

and Preference, 17, 522-526.

13 14

Hooge, S., & Chambers, D. (2010). A comparison of food basic taste modalities, using a

15

descriptive analysis technique, for varying levels of sodium and KCl in two model soup

16

systems. Journal of Sensory Studies, 25, 521-535.

17 18

Ishii, R., Kawaguchi, H., O’Mahony, M., & Rousseau, B. (2007) Relating consumer and

19

trained panels discriminative sensitivities using vanilla flavored ice cream as a médium.

20

Food Quality and Preference, 18, 89-96.

21 22

Jesionka, V., et al. Transitioning from proportion of discriminators to a more

23

meaningful measure of sensory difference. Food Quality and Preference (2013),

24

http://dx.doi.org/10.1016/j.foodqual.2013.04.007

25 26

King, B.M., Arents, P., & Moreau, N. (1995). Cost / Efficiency evaluation of

27

descriptive analysis panels – I. Panel size. Journal of Sensory Studies, 6, 245-261.

28 29

Kraemer, H.C., & Thiemann, S. (1987). How many subjects? Statistical power analysis

30

in research. Newbury Park: Sage Publications, pp.37-22.

31 32

Lee, J., & Chambers, D.H (2010). Descriptive analysis and US consumer acceptability

33

of 6 green tea samples from China, Japan, Korea. Journal of Food Science, 75, S141-

34

S147. 11

1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65

2

Leighton, C.S., Schonefeldt, H.C., & Kruger, R. (2010). Quantitative descriptive

3

sensory analysis of Five different cultivars of sweet potato to determine sensory na

4

textural profiles. Journal of Sensory Studies, 25, 2-18.

Libertino, L.M., Osornio, M.M.L., & Hough, G. (2011). Number of consumers necessary for survival analysis estimations based on each consumer evaluating a single sample. Food Quality and Preference, 22, 24-30.

Mammasse, N., & Schlich, P. (2012). Adequate number of consumers in a liking test. Insights from resampling in seven studies. Food Quality and Preference, doi: 10.1016/j.foodqual.2012.01.009.

Meilgaard, M.C., Civille, G.V., & Carr, B.T. (2006). Sensory Evaluation Techniques (4th ed.). Boca Raton: CRC Press.

Montgomery, D.C. (2001). Design and Analysis of Experiments (5th ed.). New York: John Wiley and Sons. p. 699.

Moussaoui, K.A., & Varela, P. (2010). Exploring consumer product profiling techniques and their linkage to a quantitative descriptive analysis. Food Quality and Preference, 21, 1088-1099.

Murray, J.M., Delahunty, C.M., & Baxter, I.A. (2001). Descriptive sensory analysis: past, present and future. Food Research International, 34, 461-471.

Pages, J., & Périnel, E. (2003). Panel performance and number of evaluations in a descriptive sensory study. Journal of Sensory Studies, 19, 273-291.

Silva, R.C.S.N., Minim, V.P.R., Simiqueli, A.A., Moraes, L.E.S., Gomide, A.I., & Minim, L.A. (2012). Optimized Descriptive Profile: a rapid methodology for sensory description. Food Quality and Preference, 24, 190-200.

Silva, R.C.S.N., Minim, V.P.R., Carneiro, J.D.S., Nascimento, M., Della Lucia, S.M., & Minim, L.A. (2013). Quantitative sensory description using the Optimized Descriptive Profile: comparison with conventional and alternative methods for evaluation of chocolate. Food Quality and Preference, 30, 169-179.

Speziale, M., Vázquez-Araujo, L., Mincione, A., & Carbonell-Barrachina, A.A. (2010). Volatile composition and descriptive sensory analysis of Italian vanilla torrone. International Journal of Food Science and Technology, 45, 1586-1593.

Stone, H., & Sidel, J.L. (1985). Sensory Evaluation Practices (1st ed.). New York: Academic Press.

Stone, H., & Sidel, J.L. (2004). Sensory Evaluation Practices (3rd ed.). New York: Academic Press.

Tesfaye, W., Morales, M.I., Callejon, R.M., Gonzales, A.G., Garcia-Parrila, M.C., & Troncoso, A.M. (2010). Descriptive sensory analysis of wine vinegar: Tasting procedure and reliability of new attributes. Journal of Sensory Studies, 25, 216-230.

Thurstone, L. (1927). A law of comparative judgment. Psychological Review, 34(4), 273-286.

Walpole, R.H., Myers, R.H., Myers, S.L., & Ye, K. (2011). Probability & Statistics for Engineers & Scientists (9th ed.). New York: Pearson. p. 816.

Warmund, M.R., Elmore, J.R., Adhikari, K., & McGraw, S. (2011). Descriptive sensory and free sugar contents of chestnut cultivars grown in North America. Journal of the Science of Food and Agriculture, 91, 1940-1945.

Table captions

Table 1 – Decision errors: Type I and Type II.

Table 2 – Values of the Root Mean Square Error (RMSE) encountered in the literature, considering different countries, types of foods, sensory attributes and numbers of judges.

Table 3 – Number of evaluations needed for descriptive sensory testing considering the probabilities of type I (α) and type II (β) errors and the root mean square error length (RMSEL), for d' = 0.20.

Table 4 – Number of evaluations needed for descriptive sensory testing considering the probabilities of type I (α) and type II (β) errors and the root mean square error length (RMSEL), for d' = 0.10.

Table 5 – Number of evaluations needed for descriptive sensory testing considering the probabilities of type I (α) and type II (β) errors and the root mean square error length (RMSEL), for d' = 0.05.

Table 6 – Number of evaluations needed for descriptive sensory testing considering the probabilities of type I (α) and type II (β) errors and the root mean square error length (RMSEL), for d' = 0.02.

Figure captions

Figure 1 – Representation of d' in descriptive tests (a) and in the Thurstonian model (b).

Figure 2 – Histogram of the RMSEL data from the literature and the Weibull probability distribution fitted to the data.

Table 1

                                    Truth
Decision              H0 true                          H0 false
Reject H0             Type I error (probability α)     Correct decision
Do not reject H0      Correct decision                 Type II error (probability β)

Table 2

Country   Products                                      Number of   Number of       RMSE              RMSEL
                                                        judges      measurements    m ± sd            m ± sd
Norway    ice cream, juice, water solutions,            9 - 12      256             1.1739 ± 0.8053   0.0784 ± 0.0537
          tomato soup, cold tea
USA       beer, coffee, drinks, wine,                   11 - 17     12              1.4123 ± 0.4160   0.0942 ± 0.0277
France    tender ham, water solutions                   8 - 11      54              1.4507 ± 0.5864   0.1220 ± 0.0591
Spain                                                   7 - 11      12              0.9607 ± 0.1760   0.0640 ± 0.0117
Italy                                                   10 - 12     60              0.7475 ± 0.2280   0.0831 ± 0.0253
Brazil    cheese, chocolate, fish, juice, yogurt,       6 - 18      180             1.5393 ± 0.6655   0.1497 ± 0.0588
          taro, cachaça, flan, salami

RMSE: Root Mean Square Error; RMSEL: Root Mean Square Error Length; m: mean; sd: standard deviation.

Table 3

                          RMSEL
β          α      0.0329  0.0573  0.0932  0.1368  0.1811
β = 0.20   0.10      2       3       4       7      11
           0.05      2       3       5       9      14
           0.01      3       4       7      13      21
β = 0.10   0.10      2       3       5       9      15
           0.05      3       4       6       8      18
           0.01      3       5       9      16      27
β = 0.05   0.10      2       3       6      11      19
           0.05      3       4       7      14      23
           0.01      4       5      10      19      31

Table 4

                          RMSEL
β          α      0.0329  0.0573  0.0932  0.1368  0.1811
β = 0.20   0.10      3       5      12      24      42
           0.05      4       7      15      32      53
           0.01      5      10      23      46      79
β = 0.10   0.10      3       7      16      33      57
           0.05      4       8      20      41      70
           0.01      6      12      28      58     100
β = 0.05   0.10      4       8      20      42      72
           0.05      5      10      24      50      87
           0.01      6      14      33      69     119

Table 5

                          RMSEL
β          α      0.0329  0.0573  0.0932  0.1368  0.1811
β = 0.20   0.10      7      17      44      94     163
           0.05      8      22      56     118     207
           0.01     12      33      83     117     308
β = 0.10   0.10      9      24      61     129     226
           0.05     11      29      74     159     277
           0.01     15      41     106     225     393
β = 0.05   0.10     11      30      76     163     285
           0.05     13      36      92     192     342
           0.01     18      49     126     268     470

Table 6

                          RMSEL
β          α      0.0329  0.0573  0.0932  0.1368  0.1811
β = 0.20   0.10     35     103     270     580    1015
           0.05     44     130     342     736    1289
           0.01     65     194     509    1095    1917
β = 0.10   0.10     48     142     373     802    1406
           0.05     58     174     458     985    1725
           0.01     83     246     648    1395    2442
β = 0.05   0.10     60     179     471    1014    1775
           0.05     75     215     566    1217    2132
           0.01     99     295     776    1669    2932
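As a sketch of how entries of this kind can be obtained, the standard two-sample normal-approximation sample-size formula, n = 2·((z_{α/2} + z_β)·RMSEL/d')² rounded up, reproduces the tabulated values closely. This formula is an assumption made here for illustration; the authors' exact procedure is described in the body of the paper.

```python
from math import ceil
from statistics import NormalDist

def n_evaluations(alpha: float, beta: float, rmsel: float, d: float) -> int:
    """Approximate number of evaluations for a two-sample comparison:
    n = 2 * ((z_{alpha/2} + z_beta) * RMSEL / d')^2, rounded up.
    Assumed formula for illustration, not necessarily the authors' exact routine."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # two-sided type I error
    z_beta = z(1 - beta)        # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * rmsel / d) ** 2)

# d' = 0.20, alpha = 0.10, beta = 0.20, RMSEL = 0.1811 -> 11 (cf. Table 3)
print(n_evaluations(0.10, 0.20, 0.1811, 0.20))
# d' = 0.02, alpha = 0.10, beta = 0.05, RMSEL = 0.1811 -> 1775 (cf. Table 6)
print(n_evaluations(0.10, 0.05, 0.1811, 0.02))
```

A few cells differ from this formula by one or two evaluations, consistent with rounding of the critical z-values used in the original computation.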

Figure 1

Figure 2

Highlights

The ideal number of judges for descriptive evaluations was calculated by the "Power Analysis and Sample Size" procedure.

Previous research on sensory description was utilized.

RMSE values from 574 previous studies were collected.

Three different levels of Type I and Type II errors were considered.

180 scenarios of descriptive tests were simulated.