Food Quality and Preference 18 (2007) 190–195 www.elsevier.com/locate/foodqual
Sensory difference tests: Overdispersion and warm-up Ofelia Angulo a, Hye-Seong Lee b, Michael OÕMahony b
b,*
a Instituto Tecnologico de Veracruz, M.A. de Quevedo 2779, Veracruz, Ver, Mexico Department of Food Science and Technology, University of California, Davis, CA, United States
Received 14 February 2005; received in revised form 5 July 2005; accepted 28 September 2005 Available online 8 November 2005
Abstract For sensory difference tests, one way, but not the only way, of dealing with the problem of overdispersion is to use a beta-binomial analysis. Commonly, binomial statistical analyses are used for these methods and they assume that the sensitivity of the judges is uniform. However, judge sensitivity varies and this adds a problematical extra variance to the distribution. This is termed overdispersion and renders simple binomial analysis prone to Type I error. The distribution of sensitivity of the judges is described by a beta-distribution. The analysis, combining beta and binomial distributions, gives an index, gamma. This ranges from zero, for no overdispersion, to unity, for total overdispersion. A compact beta-distribution clustered around the mean of the binomial distribution, would add little extra variance and elicit minimum distortion of the binomial distribution, yielding a zero or near zero gamma value. A more scattered or even bimodal beta-distribution would have a substantial effect and yield a significant gamma value. One question that has been posed is whether some test methods are more prone to overdispersion than others. Yet, a consideration of the reasons for overdispersion would suggest that significant gamma values were more a result of obtaining a heterogenous sample of sensitive and insensitive judges by chance. To confirm this, Ôless sensitiveÕ and Ômore sensitiveÕ samples of judges performed 2-AFC and 3-AFC tests with resulting zero gamma values, indicating no overdispersion. However, when the less and more sensitive groups were combined, significant gamma values were obtained, indicating the presence of overdispersion. However, in a further experiment using 2-AFC tests, when the Ôless sensitiveÕ group had its sensitivity increased by a Ôwarm-upÕ procedure, combination with the Ômore sensitiveÕ group did not result in overdispersion. 2005 Elsevier Ltd. All rights reserved. Keywords: Overdispersion; Beta-binomial; Gamma; Warm-up; Difference tests; 2-AFC; 3-AFC; Triangle; Duo-trio
1. Introduction Statistical tests based on the binomial distribution are generally used to determine whether the proportion of difference tests performed correctly is greater than chance, thus indicating that the difference was ÔsignificantÕ. Yet, such binomial statistics were designed to analyze the results of tossing coins or dice when the probability of getting a target result (ÔheadsÕ or ÔsixÕ) is constant for each item (1/2 or 1/6). With a set of difference tests performed by a single judge during a single experimental session, it could be argued that the probability of getting a target *
Corresponding author. Tel.: +1 530 756 5493; fax: +1 530 756 7320. E-mail address:
[email protected] (M. OÕMahony).
0950-3293/$ - see front matter 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.foodqual.2005.09.015
result (test correct) is also constant over repeated tests, should there be no fatigue or adaptation effects over replicate testings. Yet, if data from several judges were to be combined, this situation would no longer hold; judges have different sensitivities and thus their probabilities of getting the target result (test correct) would vary. The assumptions for the binomial test would be violated. This violation can result in Type I errors, the declaration of a ÔsignificantÕ difference when the results were probably due to chance or guessing. The variation in the sensitivity of the judges provides an extra source of variance in the computation. There is more variance than would be expected from a mere binomial analysis. The problem of this extra variance has a name: the problem of overdispersion.
O. Angulo et al. / Food Quality and Preference 18 (2007) 190–195
There are various solutions to the problem (Brockhoff, 2003; Brockhoff & Schlich, 1998; Kunert, 2001; Kunert & Meyners, 1999). However, the purpose of this paper is experimental; it is not to discuss the relative merits of the various statistical approaches. The approach that will be discussed here uses beta distributions to describe the distributions of judge sensitivities encountered during difference testing. The beta distributions are combined with the regular binomial distributions to give what are called beta-binomial distributions. These are the basis for the beta-binomial statistical analysis for difference tests. The beta-binomial with the extra variance brought in by the beta distribution will have a greater variance than the binomial distribution. This greater variance means it will be more difficult to reject the null hypothesis and declare a ÔsignificantÕ difference. In this way, it can be seen that there is a risk of Type I errors if a binomial analysis is used rather than a beta-binomial analysis. Harries and Smith (1982) studied the beta-binomial analysis for triangle tests with demonstrations of the effects of some beta distributions. More recently, the beta-binomial test has been developed by Ennis, Bi and their coworkers (Bi & Ennis, 1998, 1999a; Bi, Templeton-Janik, Ennis, & Ennis, 2000; Ennis & Bi, 1998). They describe overdispersion by an index called gamma (c). A gamma value of zero indicates no overdispersion while a gamma value of unity indicates maximum overdispersion. Generally, values are intermediate between zero and unity. Bi and Ennis (1999b) published tables indicating how the proportions of tests required to be correct to declare a ÔsignificantÕ difference varied with gamma. For experiments in which the data analysis is in terms of d 0 , the beta-binomial approach fits conveniently into a Thurstonian framework (Bi & Ennis, 1998). The required increase in variance for the beta-binomial distribution is obtained by using a multiplier, which is a function of gamma. The same multiplier can be used whether the beta-binomial represents the variation in the proportion of tests correct or the variation of d 0 . A further aspect of being able to correct for overdispersion is the ability to increase the sample size by combining judges and replicate testings to boost the power of the analysis. If a coin is tossed three times and three separate coins are tossed once, it is possible to combine coins and replicate tosses and have a sample size of six tosses. The same is possible for dice but not for judges performing difference tests. A judge performing three triangle tests is not equivalent to three separate judges each performing a single triangle test. Judges and replicate testings cannot be combined using a binomial statistical analysis unless the gamma value was zero. To do so, would open the possibility of Type I errors. Yet, with a beta-binomial analysis, judges and replicate testings can be combined. Thus, if 10 judges each perform 10 tests, the data can be treated as a sample size of 10 · 10 = 100. In practice, sometimes overdispersion (a c value significantly greater than zero) is encountered; sometimes it is
191
not. Rousseau and OÕMahony (2001) working with orange drinks found no overdispersion with triangle and same–different tests. Braun, Rogeaux, Schneid, OÕMahony, and Rousseau (2004) using 2-AFC tests, reported overdispersion with a gamma value of 0.036, which was significantly greater than zero. Yet, it could be argued that the value was unimportantly small and only significant because of the large sample size. Rousseau and OÕMahony (2000) using triangle, dual pair, and same–different tests, with orange flavored beverages, combined judges and replicate testings but did not quote a gamma value in their analysis. Ligget and Delwiche (2005) required judges to perform paired preference and 2-AFC tests for fruit flavored beverages, chips and cookies. Significant gamma values were obtained for two of the five paired preference studies and two of the five 2-AFC studies. Yet, these did not occur for the same foods; there was no consistency over products. Increasing the number of replicate testings also had little effect. Comparing judgesÕ performance on 2-AFC, 3AFC, triangle and duo-trio tests with cherry flavored drinks, a significant gamma value was obtained only with the duo-trio. Using cherry flavored drinks, once more, and 2-AFC tests, the performance of judges over 10 days was measured. Significant overdispersion occurred on three of the days. Considering these studies, there does not appear to be any particular pattern that supports that one test may be more prone to overdispersion than another. Yet, the evidence, as yet, is still sparse. The first goal of this study was to collect some further evidence. To gain further insight, it is worth reconsidering how values of gamma should be interpreted. Given that judges are not clones, then, it may be asked why cases occur when gamma values are not significant and there is no significant overdispersion. It would only be possible if the addition of a beta distribution to a binomial distribution had a minimal effect on the shape and variance of the latter. As Harries and Smith (1982) pointed out, a beta distribution that was fairly compact and clustered around the mean of a binomial distribution would have such a minimal effect. It would mean that the sensitivities of the judges in the discrimination tests were close to the mean. As they further remarked, a beta distribution that was more scattered or even bimodal would have a substantial effect on the binomial distribution and thus produce a significant gamma value. Bimodal distributions can occur in preference testing when the judges are split on their preferences and also with difference tests, if the sample contains a group of more sensitive and a group of less sensitive judges. The Ômore sensitiveÕ and Ôless sensitiveÕ judges under consideration may come from a bimodal distribution, indicating differences in the judgesÕ sensory systems per se. Alternatively, they might be drawn from a single distribution, and the terms Ômore sensitiveÕ and Ôless sensitiveÕ merely applied to variation in judges performance in that particular test. To support these considerations, firstly, it may be hypothesized that if a sample of judges performing a
192
O. Angulo et al. / Food Quality and Preference 18 (2007) 190–195
difference test were to be made up of such Ômore sensitiveÕ and Ôless sensitiveÕ groups, then gamma values for each of the two separate groups would be low. Yet, when the two groups were combined, the beta distribution would cause significant distortion of the binomial distribution and produce a higher gamma value. The second goal of this study was to test this hypothesis. A second hypothesis concerns what Pfaffmann (1954) called Ôwarm-upÕ and which increases the measured sensitivity of judges. The term was borrowed from early studies on paired associate learning (for example, Heron, 1928; Thune, 1950) and skilled behavior acquisition (for example, Ammons, 1947). For gustation, the phenomenon is more akin to the latter, taste discrimination having many of the attributes of skilled behavior. In current practice (Dacremont, Sauvageot, & Duyen, 2000; OÕMahony, Thieme, & Goldstein, 1988; Thieme & OÕMahony, 1990) warm-up involves judges alternately testing the two stimuli to be discriminated, knowing which is which, until they have identified the difference in sensations elicited by the two confusable stimuli. Often, with confusable taste stimuli, a judge will not, at first, perceive a difference. Yet, after repeated testing, the differences between the stimuli begin to appear, sometimes suddenly. The increase in sensitivity induced by warm-up can be considered as a focusing of attention: focusing on or amplifying the differences in input from the two stimuli, while attenuating the similarities. Consider the situation where there was a sensitive and an insensitive group combined to give a high gamma value. Then, if the less sensitive group performed the warm-up procedure, it could be hypothesized that their increase in sensitivity would render them as sensitive as the more sensitive judges. In this way, the two groups of judges would become more similar in sensitivity; gamma would decrease. The third goal of this study was to test this hypothesis. For all three goals, a model system was used to confine the variance to judge variation rather than product variation. 2. Experiment I The goal of this experiment was to collect further data comparing the proneness of various difference test methods to overdispersion. The test methods studied were: the triangle, duo-trio, 2-AFC and 3-AFC. 2.1. Materials and methods 2.1.1. Judges One hundred and six judges, students, staff and friends at the University of California, Davis, participated in the experiment. None of the judges had consumed food and/ or beverages within an hour before starting the experiment. Of the 106 judges tested, 13 had participated in taste sensory experiments before. The groups tested were as follows: 2-AFC method (12M, 9F, age range 22–58 yrs), 3-AFC (7M, 8F, 22–58 yrs), triangle (5M, 9F, 21–57 yrs), duo-trio
(6M, 10F, 21–58 yrs). A further group of 40 judges (21M, 19F, 21–75 yrs) used all four methods. 2.1.2. Stimuli The stimuli to be discriminated were 3 mM vs. 5 mM NaCl solutions for the group of 40 judges who performed all four tests. For the judges who performed just one test, 3 mM vs. 5 mM NaCl solutions were used for the 2-AFC and 3-AFC tests and 1 mM vs. 5 mM NaCl solutions for the less powerful (Ennis, 1993) triangle and duo-trio tests. The NaCl was reagent grade (Mallinckrot, Inc. Paris, KY) and the solvent was Milli-Q purified water (Millipore Corp, Bedford, MA). The Milli-Q purified water had a specific conductivity < 106 mho/cm and a surface tension P 71 dynes/cm. The purified water was also used for interstimulus rinsing. Stimuli were dispensed in 10 ml aliquots using Oxford Adjustable Dispensers (Lancer, St. Louis, MO.) in plastic cups (1oz. portion cups, Solo Cup Co., Urbana, IL). All stimuli were served at constant room temperature (21– 24 C) on white plastic cutting trays. 2.1.3. Procedure A related-samples and an independent-samples design was used. For the related-samples design, judges were required to perform all four test methods in a single session. After taking demographic details, establishment of rapport and experimental instructions, judges took seven mouthrinses. They then proceeded to perform six 2-AFC, six 3-AFC, six triangle and six duo-trio tests. No interstimulus rinses were taken within tests; judges were able to rinse ad lib between tests. The order of the four test methods and the order of stimulus presentation within a method was counterbalanced over judges. Judges gave their responses verbally. Session lengths ranged between 12–30 min. For the independent-samples design, separate samples of judges (noted above) used only one test method. The procedure was as above, except that each judge performed 12 tests in a session. Some judges performed in more than one group. Session lengths ranged 5–15 min. 2.2. Results For each test method, for both the related-samples and independent-samples designs, gamma values were derived according to the beta-binomial computation (Bi & Ennis, 1998, 1999a, 1999b; Bi et al., 2000). The computations were performed using IFPrograms software based on maximum likelihood (Institute for Perception, Richmond, VA). Table 1 displays the gamma values along with probabilities of getting values this large on the null hypothesis (c = 0). From the table, it can be seen that a significant gamma value was obtained for the 2-AFC method (p = 0.01) with the related-samples design. For the independent-samples design, a near significant gamma value (p = 0.08) was obtained for the triangle method. There was no consistent trend for one particular method to be more prone to over-
O. Angulo et al. / Food Quality and Preference 18 (2007) 190–195 Table 1 Gamma values for the 2-AFC, 3-AFC, duo-trio and triangle methods for the related-samples and independent-samples designs for Experiment I Experimental design
2-AFC
Related-samples designa
N = 40
c p Independent-samples designb c p a b
3-AFC
duo-trio
Triangle
0.12 0.01
0.04 0.35
0.00 1.00
0.00 1.00
N = 21
N = 15
N = 16
N = 14
0.03 0.34
0.01 0.75
0.05 0.17
0.07 0.08
Each judge performed 6 tests for each method. Each judge performed 12 tests for each method.
dispersion than another. This accorded with the previous findings discussed above. Values of d 0 (Ennis, 1993) are not quoted in the table, but for the related-samples design they were not significantly different, ranging from 0.8 to 1.2. For the independent-samples design, d 0 values ranged from 1.1 to 2.5 for separate sets of judges. Computations of d 0 were performed using the IFPrograms software. 3. Experiment II This experiment was designed to test the first hypothesis that two groups of judges of different sensitivity, each displaying no overdispersion would, when combined, give significant gamma values. For this, judges performed 2-AFC and 3-AFC tests. On the basis of their d 0 values, they were divided into two separate groups: Ômore sensitiveÕ and Ôless sensitiveÕ. Gamma values were computed for the two groups separately and also for when the two were combined. 3.1. Materials and methods 3.1.1. Judges Twenty judges (3M, 17F, age range 16–63 yrs) performmed 2-AFC tests and 16 judges (6M, 10F, 20–62 yrs) performed 3-AFC tests. These judges had been screened as being sufficiently sensitive to be included in the experiment. As before, the samples of judges were drawn from students, staff and friends at the University of California, Davis. All judges had fasted, except for water, for at least 1 h prior to the experiment. All judges were naı¨ve to sensory testing except for four judges in the 2-AFC group and eight in the 3-AFC group.
3.2. Results For both, the 2-AFC and 3-AFC tests, judges were split into Ômore sensitiveÕ and Ôless sensitiveÕ groups. For this, a simple rule was adopted, despite different chance probabilities. Judges who performed 16–20 tests correctly were deemed Ômore sensitiveÕ, while those performing 10–15 tests correctly were categorized as Ôless sensitiveÕ. Judges with inferior or chance performance levels were considered not to be sufficiently sensitive to the differences between the stimuli to be included in the Ômore sensitiveÕ or Ôless sensitiveÕ groups. They were eliminated during the screening and their data are not recorded here. For the 2-AFC method, gamma and d 0 values, with associated null probabilities and variances, were computed as in Experiment I, using IFPrograms software. Separate values were computed for the Ômore sensitiveÕ and Ôless sensitiveÕ groups as well as for the two groups combined. The same computation was performed for the 3-AFC-method. The data are shown in Table 2. From the table, it can be seen that the Ômore sensitiveÕ groups have higher d 0 values than the Ôless sensitiveÕ groups, while the combined values are intermediate, as expected. It can also be seen that within each ÔsensitivityÕ group, gamma values are zero, indicating no overdispersion. Yet, the gamma values of the combined groups are finite and significantly greater than zero (p 6 0.03), indicating significant overdispersion. Thus, two groups of different sensitivity, each having no overdispersion, when combined can produce finite gamma values. This confirms the first hypothesis. It might be argued that the significance results obtained for the combined samples owe their significance to the larger sample sizes. This might indeed be true. However, the fact that the Ômore sensitiveÕ and Ôless sensitiveÕ samples had zero gamma values and the combined samples had finite values, indicates how overdispersion increased when the two samples were combined. To test consistency, the experiment was repeated on the same judges and gave the same results. For the 2-AFC, gamma values were 0.03 (more sensitive), 0.00 (less sensitive) and 0.09 (combined). For the 3-AFC, the corresponding values were 0.00, 0.00 and 0.06.
Table 2 Gamma values for the 2-AFC and the 3-AFC methods with more sensitive and less sensitive judges for Experiment II
3.1.2. Stimuli The stimuli to be discriminated were 3 mM vs. 5 mM NaCl solutions as described for Experiment I. 3.1.3. Procedure The procedure, including the rinsing protocols, was as for Experiment I, with the following modifications. In a single session, judges performed twenty 2-AFC (or 3AFC) tests. Session lengths ranged 10–20 min.
193
2-AFC More sensitive
c p d0 r2
3-AFC Less sensitive
Combined
More sensitive
Less sensitive
Combined
N=6
N = 14
N = 20
N=8
N=8
N = 16
0.00 1.00 1.75 0.05
0.00 1.00 0.52 0.01
0.04 0.03 0.81 0.01
0.00 1.00 2.10 0.03
0.00 1.00 0.99 0.02
0.06 0.01 1.46 0.01
For both methods, each judge performed 20 tests.
194
O. Angulo et al. / Food Quality and Preference 18 (2007) 190–195
4. Experiment III This experiment was designed to confirm the second hypothesis that given a Ômore sensitiveÕ and a Ôless sensitiveÕ group of judges, warm-up given to the Ôless sensitiveÕ group, would increase its sensitivity to such an extent that the combined group would not display overdispersion. 4.1. Materials and methods 4.1.1. Judges Nineteen judges (3M, 16F, age range 16–63 yrs) students, staff and friends at the University of California, Davis participated. All had fasted for at least 1 h before the experiment took place. All, except seven, were naı¨ve to sensory testing. 4.1.2. Stimuli The stimuli to be discriminated were 3 mM vs. 5 mM NaCl solutions, as in Experiments I and II. 4.1.3. Procedure The procedure, including rinsing protocols, was the same as for Experiment II, judges performing twenty 2AFC tests in a single session. Depending on performance in the session, judges were divided, as before, into a Ômore sensitiveÕ and a Ôless sensitiveÕ group, using the same performance criteria as in Experiment II. Those judges in the Ôless sensitiveÕ group repeated the experiment with the addition of a prior warm-up procedure. For this group, after the initial seven mouthrinses, judges went through the warm-up before performing the twenty 2-AFC tests. For the warm-up, judges were presented with a set of 3 mM and 5 mM NaCl solutions. Each set was labeled so that the judge was aware of the stimuli. They tasted the 3 mM and 5 mM stimuli alternately, until they felt that they could distinguish the signals indicating the difference between the two stimuli. Initially, six of each stimulus was presented and this was found sufficient for warm-up. When judges reported that they could distinguish between the two stimuli, testing was started immediately. Some judges reported that they did not think that they had Ôwarmed-upÕ, but they proceeded with the experiment, anyway. No interstimulus rinses were taken during warm-up. Session lengths ranged 10–20 min. 4.2. Results Gamma and d 0 values with associated null probabilities and variances, were computed as in Experiment II for the Ôless sensitiveÕ, Ômore sensitiveÕ and combined groups, and in the formerÕs case, before and after warm-up. These data are displayed in Table 3. From the table, it can be seen that the Ômore sensitiveÕ group had higher d 0 values than the Ôless sensitiveÕ group, with the combined groups having intermediate values, as expected. It can be seen, that the Ômore sensitiveÕ and Ôless
Table 3 Gamma values for the 2-AFC method with more sensitive and less sensitive judges before and after warm-up for Experiment III 2-AFC Before warm-up
c p d0 r2
After warm-up
More sensitive
Less sensitive
Combined
Less sensitive
Combined
N = 10
N=9
N = 19
N=9
N = 19
0.00 1.00 1.70 0.03
0.00 1.00 0.52 0.02
0.07 0.00 1.05 0.01
0.00 1.00 0.98 0.02
0.01 0.50 1.31 0.01
Each judge performed 20 tests in each condition.
sensitiveÕ groups, whether warmed-up or not, were sufficiently homogeneous to have zero gamma values. However, when the two groups were combined, without any warm-up, there was sufficient overdispersion to give a finite and significant gamma value. Yet, if the Ôless sensitiveÕ judges were given a prior warm-up, their measured sensitivity increased enough (d 0 = 0.52 vs. 0.98) for the judges to be sufficiently homogeneous with the Ômore sensitiveÕ group. This resulted in the combined group having a zero gamma value, indicating no overdispersion and confirming the second hypothesis. 5. Discussion In the first experiment, there was no particular trend for one difference test to be more prone to overdispersion than another. This corresponded to previous research discussed above. It would appear that overdispersion was less the result of the test method than the result of a particular sampling of judges. From a consideration of the suggestion of Harries and Smith (1982), it would appear that overdispersion is the result of a particular distribution of judge sensitivities, the beta distribution. A compact beta distribution would have little effect on the binomial distribution, so that the beta-binomial would resemble the binomial, resulting in little or no overdispersion and non-significant gamma values. This was confirmed by Experiments II and III, where Ôless sensitiveÕ groups of judges were combined with Ômore sensitiveÕ groups, both having zero gamma, to form a combined group. The latter then, demonstrated overdispersion, with gamma values significantly greater than zero. The effect was prevented by providing the Ôless sensitiveÕ group with a warm-up procedure, to match its sensitivity to that of the Ômore sensitiveÕ group. Overdispersion is thus avoided if the sample of judges happens to have sensitivities distributed close to the mean of the binomial distribution. One may speculate regarding when such a group of judges might be sampled. It might be expected with regular long-term consumers who had become uniformly highly sensitive to product differences over the years. With casual consumers, variation in frequency of use might have more of an effect on sensitivity and therefore generate overdispersion. This is a testable hypothesis.
O. Angulo et al. / Food Quality and Preference 18 (2007) 190–195
The beta-binomial allows combination of consumers and replicate tests, to increase the sample size. This might be acceptable in psychophysics, where judges can be fairly homogeneous in some aspects of performance. Combining judges and replicate tests to obtain a large sample size might be necessary for such tasks as fitting an average ROC curve to the performance of a small group of judges (Hautus & Irwin, 1995). It might also clarify the results of some difference tests. However, the technique should be used with caution when sampling consumers. If 10 consumers each performed five difference tests, even though the sample size can be treated as 50, in reality only 10 consumers will have been tested. This is hardly a representative sample. It should also be noted that gamma values can only be calculated to reveal possible overdispersion effects, when judges perform replicate tests. If each judge were only to perform a single test, it would not be possible to determine whether there was overdispersion or not. Thus, it is recommended that judges should repeat the test at least once, to enable the presence of overdispersion to be detected. This is important to avoid the possibility of Type I errors. Overdispersion can be likened to interaction in ANOVA; its presence can only be detected with replication. The beta-binomial deals with sensitivity variation due to judge heterogeneity. It assumes that sensitivity is constant for a given judge over replicate tests. This may not be true for all products testing situations. If sensitivity varied over replicate tests, the statistical analysis would require a more complex beta–beta-binomial analysis. Thus, even though replication is required to detect the presence of overdispersion, the number of replicates chosen should be approached with caution. Preliminary experimentation is necessary for determining the appropriate number of tests per session. Too many replicates in a given session could induce Ôtaste fatigueÕ. The resulting change in sensitivity of the judge would entail more complicated statistics. Experiment I included the triangle and duo-trio methods, Experiments II and III confined themselves to 2AFC and 3-AFC methods. This was because the larger number of replicate testings and possible warm-up effects might have altered the cognitive strategies of the triangle and duo-trio from a Ôcomparison of distancesÕ strategy to a ÔskimmingÕ strategy. While on the subject of cognitive strategies, the present research does not necessarily refute the idea that cognitive strategies can affect overdispersion. However, the lack of consistency among tests, would suggest that any cognitive strategy effect must be small. References Ammons, R. B. (1947). Acquisition of motor skill: I. Quantitative analysis and theoretical formulation. Psychological Review, 54, 263–281.
195
Bi, J., & Ennis, D. M. (1998). A Thurstonian variant of the beta-binomial model for replicated difference tests. Journal of Sensory Studies, 13, 461–466. Bi, J., & Ennis, D. M. (1999a). The power of sensory discrimination methods used in replicated difference and preference tests. Journal of Sensory Studies, 14, 289–302. Bi, J., & Ennis, D. M. (1999b). Beta-binomial tables for replicated difference and preference tests. Journal of Sensory Studies, 14, 347–368. Bi, J., Templeton-Janik, L., Ennis, J. M., & Ennis, D. M. (2000). Replicated difference and preference tests: how to account for intertrial variation. Food Quality and Preference, 11, 269–273. Braun, V., Rogeaux, M., Schneid, N., OÕMahony, M., & Rousseau, B. (2004). Corroborating the 2-AFC and 2-AC Thurstonian models using both a model system and sparkling water. Food Quality and Preference, 15, 501–507. Brockhoff, P. B. (2003). The statistical power of replications in difference tests. Food Quality and Preference, 14, 405–417. Brockhoff, P. B., & Schlich, P. (1998). Handling replications in discrimination tests. Food Quality and Preference, 9, 303–312. Dacremont, C., Sauvageot, F., & Duyen, T. H. (2000). Effect of assessors expertise on efficiency of warm-up for triangle tests. Journal Sensory Studies, 15, 151–152. Ennis, D. M. (1993). The power of sensory discrimination methods. Journal of Sensory Studies, 8, 353–370. Ennis, D. M., & Bi, J. (1998). The beta-binomial model: accounting for inter-trial variation in replicated difference and preference tests. Journal of Sensory Studies, 13, 389–412. Harries, J. M., & Smith, G. L. (1982). The two-factor triangle test. Journal of Food Technology, 17, 153–162. Hautus, M. J., & Irwin, R. J. (1995). Two models for estimating the discriminability of foods and beverages. Journal of Sensory Studies, 10, 203–215. Heron, W. T. (1928). The warming-up effect in learning nonsense syllables. Journal of Genetic Psychology, 35, 219–228. Kunert, J. (2001). On repeated difference testing. Food Quality and Preference, 12, 385–391. Kunert, J., & Meyners, M. (1999). On the triangle test with replications. Food Quality and Preference, 10, 477–482. Ligget, R. E., & Delwiche, J. (2005). The beta-binomial model: Variability in overdispersion across methods and over time. Journal of Sensory Studies, 20, 48–61. OÕMahony, M., Thieme, U., & Goldstein, L. R. (1988). The warm-up effect as a measure of increasing the discriminability of sensory difference tests. Journal of Food Science, 53, 1848–1850. Pfaffmann, C. (1954). Variables affecting difference tests. In D. R. Peryam, F. J. Pilgrim, & M. S. Peterson (Eds.), Food acceptance testing methodology, a symposium (pp. 4–20). Washington, DC: National Academy of Science-National Research Council. Rousseau, B., & OÕMahony, M. (2000). Investigation of the effect of within-trail retasting and comparison of the dual-pair, same–different and triangle paradigms. Food Quality and Preference, 11, 457–464. Rousseau, B., & OÕMahony, M. (2001). Investigation of the dual pair method as a possible alternative to the triangle and same–different tests. Journal of Sensory Studies, 16, 161–178. Thieme, U., & OÕMahony, M. (1990). Modifications to sensory difference test protocols: the warmed up paired comparison, the single standard duo-trio and the A-not A test modified for response bias. Journal of Sensory Studies, 5, 159–176. Thune, L. E. (1950). The effect of different types of preliminary activities on subsequent learning of paired associate material. Journal of Experimental Psychology, 40, 423–438.