Food Quality and Preference 18 (2007) 342–352 www.elsevier.com/locate/foodqual
Labeled magnitude scales for oral sensations of wetness, dryness, pleasantness and unpleasantness

Steve Guest a,*, Greg Essick b, Akshya Patel c, Rajan Prajapati c, Francis McGlone d,1

a Center for Neurosensory Disorders, School of Dentistry, 2160 Old Dental Bldg., University of North Carolina, Chapel Hill, NC 27599-7450, United States
b Department of Prosthodontics, Curriculum in Neurobiology, Center for Neurosensory Disorders, University of North Carolina, Chapel Hill, NC, United States
c School of Dentistry, University of North Carolina, Chapel Hill, NC, United States
d Cognitive Neuroscience, CSIG, Unilever Research and Development, Wirral, United Kingdom

Received 23 August 2005; received in revised form 10 March 2006; accepted 14 March 2006
Available online 25 April 2006
Abstract

Methods for directly comparing the salience of oral sensations between individuals are not available. To this end, we generated labeled magnitude scales for assessing the magnitude of oral sensations of wetness, dryness, pleasantness and unpleasantness. Seventy-three subjects provided magnitude estimates for seven intensity descriptors, randomly interspersed with examples of various nonpainful oral sensations, which were also rated. Twenty of the subjects provided ratings for all four scales twice during four days of testing. Analysis of these subjects’ data indicated that the ratings of the intensity descriptors varied significantly (F(6, 1045) = 688.00, p < 0.001), but were similar for all four scales (F(18, 1045) = 0.64, p = 0.87). Fifty-three of the 73 subjects provided data without replications, or data for only two of the four scales. The complete dataset was divided into separate analyses for wetness/dryness scales (OWDS; n = 51 subjects) and for pleasantness/unpleasantness scales (OPUS; n = 49). Results did not differ from those of the 20 subjects described above. Additionally, no effects of gender or sensitivity to 6-n-propyl-2-thiouracil (PROP) upon the ratings were seen. The mean ratings of the 51 and 49 subjects were used to define label positions on the OWDS and OPUS, respectively. Comparison of the two scales with the LMS [Green, B. G., Shaffer, G., & Gilmore, M. M. (1993). Derivation and evaluation of a semantic scale of oral sensation magnitude with apparent ratio properties. Chemical Senses 18, 683–702] and the LAM scale [Schutz, H. G., & Cardello, A. V. (2001). A labeled affective magnitude (LAM) scale for assessing food liking/disliking. Journal of Sensory Studies 16, 117–159] indicates that positions of labels for OWDS and OPUS are similar to those for the LAM. The OWDS and OPUS labels are shifted toward the upper end of the scale, considerably so when compared with the LMS. This suggests that using the LMS to rate oral wetness, dryness, pleasantness or unpleasantness would underestimate the intensities of relatively weak sensations.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Sensory scaling; Magnitude estimation; Labeled magnitude scale; Oral sensation; Hedonics
* Corresponding author. Tel.: +1 919 966 2953; fax: +1 919 966 3683. E-mail address: [email protected] (S. Guest).
1 Department of Neurological Sciences, School of Medicine, Liverpool University, United Kingdom.
0950-3293/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.foodqual.2006.03.012

1. Introduction

A topic of pervasive interest in psychophysics is how best to obtain and compare judgments of stimulus intensity from different individuals. This interest has extended to judgments of the intensity of very specific sensations, such as the perceived pleasantness of food (Schutz & Cardello, 2001) or physical exertion during exercise (Borg, 1982). For many years, visual-analogue scales (VAS) and labeled category scales (composed of fixed-interval, ranked categories) were preferred, in large part due to their simplicity. However, the validity of these scales was subsequently challenged by S.S. Stevens’ investigations of magnitude
estimation (e.g., Stevens & Poulton, 1956). With magnitude estimation (ME), ratings of stimulus intensity are, at least in principle, directly comparable between and within observers in terms of their ratios. However, ME has significant disadvantages compared with VAS and labeled category scales. First, the ME procedure may be more difficult for naïve respondents to comprehend than that of scaling techniques, as reflected in the idiosyncratic numbers used by some subjects (Teghtsoonian, Teghtsoonian, & Baird, 1995). Further, because the numbers produced by subjects during ME have no inherent meaning, it is not possible to directly compare the absolute magnitude estimates produced by different subjects. For example, different subjects commonly use different numbers for the same perceived intensities. Additionally, in some circumstances ME is less sensitive than scaling methods, such as the CR100 scale, in discriminating differences in stimuli based on perceived intensity (Borg & Borg, 2002).

In more recent years, a type of scale has been developed that aims to provide the advantages of ME, VAS and labeled category scales, namely the labeled magnitude (aka category-ratio) scale. In general, labeled magnitude scales consist of a VAS-like scale upon which intensity descriptor labels, similar to those of a labeled category scale, have been added at locations determined experimentally using ME. All labeled magnitude scales include an upper anchor at or beyond the most intense end of the scale, to provide the context in which all responses are made (e.g., Borg, 1982; Borg, 1998; Borg & Borg, 2002). Although researchers in the 1970s experimentally defined the relative intensities for a range of intensity descriptors (e.g., for pain, Gracely, McGrath, & Dubner, 1978a) that could, in principle, be used to produce labeled magnitude scales, the earliest such scale to enter common use was that reported by Borg (1982).
The Borg CR-10 scale was developed primarily for the rating of perceived exertion (RPE), and a version of this scale is in wide use at the time of writing (e.g., Noakes, 2002), including for rating the intensity of sensations for which it was not designed (e.g., Johansson, Kjellberg, Kilbom, & Hägg, 1999). Even for such non-intended applications the scale has been shown to perform well when compared with other psychophysical methods (Marks, Borg, & Westerlund, 1992), although it does not always produce the same ratios of perceived intensity between stimuli as ME (Marks, Borg, & Ljunggren, 1983). This indicates that using the CR-10 for non-intended applications can lead to intensity ratings that are not ratio scaled.

In recent years, the most widely used labeled magnitude scale, and indeed the first such scale to be explicitly named a ‘labeled magnitude scale’ (LMS), is that developed by Green, Shaffer, and Gilmore (1993). The LMS was originally developed for rating the intensity of general oral stimuli, with the top anchor of the scale representing the most intense oral sensation imaginable, including painful sensations. This scale has been shown to be more sensitive than other methods, such as labeled category scaling, in discriminating between different subgroups of subjects, such as in the classification of individuals based on their perception of bitter tastes (Bartoshuk, 2000).

The LMS, like the Borg scale, has been applied outside of its original intended use, such as in the rating of tactile roughness (Diamond & Lawless, 2001). This might not be optimal for several reasons. First, the original LMS only produces data equivalent to magnitude estimation for scales that include painful sensations within the most intense sensation label (Green et al., 1996). For example, the LMS does not produce the same ratio of sensory intensities as magnitude estimation when used for rating the intensity of sweetness. Second, the intensity descriptor labels themselves may not be appropriate for certain oral or indeed non-oral sensations. This is essentially an issue of face validity and usability. For example, one is unlikely to refer to pleasant oral sensations as ‘strong’; the adjective ‘strong’ might be better phrased as ‘very’, or some other adjective of similar intensity appropriate for hedonic applications. Third, and perhaps most importantly, the relative positions of the intensity descriptor labels may be different when comparing the LMS with scales that are produced to assess specific sensations such as oral pleasantness, wetness, roughness and so forth.

In recognition of these concerns, Schutz, Cardello and colleagues developed separate (sets of) labeled magnitude scales for rating the pleasantness and unpleasantness of foods (the LAM scale, Schutz & Cardello, 2001), for rating satiety and hunger (the SLIM scale, Cardello, Schutz, Lesher, & Merrill, 2005) and for rating the comfort of clothing (the CALM scale, Cardello, Winterhalter, & Schutz, 2003). The development of each of these scales included a wide range of appropriate labels to describe sensation intensity.
The scales generated show a small amount of asymmetry in descriptor label locations for the positive versus negative hedonic or intensive labels. Similar to the LMS, the LAM was shown by the authors to be relatively sensitive, in particular more sensitive than the Natick 9-point hedonic scale (Peryam & Pilgrim, 1957), a labeled category scale frequently used to obtain affective ratings of foods. Similarly, the SLIM scale has been shown to be more sensitive than a VAS satiety scale (Cardello et al., 2005). To our knowledge, the LAM scale is the only labeled magnitude scale that has been generated for specific experiences obtained in the context of non-noxious oral stimulation.

Following on from the work of Schutz and Cardello, we developed scales for the oral sensations of wetness and dryness, and for the pleasantness and unpleasantness of liquids in the mouth. We anticipated that the locations of semantic intensity descriptors would differ from those of the LMS, given that the four sensations we studied did not include pain within their sensory range. Additionally, we expected our oral pleasantness and unpleasantness scales to agree more closely with the LAM of Schutz and Cardello than with the LMS, given the conceptual similarities between our consideration of general oral pleasantness/unpleasantness
and Schutz and Cardello’s investigations of the pleasantness/unpleasantness of foods.

In addition to the development of the scales, we investigated whether the meaning of the semantic intensity descriptors from which the scales were constructed was influenced by subjects’ ability to taste 6-n-propyl-2-thiouracil (PROP). Increased sensitivity to this substance has been shown to be associated with increased lingual tactile acuity (Essick, Chopra, Guest, & McGlone, 2003), increased preference for fats over carbohydrates (Kamphuis & Westerterp-Plantenga, 2003), and gender-dependent preferences for fats and sweeteners (Duffy & Bartoshuk, 2000). In essence, subjects with high sensitivity to PROP have rather different oral-perceptual worlds than non-tasters of PROP. It is thus possible that semantic descriptors of oral intensity might vary in meaning depending on an individual’s ability to taste PROP. If this were indeed found to be the case, it would cast doubt over the use of oral-intensity scales in populations with individuals that differ markedly in PROP sensitivity.

2. Method

2.1. Subjects

Seventy-three healthy individuals (23 males, 50 females, age range 18–57 years) agreed to take part in the experiment. All were rewarded for their participation at a rate of $10/h. The study was approved on ethical and safety grounds by the School of Dentistry institutional review board (IRB).

2.2. Procedure

The method described in Green et al. (1993) was adhered to closely. The scale generation procedure involved the subject making modulus-free magnitude estimates for a series of intensity descriptors (semantic labels). The descriptors chosen were ‘barely detectably’, ‘weakly’, ‘slightly’, ‘moderately’, ‘very’, ‘extremely’ and either ‘-est imaginable’ for wet and dry labels or ‘most . . . imaginable’ for pleasant and unpleasant labels. These differ slightly from those of Green et al.
(1993); the descriptors we chose were designed to better fit scale labels for the intensity of oral wetness, dryness, pleasantness, and unpleasantness. The descriptors were displayed on a computer (PC) screen embedded within the sentence ‘A __________ (wet/dry/pleasant/unpleasant) oral sensation’, or either ‘the most (pleasant/unpleasant) oral sensation imaginable’ or ‘the (wettest/driest) oral sensation imaginable’, as appropriate. Each of the oral sensation types (i.e., wetness, dryness, pleasantness or unpleasantness) was considered separately during different experimental runs. A variety of contextual descriptors were also presented to subjects, consisting of familiar oral sensations of a wide range of intensities. Examples of contextual descriptors for relatively intense sensations included ‘the sourness of a slice of lemon’ and
‘the unpleasantness of a mouthful of sea water’. ‘The bitterness of celery’ and ‘the sourness of sour cream’ are examples for two relatively weak sensations which were presented to subjects. There were two different sets of contextual descriptors, each of which consisted of 27 descriptions of different oral sensations. These two sets were matched in terms of the types and intensities of the sensations described. None of the contextual descriptors represented painful sensations or events. The scale development procedure consisted of the following: First, subjects rated each of the contextual descriptors once. These were presented in random order. The subject recorded each response by typing a number representing their rating on the computer keyboard. It was permissible to enter any number greater than zero, including non-integers. After all contextual descriptors had been rated once, these descriptors were again presented in random order, but this time after every third rating, one of the intensity descriptors was shown and rated. Upon completion of this procedure, the subject was shown the ratings they had made to the intensity descriptors. The subject was then allowed to adjust the ratings that they felt were not optimal (Green et al., 1993). On analysis, it was found that only 17% of the ratings were so adjusted, i.e., on average each subject made an adjustment to just over one of the seven intensity descriptors in each session. Prior to data collection, subjects were familiarized with the magnitude estimation procedure via standardised instructions, followed by practice trials during which the subject rated a range of foodstuffs that they tasted. A variety of intensity judgments were made, examples of which included tasting tonic water and rating its bitterness, and tasting a lemon and rating its sourness. 
Only subjects who responded in a reasonable manner during this exercise were allowed to continue to the computer-assisted data collection part of the experiment. For example, subjects who used a very restricted number range (e.g., only integers in the range 1–10) or who used very few numerically different ratings (e.g., just the numbers 10, 50 and 100) were excluded.

Each subject was asked to provide responses for each of oral wetness, dryness, pleasantness and unpleasantness, twice. A single set of responses was made to each of two randomly chosen oral sensations on a given day. For the two sessions on a given day, different sets of contextual descriptors were used in order to avoid subjects remembering the responses they made during the immediately preceding trials, and simply reproducing them.

Forty of the subjects were screened for their ability to taste the bitter substance 6-n-propyl-2-thiouracil (PROP), which was provided for each subject on a 3 cm disc of filter paper. The subject moistened the paper in their mouth for 20 s, then removed the paper and swallowed any residual saliva. Intensity of bitterness was then rated using the LMS (Green et al., 1993), where the topmost anchor of the scale represented the most intense sensation imaginable in any sensory modality (Bartoshuk et al., 2002). For analysis
purposes, the intensity ratings were expressed as a proportion of the scale length.

2.3. Results

Of the 73 subjects, 20 provided ratings of the intensity descriptors for all four of the oral sensations, on two separate occasions of testing (Table 1). The remaining 53 subjects provided partial sets of ratings. As may be seen in Table 1, if split into separate groupings for the wet/dry and pleasant/unpleasant oral sensations, data were available for 51 and 49 subjects, respectively.

Subjects varied somewhat in the magnitude estimates they assigned to the individual intensity descriptors, as was expected. Fig. 1 illustrates the (log) magnitude estimates provided by two representative subjects. The ratings given to the intensity descriptors for all four oral sensations, on two separate occasions of testing, are shown. The reversals in the ordering of ratings for the ‘weak’ versus ‘slight’ descriptors seen in Fig. 1 occurred relatively frequently, suggesting that these descriptors were not well dissociated in terms of their intensity by subjects. With all of the available data included (i.e., n = 51 for wet/dry sensations, n = 49 for pleasant/unpleasant sensations), the mean (log) magnitude estimates for the intensity descriptors for all four oral sensations are as shown in Fig. 2.

Prior to analysis of all of the available data, the data obtained from the 20 subjects who provided ratings for all four oral sensations on each of two separate occasions were analyzed. The magnitude estimates made to the intensity descriptors were modulus-equalized prior to analysis (Stevens, 1971). The modulus equalization procedure consisted of the following steps:
(1) The logarithm was taken of each rating made by a subject to the descriptors.
(2) The mean of each subject’s log-ratings was obtained.
(3) The grand mean of all of the subjects’ mean log-ratings was found. This is termed the mean modulus.
(4) A correction factor was then calculated for each subject as the mean modulus minus the mean of that subject’s log-ratings.
(5) This offset was added to each log rating made by the subject.
In the analyses reported subsequently, these modulus-equalized values were used, unless otherwise stated.

Table 1
Distribution of subject frequencies and genders according to the oral sensations rated

Oral sensations rated                                        Number of subjects
All subjects                                                 73 (23 M, 50 F)
All sensations (wet, dry, pleasant, unpleasant), twice       20 (6 M, 14 F)
All sensations, at least once                                27 (10 M, 17 F)
Both wet and dry sensations, at least once                   51 (16 M, 35 F)
Both pleasant and unpleasant sensations, at least once       49 (17 M, 32 F)

The subjects’ consistency in making repeated magnitude estimates to the seven different intensity descriptors was assessed by comparing the rating provided by a subject
for a descriptor on the first testing occasion with the rating provided on the second occasion. Overall, the second rating was within ±0.3 log units of the first for most (80%) of the pairs of observations. A mixed-model ANOVA, including within-subjects factors of descriptor, oral sensation and occasion of testing, indicated that the ratings varied according to the occasion of testing (F(1, 1045) = 7.45, p = .006). Additionally, there was a significant interaction between the descriptor and the occasion of testing (F(6, 1045) = 2.11, p = .049). Inspection of the data indicated that smaller ratings were given, in general, to the less intense descriptors on the second occasion of testing. The interaction between oral sensation and occasion of testing approached significance (F(3, 1045) = 2.56, p = .054). This trend arose because only the unpleasantness ratings showed no overall effect of the occasion of testing.

The geometric means of the 20 subjects’ ratings for each intensity descriptor, for each oral sensation type, are shown in Table 2. Inspection of Table 2 suggests that the rank order of the descriptors, based on the ratings, occurs as expected given their generally accepted meanings. The ‘slight’ and ‘weak’ descriptors are clearly very similar in the intensity of sensation they describe (also see Fig. 2). The absolute magnitudes of the intensity descriptors varied slightly, with the ratings of unpleasant sensations occupying an apparently wider range. However, the mixed-model ANOVA, as described above, indicated that although the descriptor ratings differed in value (F(6, 1045) = 688.0, p < .001), each was given the same ratings of intensity for sensations of wetness, dryness, pleasantness or unpleasantness (descriptor × oral sensation, F(18, 1045) = 0.64, p = .87).
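The five modulus equalization steps listed earlier in this section can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the authors' analysis code: the function name, the dictionary layout, the use of base-10 logarithms and the example numbers are all assumptions made for the sketch.

```python
import math
from statistics import mean


def modulus_equalize(ratings_by_subject):
    """Return modulus-equalized log10 ratings for each subject.

    ratings_by_subject maps a (hypothetical) subject id to that subject's
    raw magnitude estimates, all greater than zero. The log base is an
    assumption; the source only refers to "log" ratings.
    """
    # Step 1: take the logarithm of every rating.
    logs = {s: [math.log10(r) for r in rs] for s, rs in ratings_by_subject.items()}
    # Step 2: mean log-rating for each subject.
    subject_means = {s: mean(v) for s, v in logs.items()}
    # Step 3: grand mean of the subject means (the "mean modulus").
    mean_modulus = mean(list(subject_means.values()))
    # Steps 4 and 5: per-subject offset, added to each of that subject's log ratings.
    return {
        s: [x + (mean_modulus - subject_means[s]) for x in v]
        for s, v in logs.items()
    }


# Hypothetical data: subject "s2" uses numbers ten times larger than "s1".
example = {"s1": [1.0, 10.0, 100.0], "s2": [10.0, 100.0, 1000.0]}
equalized = modulus_equalize(example)
```

In this made-up example the two subjects end up with identical equalized log ratings, because they differ only in the overall size of the numbers they chose; the spacing between each subject's own ratings is left unchanged by the procedure.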
[Fig. 1: four panels, A) WET, B) DRY, C) PLEASANT, D) UNPLEASANT, shown for subjects 1022 and 1020. Vertical axis: intensity descriptor (barely detectably, weakly, slightly, moderately, very, extremely, wettest/driest or most pleasant/unpleasant imaginable). Horizontal axis: log ME, 0 to 2.]
Fig. 1. Log magnitude estimates given to seven intensity descriptors on two separate sessions (solid line: first session; dashed line: second session), for two representative subjects.

Pairwise tests of ratings for adjacent descriptors, for each oral sensation, indicated that the descriptors were well differentiated (p < .05 in each case, after correction for multiple comparisons), except for ‘slight’ and ‘weak’, which were only significantly different for ratings of pleasantness.

The labeled magnitude scales generated by the 20 subjects in this initial analysis are shown in Fig. 3. The positions of the semantic labels for intensity were obtained by dividing the (geometric) mean rating for each intensity descriptor by the mean rating for the most intense descriptor (see Table 2). Although separate labeled magnitude scales are presented for each oral sensation, a single generic scale would be acceptable, as indicated by the analysis above.

Following analysis of the balanced set of data from the 20 subjects, all of the available data were investigated. The analysis was split into separate groups for the wet/dry (n = 51) and pleasant/unpleasant data (n = 49). Initially, the presence of any effect of gender upon magnitude estimates was sought via a mixed-model ANOVA carried out upon the log magnitude estimates, including the factor of gender, and within-subjects factors of descriptor and oral sensation. The magnitude estimates were not modulus equalized for this analysis, so that the presence of any systematic differences between males and females in the size of numbers chosen would not be hidden. The analysis indicated that gender overall did not influence the magnitude estimates for wet/dry (F(1, 917) = 1.63, p = .20) or pleasant/unpleasant sensations (F(1, 891) = 0.01, p = .93). Additionally, none of the interactions involving gender attained significance, indicating that the ratings given to the individual intensity descriptors did not differ overall between males and females.
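The label-position calculation described above for Fig. 3 (each descriptor's geometric mean rating divided by the mean rating for the most intense descriptor) can be illustrated as follows. This is a minimal sketch: the function name and dictionary layout are our own, and the numbers are the geometric means read from the Wet column of Table 2 (n = 20).

```python
# Geometric mean ratings for the wet descriptors, as read from Table 2 (n = 20).
wet_means = {
    "Wettest imaginable": 51.90,
    "Extremely wet": 38.86,
    "Very wet": 32.53,
    "Moderately wet": 20.32,
    "Slightly wet": 10.75,
    "Weakly wet": 9.76,
    "Barely detectably wet": 3.71,
}


def label_positions(mean_ratings, top_label):
    """Express each mean rating as a proportion of the top anchor's rating."""
    top = mean_ratings[top_label]
    return {label: value / top for label, value in mean_ratings.items()}


positions = label_positions(wet_means, "Wettest imaginable")
for label, pos in positions.items():
    # Position 1.0 is the top of the scale; smaller values sit lower down.
    print(f"{label:<24s} {pos:.3f}")
```

On a scale drawn with unit length, each label would then be placed at the printed proportion of the distance from the bottom of the scale to the top anchor.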
Subsequently, the modulus equalization procedure described earlier was carried out on all of the available data. In order to determine whether there were asymmetries in ratings given to the intensity descriptors for the wet versus dry and pleasant versus unpleasant oral sensations, separate descriptor × oral sensation mixed-model ANOVAs were performed on the wet/dry and pleasant/unpleasant data sets. In each case, the ratings given to
[Fig. 2: four panels, WET, DRY, PLEASANT and UNPLEASANT. Vertical axis: intensity descriptor (barely detectably, weakly, slightly, moderately, very, extremely, wettest/driest or most pleasant/unpleasant imaginable). Horizontal axis: log ME, 0 to 2.5.]
Fig. 2. Mean (log) magnitude estimates assigned to seven intensity descriptors for each of four different oral sensations. Error bars show the 25th and 75th percentiles.
Table 2
Geometric mean magnitude estimates (n = 20) given to seven different descriptors of oral sensation intensity, for four different oral sensations (wetness, dryness, pleasantness and unpleasantness)

Intensity descriptor                                     Wet            Dry            Pleasant       Unpleasant
Wettest/driest or most pleasant/unpleasant imaginable    51.90 (1.78)   51.62 (2.07)   51.90 (1.67)   54.44 (1.72)
Extremely                                                38.86 (0.97)   39.99 (1.25)   43.17 (1.11)   43.59 (1.19)
Very                                                     32.53 (0.78)   32.92 (0.73)   32.64 (0.68)   34.86 (1.02)
Moderately                                               20.32 (0.51)   21.10 (0.57)   21.18 (0.55)   21.97 (0.71)
Slightly                                                 10.75 (0.34)   12.71 (0.41)   12.30 (0.37)   11.47 (0.34)
Weakly                                                   9.76 (0.44)    9.43 (0.37)    8.63 (0.27)    10.25 (0.34)
Barely detectably                                        3.71 (0.25)    4.33 (0.26)    4.19 (0.26)    3.67 (0.27)

Standard errors in parentheses (Alf & Grossberg, 1979).
the different descriptors varied (wet/dry, F(6, 930) = 366.4, p < .001; pleasant/unpleasant, F(6, 904) = 417.9, p < .001). However, the lack of significant descriptor × oral sensation interactions indicated that the ratings given to the labels did not vary according to the oral sensation (wet/dry, F(6, 930) = 0.49, p = .82; pleasant/unpleasant, F(6, 904) = 1.47,
[Fig. 3: four vertical scales, WET, DRY, PLEASANT and UNPLEASANT. Labels from bottom to top: barely detectably, weakly, slightly, moderately, very, extremely, wettest/driest or most pleasant/unpleasant imaginable.]
Fig. 3. Labeled magnitude scales for the oral sensations of wetness, dryness, pleasantness and unpleasantness as generated by 20 subjects, each of whom provided magnitude estimates for each of the semantic labels, twice.
p = .19). These results extend those obtained from analysis of the balanced data set from the 20 subjects, confirming that the ratings given to the descriptors were indeed the same within wet/dry or pleasant/unpleasant sensations. Table 3 shows the geometric mean magnitude estimates given to the different intensity descriptors; the magnitude estimates have been averaged across wetness and dryness (left column in Table 3) and pleasantness and unpleasantness (rightmost column). The labeled magnitude scales obtained from the ratings are shown in Fig. 4. Separate scales are shown for wetness, dryness, pleasantness and unpleasantness, although these could justifiably be combined into a single generic scale based on the analyses. In addition to the scales we developed (OPUS, OWDS), the
Table 3
Geometric mean magnitude estimates for seven semantic labels of the intensity of oral wetness or dryness, and oral pleasantness or unpleasantness

Intensity descriptor                                     Wet or dry (n = 51)   Pleasant or unpleasant (n = 49)
Wettest/driest or most pleasant/unpleasant imaginable    75.86                 70.19
Extremely                                                54.25                 52.78
Very                                                     45.29                 39.38
Moderately                                               27.37                 23.78
Slightly                                                 17.08                 13.77
Weakly                                                   13.94                 10.46
Barely detectably                                        6.75                  4.55
LMS (Green et al., 1993), LAM scale (Schutz & Cardello, 2001), SLIM scale (Cardello et al., 2005) and CALM scale (Cardello et al., 2003) are shown. For the LAM, SLIM and CALM scales, the semantic labels that best match those we used in the current study were chosen to label the scale. The LAM, SLIM and CALM scales all use very similar labels to those we used, with differences only in minor aspects of wording. For example, the topmost anchor in the LAM scale is ‘greatest imaginable like/dislike’ as compared to ‘most pleasant/unpleasant imaginable’ or ‘wettest/driest imaginable’ for our scales. The lower intensity labels differed even less between LAM and our scales. For example, ‘like/dislike extremely’ was used in the LAM scale, as opposed to ‘extremely wet/dry/pleasant/unpleasant’ in our oral scales. These trivial differences suggest that valid comparisons can be made of the label positions across the scales shown in Fig. 4. Inspection of the scales suggests a good degree of agreement between the LAM scale and the scales generated in the current study, the main difference being that label positions in the LAM scale are shifted slightly toward the lower end of the scale as compared with our oral sensation scales. Finally, any possible influence of the ability to taste PROP on the ratings given to the intensity descriptors was considered. For the 40 subjects for whom PROP ratings were available, (modulus equalized) log magnitude estimates were plotted against the PROP intensity ratings separately for each of the wet, dry, pleasant and unpleasant oral sensations (Fig. 5). In each of these four scatterplots, the magnitude estimates provided for the seven different
[Fig. 4: six vertical scales aligned at their midpoints. Affective scales, running from UNPLEASANT to PLEASANT: A, MOUTH (OPUS); B, FOOD (LAM); C, TEXTILES (CALM). Sensory scales: D, WETNESS/DRYNESS (OWDS); E, FULLNESS/HUNGER (SLIM); F, LMS. Labels on scales A to E: barely detectable, slightly, moderately, very, extremely. Labels on the LMS: barely detectable, weak, moderate, strong, very strong.]
Fig. 4. Labeled magnitude scales generated for the rating of oral sensations from the current study: the oral pleasantness/unpleasantness scale (A; OPUS) and the oral wetness/dryness scale (D; OWDS). Also shown are the LAM scale for rating food pleasantness (B; Schutz and Cardello, 2001), the CALM scale for rating clothing comfort (C; Cardello et al., 2003), the SLIM scale for rating perceived satiety (E; Cardello et al., 2005) and the LMS (F; Green et al., 1993). All scales have been aligned around their midpoint. The scales have been categorized as affective (A–C) or sensory (D–F). The black boxes to the right of scale C indicate the range of values encompassed by the means of each of the intensity descriptors for scales A, B and C. Error bars show 1 SE for each scale descriptor, where the values are expressed as a proportion of the geometric mean of each intensity descriptor. The top/bottom-most anchor (∞) was slightly different in each case as follows: A, ‘Most (un)pleasant oral sensation imaginable’; B, C, E, ‘Greatest imaginable (un)pleasantness, (dis)comfort, fullness/hunger’; D, ‘Wettest/driest oral sensation imaginable’; F, ‘The strongest imaginable oral sensation’.
semantic labels for intensity are shown using different symbols, along with linear fits for each of the labels. If PROP ratings (i.e., taster status) influenced the relative meaning of the intensity labels for different subjects, then the seven lines of best fit shown in each plot in Fig. 5 would deviate from parallel. In contrast, if the lines were parallel, but with a positive slope, this would indicate that individuals who were relatively sensitive to PROP used larger numbers, but with no change in the relative intensive meaning of the semantic labels. In this case, the ratios of magnitude estimates would remain the same for all sensitivities to PROP. Fig. 5 suggests that the ability to taste PROP has no consistent influence either on the absolute size of numbers used or, more importantly, on the relative intensive meaning of the semantic labels. Overall, the linear fits are very close to parallel. Thus stronger perceptions of PROP bitterness did not lead to any consistent trends in the numbers used.

2.4. Discussion

The development of labeled magnitude scales for rating the intensity of oral wetness, dryness, pleasantness and
unpleasantness provides scant evidence for four distinct scales, at least in terms of the position of the semantic labels for intensity. Instead, the semantic labels occur at the same position in all four scales, and even for a relatively large subject group, there is no evidence that the label positions show asymmetry when comparing wet with dry or pleasant with unpleasant. 2.4.1. Comparison of oral scales with other labeled magnitude scales Given that each of the four oral sensation scales have their semantic labels for intensity at the same positions, it is a matter of interest whether the other labeled magnitude scales that have been published also show very similar label locations. In other words, does only a single labeled magnitude scale exist, one scale that applies to all sensations that might be rated? In this respect, when the oral scales we developed are compared with the published labeled magnitude scales derived for other specific sensory experiences (Cardello et al., 2005, 2003; Schutz & Cardello, 2001) generally good agreement is seen, subject to certain caveats. Comparing the affective scales (Fig. 4A, B and C), the positive (pleasant) and negative (unpleasant) scale intensity label positions below the topmost anchor are in
[Fig. 5 plot not reproduced. Panels: A) Wetness, B) Dryness, C) Pleasantness, D) Unpleasantness. Horizontal axis: PROP rating (0 to 1). Vertical axis: Magnitude estimate (0 to 2.5). Semantic labels marked in panel A: Barely detectable, Slightly, Weakly, Moderately, Very, Extremely, Wettest imaginable.]
Fig. 5. Modulus-equalized (log) magnitude estimates given to seven semantic descriptors of intensity, applied to the oral sensations of (A) wetness, (B) dryness, (C) pleasantness and (D) unpleasantness. Estimates are plotted as a function of the rated intensity of PROP bitterness. Different symbols denote the data for the different semantic labels, and a line of best fit is plotted through the points for each of the semantic labels. Note that five outliers are omitted from the plots (but are included in the lines of best fit); each outlier had a log magnitude estimate below zero.
approximately the same locations relative to each other. Agreement between scales appears best for the OPUS and CALM scales. In the hedonic scales developed in the current study (OPUS), each of the label positions is shifted somewhat (ca. 20%) toward the topmost label, as compared with the corresponding label positions within the LAM scale. This is consistent with the topmost label signifying a less intense percept in our study. However, on the face of it, this possibility seems unlikely given that Schutz and Cardello’s (2001) most intense label included the most pleasant (or unpleasant) food sensation imaginable, whereas the equivalent label in the OPUS included all pleasant or unpleasant oral sensations (exclusive of pain). One would expect the latter to include sensations of comparable or greater intensity than those encompassed in the study of Schutz and Cardello. Even though the magnitude estimation procedure of Schutz and Cardello did not explicitly exclude painful sensations from consideration within each semantic label, the context of
the experiment (i.e., foodstuff perception) would presumably discourage subjects from considering painful sensations. We note also that Schutz and Cardello included more semantic labels, and particularly more relatively extreme labels than those shown in Fig. 4. Indeed, Schutz and Cardello suggest that this could provide a judgment context that influences ratings given to semantic labels of intensity. For example, if a magnitude estimation procedure includes very many semantic labels that describe very intense sensations, then subjects may tend to provide estimates that separate these labels. In effect the subjects are forced to provide more fine-grained judgments of the semantic labels when more labels are included. Nevertheless, given the differences in the experimental paradigms between Schutz and Cardello’s and our work, the agreements between scales are more compelling than the disagreements. Indeed, even if different scales do exist, their differences are generally small, and it is far from clear
whether any differences have their source in the different experimental populations, or subtle methodological variations, rather than in the genuine existence of more than one labeled magnitude scale.

In contrast, it seems clear that the structure of the LMS is quite different from that of all of the other scales pictured in Fig. 4, perhaps because its topmost label includes pain within its sensory range. Thus it seems possible that there are two classes of labeled magnitude scales, those that include pain in their topmost anchor, and those that do not. This would indicate that the relative meaning of intensive terms is essentially consistent across different rated sensations, as long as those sensations are either anchored to pain, or are not anchored to pain.

Previous work supports a general dichotomy between affective and intensive sensations, although a specific distinction between sensations inclusive or exclusive of pain has not been expressly made. For example, Gracely et al. (1978a) found that the most intense general sensory descriptor in a set of 15 was rated as 85 times more intense than the weakest descriptor, i.e., the sensory range was 85:1. Similarly, the ratio between most and least intense descriptors in Green et al.’s LMS is 69:1. In contrast, for classes of affective intensity descriptors, the sensory range has consistently been found to be much smaller, varying from 9:1 for positive affect in Schutz and Cardello’s LAM, through 10:1 (Gracely et al., 1978a) to 15:1 for our OPUS. One complication is that some, presumably sensory, scales have not found such a large sensory range as Gracely et al. (1978a) or Green et al. (1993). For example, our OWDS shows a sensory range of 11:1, and the SLIM scale shows a range of 13:1 for discomfort and 24:1 for comfort.
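The sensory ranges quoted above are ratios of the largest to the smallest mean magnitude estimate across a set of descriptors. A minimal Python sketch of that arithmetic follows, assuming that the mean for each descriptor is a geometric mean of positive magnitude estimates. The function names (`geometric_mean`, `sensory_range`) and all numerical values are hypothetical and are given for illustration only; they are not data from the present study or from any study cited here.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive magnitude estimates."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("magnitude estimates must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def sensory_range(estimates_by_label):
    """Ratio of the largest to the smallest geometric-mean estimate."""
    means = {
        label: geometric_mean(values)
        for label, values in estimates_by_label.items()
    }
    return max(means.values()) / min(means.values())


# Hypothetical magnitude estimates (illustrative only, not study data).
example = {
    "barely detectable": [1.0, 2.0, 4.0],       # geometric mean 2
    "moderately": [8.0, 16.0, 32.0],            # geometric mean 16
    "greatest imaginable": [20.0, 40.0, 80.0],  # geometric mean 40
}

ratio = sensory_range(example)
print(f"sensory range = {ratio:.0f}:1")
```

With these illustrative values the range is 40/2, i.e., 20:1. The geometric mean is used in the sketch because magnitude estimates are compared as ratios; a different averaging rule would give a different range.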
However, it is well established that not all non-affective sensations share a common maximal perceived intensity (Bartoshuk, 2000) and it is far from clear that intensity descriptors should have the same relative meaning for intensive sensations for scales that differ in maximal intensity. As such, the existence of two classes of labeled magnitude scale, those anchored to pain, and those not anchored to pain, remains feasible. It follows from the above argument that the LMS of Green et al. (1993) would be more valid than the OPUS for rating the unpleasantness of all oral sensations, inclusive of pain. Indeed, it would presumably not be possible to adequately represent very painful oral sensations using our oral unpleasantness scale, given that the greatest imaginable oral pain will certainly be greater than the greatest imaginable oral unpleasantness (exclusive of pain).

2.4.2. Generality of scales

The practical utility of labeled magnitude scales requires that magnitude estimates of sensation intensity are relatively stable within and between individuals, and between experiments carried out at different locations. In respect of this issue, the good agreement between studies in the sensory range for affect, as noted above, suggests that the
relative magnitude estimates of the most and least extreme affective labels are reasonably consistent, and perhaps tolerant of different subject populations (see Gracely, McGrath, & Dubner, 1978b). The stability of magnitude estimates for given descriptors between experimental studies also appears reasonable, although direct comparisons between experiments are difficult given that different studies have used very different affective descriptors of intensity. For example, direct comparisons with the work of Gracely et al. (1978a) are difficult, given the distinctive labels used by Gracely et al. Extending the above line of thought, it is generally assumed that labeled magnitude scales generated by populations of subjects can be used without difficulty by individual subjects, i.e., it is tacitly assumed that the labeled magnitude scale for an individual varies minimally from an average scale. In particular, this assumes that the relative meanings of semantic labels for intensity are the same for all subjects. This assumption gains some support from our finding that taster status (i.e., PROP intensity ratings) was not associated with different intensity meanings of the semantic labels. That is, although individuals with different abilities to taste PROP have been shown to have rather different perceptual worlds in various ways (Duffy & Bartoshuk, 2000; Essick et al., 2003), nevertheless the meanings individuals attach to intensive semantic labels appear to be independent of their PROP sensitivity. 
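The reasoning behind this conclusion can be sketched in code: a line is fitted to the log magnitude estimates for each semantic label as a function of PROP rating, and the slopes are compared across labels. Parallel lines (equal slopes) imply that the ratios between labels do not depend on PROP sensitivity, whatever the common slope. The Python sketch below is an illustration under stated assumptions, not the analysis used in this study: the helper names (`fit_line`, `slopes_by_label`) and the data are hypothetical, and the data are constructed to be exactly parallel.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def slopes_by_label(prop_ratings, log_estimates):
    """Slope of log magnitude estimate against PROP rating, per label."""
    return {
        label: fit_line(prop_ratings, ys)[0]
        for label, ys in log_estimates.items()
    }


# Hypothetical PROP ratings and log estimates (illustrative only).
# Every label shares a slope of 0.2 and differs only in its offset,
# so the fitted lines are parallel by construction.
prop = [0.1, 0.3, 0.5, 0.7, 0.9]
offsets = {"slightly": 0.5, "very": 1.3, "extremely": 1.8}
log_estimates = {
    label: [offset + 0.2 * p for p in prop]
    for label, offset in offsets.items()
}

slopes = slopes_by_label(prop, log_estimates)
spread = max(slopes.values()) - min(slopes.values())
print(slopes)
print("slope spread:", spread)
```

In this constructed case the spread of slopes is zero to within floating-point error, so the difference in log estimates between any two labels is the same at every PROP rating. With real data the slopes would be compared statistically rather than by their raw spread.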
Furthermore, although gender-related differences have been found in the scaling of some percepts, such as thermal stimulation (Harju, 2002; Lautenbacher & Rollman, 1993), lingual vibrotactile stimulation (Fucci, Petrosino, Schuster, & Wagner, 1990) and the perception of annoyance from short samples of music (Fucci, Petrosino, Hallowell, Andra, & Wilcox, 1997), we found no evidence for gender-related differences in the rating of semantic labels describing the intensity of oral wetness, dryness, pleasantness or unpleasantness.

2.4.3. Scale implementation

One issue of interest with respect to implementation of the scales we developed is whether the scales should be displayed as unisensory (e.g., a pleasantness scale) or bisemantic, back-to-back scales (e.g., a pleasantness–unpleasantness scale). Although the magnitude estimates used to develop the four scales were obtained in separate experimental sessions, this does not preclude combination of the scales after their generation, especially given that similar contextual descriptors were seen in all sessions. In the food hedonics literature, both unisensory scales (Yeomans, Durlach, & Tinley, 2005) and bisemantic scales are used (Bartoshuk et al., 2002), and it is probably an empirical issue as to which arrangement of scales is more appropriate. One caveat here is that the use of a unisensory scale will be inappropriate if rated sensations could conceivably be experienced from the ‘missing’ scale. For example, Yeomans et al. (2005) used a unisensory pleasantness scale for rating teas and juices. If any of the rated items were
‘unpleasant’ then the subject could not represent their sensory experience adequately.

Acknowledgement

We wish to thank Kristin Morgan for her assistance with data collection.

References

Alf, E. F., & Grossberg, J. M. (1979). The geometric mean: Confidence limits and significance tests. Perception & Psychophysics, 26, 419–421.
Bartoshuk, L. M. (2000). Comparing sensory experiences across individuals: Recent psychophysical advances illuminate genetic variation in taste perception. Chemical Senses, 25, 447–460.
Bartoshuk, L. M., Duffy, V. B., Fast, K., Green, B. G., Prutkin, J. M., & Snyder, D. J. (2002). Labeled scales (e.g., category, Likert, VAS) and invalid across-group comparisons: What we have learned from genetic variation in taste. Food Quality and Preference, 14, 125–138.
Borg, G. (1982). A category scale with ratio properties for intermodal and interindividual comparisons. In H.-G. Geissler & P. Petzold (Eds.), Psychophysical judgment and the process of perception (pp. 25–34). New York: North Holland Publishing Company.
Borg, G. (1998). Borg’s perceived exertion and pain scales. Champaign, IL: Human Kinetics.
Borg, E., & Borg, G. (2002). A comparison of AME and CR100 for scaling perceived exertion. Acta Psychologica, 109, 157–175.
Cardello, A. V., Schutz, H. G., Lesher, L. L., & Merrill, E. (2005). Development and testing of a labeled magnitude scale of perceived satiety. Appetite, 44, 1–13.
Cardello, A. V., Winterhalter, C., & Schutz, H. G. (2003). Predicting the handle and comfort of military clothing fabrics from sensory and instrumental data: Development and application of new psychophysical methods. Textile Research Journal, 73, 221–237.
Diamond, J., & Lawless, H. T. (2001). Context effects and reference standards with magnitude estimation and the labeled magnitude scale. Journal of Sensory Studies, 16, 1–10.
Duffy, V. B., & Bartoshuk, L. M. (2000). Food acceptance and genetic variation in taste. Journal of the American Dietetic Association, 100, 647–655.
Essick, G. K., Chopra, A., Guest, S., & McGlone, F. P. (2003). Lingual tactile acuity, taste perception, and the density and diameter of fungiform papillae in female subjects. Physiology & Behavior, 80, 289–302.
Fucci, D., Petrosino, L., Hallowell, B., Andra, L., & Wilcox, C. (1997). Magnitude estimation scaling of annoyance in response to rock music: Effects of sex and listener’s preference. Perceptual and Motor Skills, 84, 663–670.
Fucci, D., Petrosino, L., Schuster, S. B., & Wagner, S. (1990). Comparison of lingual vibrotactile suprathreshold numerical responses in men and women: Effects of threshold shift during magnitude-estimation scaling. Perceptual and Motor Skills, 70, 483–492.
Gracely, R. H., McGrath, P., & Dubner, R. (1978a). Ratio scales of sensory and affective verbal pain descriptors. Pain, 5, 5–18.
Gracely, R. H., McGrath, P., & Dubner, R. (1978b). Validity and sensitivity of ratio scales of sensory and affective verbal pain descriptors: Manipulation of affect by diazepam. Pain, 5, 19–29.
Green, B. G., Dalton, P., Cowart, B., Shaffer, G., Rankin, K., & Higgins, J. (1996). Evaluating the ‘labeled magnitude scale’ for measuring sensations of taste and smell. Chemical Senses, 21, 332–334.
Green, B. G., Shaffer, G., & Gilmore, M. M. (1993). Derivation and evaluation of a semantic scale of oral sensation magnitude with apparent ratio properties. Chemical Senses, 18, 683–702.
Harju, E.-L. (2002). Cold and warmth perception mapped for age, gender, and body area. Somatosensory and Motor Research, 19, 61–75.
Johansson, L., Kjellberg, A., Kilbom, Å., & Hägg, G. (1999). Perception of surface pressure applied to the hand. Ergonomics, 42, 1274–1282.
Kamphuis, M. M. J. W., & Westerterp-Plantenga, M. S. (2003). PROP sensitivity affects macronutrient selection. Physiology & Behavior, 79, 167–172.
Lautenbacher, S., & Rollman, G. B. (1993). Sex differences in responsiveness to painful and non-painful stimuli are dependent upon the stimulation method. Pain, 53, 255–264.
Marks, L. E., Borg, G., & Ljunggren, G. (1983). Individual differences in perceived exertion assessed by two new methods. Perception & Psychophysics, 34, 280–288.
Marks, L. E., Borg, G., & Westerlund, J. (1992). Differences in taste perception assessed by magnitude matching and by category-ratio scaling. Chemical Senses, 17, 493–506.
Noakes, T. S. (2002). Lore of running. Champaign, Illinois: Human Kinetics.
Peryam, D. R., & Pilgrim, F. J. (1957). Hedonic scale method of measuring food preferences. Food Technology, 11, 9–14.
Schutz, H. G., & Cardello, A. V. (2001). A labeled affective magnitude (LAM) scale for assessing food liking/disliking. Journal of Sensory Studies, 16, 117–159.
Stevens, S. S. (1971). Issues in psychophysical measurement. Psychological Review, 78, 426–450.
Stevens, S. S., & Poulton, E. C. (1956). The estimation of loudness by unpracticed observers. Journal of Experimental Psychology, 51, 71–78.
Teghtsoonian, R., Teghtsoonian, M., & Baird, J. C. (1995). On the nature and meaning of sinuosity in magnitude-estimation functions. Psychological Research, 57, 63–69.
Yeomans, M. R., Durlach, P. J., & Tinley, E. M. (2005). Flavour liking and preference conditioned by caffeine in humans. Quarterly Journal of Experimental Psychology, 58B, 47–58.