OUR INDUSTRY TODAY
Psychophysical Aspects of Sensory Analysis of Dairy Products: A Critique
MICHAEL O'MAHONY
Department of Food Science and Technology, University of California, Davis 95616
INTRODUCTION
This article highlights some of the psychophysical principles in the use and abuse of the common types of sensory testing procedures with human judges: intensity scaling, difference testing, and descriptive analysis. Because this has been done skillfully elsewhere, the article is not intended as a guide to laboratory procedures (1, 26) or to the preparation and analysis of reports (28).
SOME ASPECTS OF DESIGN AND ANALYSIS
Before a detailed analysis of sensory tests it is pertinent to state a few general principles which all too often are ignored during sensory analysis. The points are dealt with briefly because there are several excellent detailed sources to which the interested reader can refer to pursue the arguments further (1, 27). A sensory panel is a flavor-measuring instrument. It consists of judges who use specified procedures for assessing flavors of foods or, more specifically, dairy products. The data from a sensory panel have the status of data from any laboratory instrument. To ask the panel (say a half dozen judges) whether it likes the product would be rather like asking a gas chromatograph whether or not it likes the product; it was not designed for this purpose. Yet this is often done. The likes or dislikes of a few panel members are a poor measure of consumer acceptance and possible performance of a food in the marketplace. Judges, although isolated in booths from the distraction of surrounding stimuli and the experimenter, still tend to have built-in preferences for numbers or symbols. Thus, when samples must be identified to the experimenter, simple identifications like A, B, C, D . . . or 1, 2, 3, 4 . . . are unsatisfactory because the
Received June 6, 1979. 1979 J Dairy Sci 62:1954-1962
judge's number and symbol preferences may bias any choice of samples. Three-digit random numbers have a low probability of favor bias and are more suitable for coding. Position bias also operates, so the position of samples on the serving tray should be varied randomly. If a judge were trying to pick the odd sample out of three and the odd sample was always on the left, it would take only a few trials for the judge to discover this and cheat the test. Sometimes judges do this without realizing it. If several milk products are to be compared by a panel, the panel should remain the same throughout the comparison. Judges who do not attend should not be replaced by substitutes; such panelists should be dropped from the comparison. Substituting panelists is equivalent to altering any instrument during a study; it changes the calibration of the instrument. Further, it invalidates any statistical analysis which demands that all judges must be the same for all milk products. If this cannot be done, then the statistical limitations demand that different judges should be used for each condition. On the subject of statistical analysis, it should be noted that average panel data do not necessarily reflect the data obtained from each individual judge (27). Although the point is self-evident, it is surprising how often deductions about individuals are made from group panel data. In sequential testing, the order of testing of the products, where possible, should be counterbalanced to allow for any sequential variables like practice or fatigue. If this cannot be done, care must be taken to eliminate such variables as much as possible. The counterbalancing or elimination of all possible extraneous variables is a basic tenet of any experimental design.
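The coding and position rules above can be illustrated with a short sketch. It is only one possible arrangement, not a procedure taken from the cited sources: the function name `serving_plan`, the product and judge labels, and the fixed seed are all assumptions made for the example.

```python
import random

def serving_plan(products, judges, seed=0):
    """Assign three-digit codes to products and a random tray order per judge.

    Hypothetical helper for illustration; names and seed are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the plan can be reproduced
    # One distinct three-digit code (100 to 999) per product.
    codes = dict(zip(products, rng.sample(range(100, 1000), len(products))))
    plan = {}
    for judge in judges:
        order = list(products)
        rng.shuffle(order)  # tray position varies randomly from judge to judge
        plan[judge] = [(codes[p], p) for p in order]
    return codes, plan

codes, plan = serving_plan(["milk A", "milk B", "milk C"], ["judge 1", "judge 2"])
for judge, tray in plan.items():
    # The judge sees only the codes, in tray order.
    print(judge, [code for code, _ in tray])
```

Random shuffling guards against position bias but is not strict counterbalancing; where every serving order must occur equally often, the orders would have to be enumerated and assigned evenly instead.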
INTENSITY SCALING: SOME PRELIMINARIES
Before discussing scaling in detail, it is worth first establishing a language for discussing the
status of numbers (33). Four classes of numbers can be defined: those on a nominal, ordinal, interval, or ratio scale. On a nominal scale numbers are used merely to denote names or categories, like numbers on footballers' shirts. Labeling milk preparations with different additives as Preparation 1, 2, and 3 would constitute a nominal scale. In this case the numbers do not denote quantities at all; they are merely names which distinguish categories. On an ordinal scale numbers are used, not to denote actual magnitudes but to denote a rank order. Thus, for numbers on an ordinal scale the difference between 4th and 6th is rarely the same as that between 2nd and 4th. On a ratio scale, numbers are used to denote actual quantities. The difference between 4 and 6 is the same as the difference between 2 and 4, and a score of 200 is twice as great as 100. Ratio scales are common, for example: length, weight, height, density, time, speed, etc. An interval scale has the numerical properties of a ratio scale except that it does not have a real zero. Although the difference between 4 and 6 is the same as the difference between 2 and 4, a score of 200 is not twice as great as 100. Interval scales are comparatively rare. The year anno Domini is an interval scale of the age of this planet; in the year 2000 A.D. the world will not be twice as old as it was in the year 1000 A.D. Likewise temperature in degrees Celsius or Fahrenheit is an interval scale of heat content of a body. The terms nominal, ordinal, interval, and ratio scales should be seen as concepts providing a language for discussing the status of numbers. Actual data obtained in a sensory experiment may not fit neatly into any one of these categories but at least they may be discussed in terms of them. The status of numbers becomes important in the statistical analysis of data from sensory evaluation of dairy products.
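The distinction between interval and ratio scales can be checked numerically with the temperature example. The sketch below is illustrative only; the helper `c_to_f` and the three temperatures are assumptions chosen for the demonstration.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

a, b, c = 10.0, 20.0, 30.0  # three temperatures in degrees Celsius

# Ratios of scores depend on the unit, so "twice as hot" has no meaning.
ratio_c = b / a
ratio_f = c_to_f(b) / c_to_f(a)
print(ratio_c, ratio_f)

# Ratios of differences are the same in either unit (interval property).
diff_ratio_c = (c - b) / (b - a)
diff_ratio_f = (c_to_f(c) - c_to_f(b)) / (c_to_f(b) - c_to_f(a))
print(diff_ratio_c, diff_ratio_f)
```

The first pair of numbers disagrees while the second pair agrees, which is the sense in which differences, but not ratios, are meaningful on an interval scale.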
The more powerful statistical tests, called parametric tests (13, 35) make distribution assumptions; they assume that scores come from populations of scores that are distributed according to normal (Gaussian) curves. For this to be so, the numbers themselves must be on interval or ratio scales; they must represent actual quantities. Also, when samples from more than one population are compared, there are often assumptions about
the variance of such populations being equal. Commonly used parametric tests are Analysis of Variance, t-test, and Pearson's product-moment correlation. Ranked data (ordinal scale) or categories (nominal scale) cannot be analyzed in this way; the less powerful nonparametric techniques must be used (11, 31). Common nonparametric techniques are Friedman's or the Kruskal-Wallis Analysis of Variance, Chi-Square, Wilcoxon, Mann-Whitney, Binomial tests, and Spearman's or Kendall's correlation coefficients. Some researchers, however, prefer to use a parametric analysis even when the assumptions about the data are violated. They argue that tests such as Analysis of Variance are sufficiently robust to allow a certain amount of violation. It is difficult to estimate exactly how far the assumptions may be or are being violated (e.g., in intensity scaling), so the whole argument about parametric or nonparametric analysis of scaled data remains controversial.
INTENSITY SCALING PROCEDURES
Having discussed the status of the data from sensory tests and their corresponding statistical analyses, it is now appropriate to discuss the scaling methods themselves. Intensity scaling of dairy product flavors is a means of using the human judge to measure the amount or strength of a particular flavor. This enables some numerical comparison of the flavors of different products. Such a task poses a problem and many different procedures have been developed to solve it. There are many schemes available for examining different scaling methods; the one followed here is selected only because it is convenient for the present discussion. For all scaling techniques, judges are required to give some numerical measure of the strength of a particular characteristic for a group of food samples. Sufficient differences in these numerical responses noted by the judges in the panel are taken to indicate real or significant differences among the products according to that flavor characteristic. What constitutes a sufficient difference is determined by statistical analysis, parametric or nonparametric, depending on the status of the data. The numerical responses from the judges can be obtained in many ways, and one of the more controversial aspects of
Journal of Dairy Science Vol. 62, No. 12, 1979
scaling is whether the data obtained fall on a ratio, interval, or ordinal scale. The simplest form of scaling is ranking. Products are merely ranked by judges according to some agreed and defined characteristic, e.g., degree of heated flavor for milk samples. The data are ordinal and thus susceptible to analysis by nonparametric statistics. Category scaling is a simple technique; judges are required to place the intensity response on a monotonically increasing scale (see Figure 1a). Care must be taken not to use too few categories or else the scale will not differentiate well enough. Too many categories should also be avoided because the categories would not really be different. Psychophysical research suggests that the optimum number of categories that a human can simultaneously manipulate is of the order of seven plus or minus two (17). The eight point scale (Figure 1a) has every point labeled; some versions just label the end categories. To develop this approach even further, the categories could be replaced by a continuous line; such a scale is called a graphic scale. The judge indicates the intensity of the quality under consideration by placing a mark on a line (Figure 1b). In all cases such scales are prone to 'end effects'; judges are reluctant to use the ends of the scale, confining their responses to the central part of the scale. In this case the scale can hardly be called equi-interval. A given distance along the scale in the center would represent a smaller intensity difference than the same distance at the end of the scale. End effects may be dealt
with by extending the scale beyond the range to be used. If a 7-point scale is required, a 9-point scale may be presented to the judge who then would not be reluctant to use seven categories fully. The graphic scale can be extended as in Figure 1c. The effectiveness of this method of combating end effects and any possible distortions that may occur as side effects has not been researched fully. Again the question arises as to whether the data from such scaling procedures are ordinal, interval, or ratio. To assume that equal distances along the scale represent equal sensory differences (interval or ratio scale) may be giving the scaling technique too much credit. Unfortunately, the direct research needed to answer such a question still remains to be done. The importance here is whether parametric or nonparametric statistical analysis should be used. Generally the more powerful parametric analysis (AOV) is used, but doubts about whether judges may be using the scale as an interval scale must call this practice into question. It may be that in some cases the assumptions required for parametric analysis are justified while in others they are not. Which analysis to use must be the decision of the experimenter. A conservative researcher who is looking for differences between products will stack the odds against himself and use a less powerful nonparametric test to analyze the data for differences; if they are found, then they must be fairly substantial. Naturally if nonparametric tests detect differences, the more powerful parametric tests are likely to do so also.
Figure 1. Examples of intensity scales.
a) Category scale: 1 - extremely low; 2 - very low; 3 - low; 4 - slightly low; 5 - slightly high; 6 - high; 7 - very high; 8 - extremely high.
b) Graphic scale: a line running from "extremely low" to "extremely high".
c) Extended graphic scale: a line continuing beyond the marks for "extremely low" and "extremely high".
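For ranked data, the simplest case discussed above, a nonparametric analysis such as Friedman's test works on the rank sums alone. The following is a minimal sketch under stated assumptions: the ranks are invented for illustration, no ties are allowed, and the function name `friedman_statistic` is hypothetical. The statistic is compared with the tabulated chi-square value for 2 degrees of freedom at the 5% level (5.991); with panels this small the chi-square approximation is only rough.

```python
def friedman_statistic(ranks):
    """Friedman chi-square statistic for untied ranks.

    ranks[i][j] is the rank judge i gave to product j (1 = weakest flavor).
    """
    n, k = len(ranks), len(ranks[0])  # n judges, k products
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return rank_sums, chi2

# Invented example: six judges rank three milk samples for heated flavor.
ranks = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 3, 2],
    [1, 2, 3],
    [2, 1, 3],
    [1, 2, 3],
]
rank_sums, chi2 = friedman_statistic(ranks)
CRITICAL_5PCT_DF2 = 5.991  # tabulated chi-square value, 2 degrees of freedom
print(rank_sums, round(chi2, 3), chi2 > CRITICAL_5PCT_DF2)
```

Only the order of the scores enters the calculation, so the test makes no claim that the intervals between ranks are equal.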
It may be possible to ensure the equal interval nature of the scale by anchoring points on the scale to known standards. How successful this technique may be is a matter for further research. There have been several attempts by psychophysicists to develop techniques for generating interval or ratio scales of intensity so as to be able to study the precise relationship between physical and sensory intensity (5, 14). One such method which has been gaining in popularity recently is the method of Direct Magnitude Estimation (32, 33). The judge is presented with a standard product, the intensity of which for a given characteristic is defined as, say, ten. Other products are given to the judge who rates them according to the characteristic under question. Should they be twice as strong they are given a score of 20; three times as strong, 30; half as strong, 5; etc. The technique can be modified so that the standard is given 100 or no particular score at all, with a computer being used to standardize the scores of all the judges. Again, although this scaling method was devised to provide a ratio scale, doubt still remains. Informal reports from judges, during some scaling studies (21), stated that they used the technique more as a category scale than as the ratio scale it is claimed to be. It is a matter of debate, once again, whether such a scaling procedure should be analyzed by parametric statistics.
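The article does not say how the scores of different judges are standardized. One common choice, assumed here purely for illustration, is to divide each judge's estimates by their geometric mean, which removes the arbitrary standard value while preserving the ratios the judge reported. The function name and the scores below are hypothetical.

```python
import math

def normalize_judge(scores):
    """Rescale one judge's magnitude estimates to a geometric mean of 1.

    Assumed normalization, not one specified in the article; scores must be
    positive because logarithms are taken.
    """
    log_mean = sum(math.log(s) for s in scores) / len(scores)
    geometric_mean = math.exp(log_mean)
    return [s / geometric_mean for s in scores]

judge_a = [5, 10, 20]      # rated against a standard called 10
judge_b = [50, 100, 200]   # same ratios, but a standard called 100

norm_a = normalize_judge(judge_a)
norm_b = normalize_judge(judge_b)
print([round(x, 3) for x in norm_a])
print([round(x, 3) for x in norm_b])
```

After rescaling, the two judges give the same values, because only the ratios between their estimates differ from judge to judge in a meaningful way.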
SOME COMMON SCALING ERRORS
Every judge will use a particular intensity scale in his own particular way, introducing his own idiosyncratic distortions. This renders any comparison of data between judges of little value. Intensity scores for dairy products should be compared for the same judge or group of judges; if the groups were altered by substitution of judges, it would invalidate such comparisons. Intensity scaling requires that judges rate the intensity or strength of a given characteristic of a product. Such ratings may be correlated with the physical parameters of the product so that the relationship between the physical properties of the dairy product and its flavor may be studied. Hedonic scaling requires judges to rate how much they like a given product. This has no necessary relationship to intensity. Some
judges may prefer stronger flavors, others weaker flavors, while still others may prefer intermediate strengths. Just as people vary in their likes and dislikes for colors, music, and books, so it is for flavors. Thus, to look for a single general relationship between degree of liking and subjective flavor intensities, for all judges, is a futile exercise. By the same token it is futile to look for such a relationship between degree of liking and the objective properties of a product. To hope to find a strong correlation between the degree of liking of ice cream rated on a nine point hedonic scale and sugar content may be to hope in vain. It is to be expected that people differ in how sweet they like their ice cream. It is also of little use to a manufacturer to use hedonic ratings from a small panel, of say ten judges, as a representative sample of consumers. A rather bizarre error common with intensity scaling is to measure flavor intensity on an inappropriate scale. To measure degree of oxidized flavor in a milk on a nine point scale ranging from "like extremely" to "dislike extremely" is to use a hedonic scale, not a scale of degree of oxidized flavor; to ask how much someone likes a milk is not to ask how much oxidized flavor they perceive. In the same way, a scale ranging from "highly acceptable" to "highly unacceptable" elicits judgments regarding the milk's acceptability and a scale ranging from "excellent" to "terrible" elicits judgments of the quality of the product. Degree of liking, acceptability, and quality are all different concepts, and none can be substituted for degree of oxidized flavor. The only way to measure this is on a scale ranging from "highly oxidized" to "highly nonoxidized". This point may seem trivial, but it is ignored all too often. Another common mistake is to compound more than one flavor quality onto a single scale.
To take the example of the oxidized flavor of milk, such aspects as "papery", "metallic", or "tallowy" should be judged separately; it would be confusing to combine them all on one scale, such as: 0 - none; 1 - papery, cardboard; 2 - metallic; 3 - tallowy; 4 - fishy. Here the quality and quantity of the oxidized flavor are confused. Surely, "metallic" is not half "fishy", nor could it be said that "papery" + "metallic" = "tallowy". Such a scale is merely a collection of different categories, a nominal scale.
It may be argued, however, that different flavors may be experienced in sequence as oxidation develops and that flavor quality may be indicative of the degree of oxidation (ordinal scale). In this case, it is acceptable for judges who are experienced with such flavors to identify them by the relevant procedures for flavor description, although difficulties will be encountered with flavors which are borderline between two categories. Whatever the procedures used, the assignment of scores to such categories is purely arbitrary and misleading. A more subtle confusion is to mix an intensity and a hedonic scale. This has been done with the following: 0 - none; 1 - questionable; 2 - slight; 3 - distinct, objectionable; 4 - strong, very objectionable. The points 0 to 2 appear to be part of an intensity scale, but points 3 and 4 are given an extra hedonic value by requiring judgments about whether the product is objectionable. Thus, one end of the scale is an intensity scale while the other end is part intensity and part hedonic, a confusing scale indeed. Further, the numbers on such a scale should not be treated as interval or ratio data because 0 to 2 constitutes an ordinal scale of intensity and 3 to 4 an ordinal scale of intensity mixed with hedonics. This poses a dire problem for statistical analysis. A further scale that has been used for measuring the intensity of oxidized flavor is: 0 - none; 1 - questionable; 2 - slight; 3 - definite or distinct; 4 - strong. Such a scale increases monotonically, but it is unlikely that the scoring intervals would be spaced equally to give a ratio or interval scale; most likely it is an ordinal scale and should be treated as such although, as stated earlier, this point is controversial. Which is the "best" method of intensity scaling? "Best" may be defined as most accurately representing the sensations experienced by the judge; unfortunately there is no way of knowing, other than by asking the judge.
The only way available for this is to use an alternative intensity scaling method; thus, the question becomes circular. However, there are still important questions that are in urgent need of research such as how reproducible scales may be between judges, how simple they are to use, and what their mathematical properties may be.
DIFFERENCE TESTS
Difference tests are used to determine whether two products have a different flavor. A commonly used test is the Paired Comparison, where two products are presented to judges who have to determine which is which. One may be a normal milk sample while the other may be milk to which some sugar has been added. The judge is asked which sample is sweeter. If he is able to determine the sweeter sample a sufficient number of times, it is concluded that he can tell the difference between the flavors of the two samples. A sufficient number of times is determined statistically by a nonparametric test because the data are on an ordinal scale ("less sweet", "sweeter"); an appropriate test is the Binomial Test (1, 31). In the same way a Binomial test can be used to determine whether a sufficient number of judges could tell the difference between the two samples. The question posed for the judge is: "Which milk is sweeter?" This is a so-called forced-choice procedure because the judge is forced to choose one of the samples. If instead, the judge was asked whether the two samples were different, he would be put in the position of having to decide how different samples have to be before they are reported as different. Such a decision, although apparently trivial, will be made so inconsistently as to render the test unreliable. However, if the judge is asked which sample is sweeter, a forced choice, he effectively is being told that the samples are different and so is relieved of the responsibility of having to decide this; he merely has to indicate the sweeter sample. Psychophysicists often express this concept in different terms. They see the judge as setting a "criterion of difference", a degree of difference such that if it is exceeded, a difference is reported; if not, no difference is reported. This criterion has little to do with the judge's sensitivity itself; it is a function of the judge's mood and the experimental conditions.
Criterion variation is an important factor in sensory measurement, one that has caused many difficulties. The snag with the Paired Comparison procedure is that the judge must know what is meant by the term "sweeter". This is not too difficult, but more complex terms like those used to describe off-flavors in milks (30) may not be understood so readily. To get over this
difficulty there are alternatives to the Paired Comparison test. One is the Duo-Trio Test whereby one of the two samples in a pair comparison is presented first, and the judge is required to indicate which sample in the pair comparison is the same as the previously presented sample. Like the Paired Comparison test, the Duo-Trio test is a forced choice test and so copes with criterion problems. Furthermore, the judge does not have to learn an exact definition of a word to describe the difference that may occur. The Dual Standard Test is a similar elaboration on the pair comparison test whereby both samples in the pair comparison are presented beforehand rather than just one. Again the test copes with language and criterion problems. A further technique is the Triangle Test, where three products are presented to the judge, two of which are the same, one of which is different. The task of the judge is to select the odd one. Again the procedure is a forced choice procedure and does not depend on correct definition by language of descriptive terms. An extension of this procedure is the Multiple Comparison Test where the judge is required to sort a number of samples into two groups. The Harris-Kalmus Test (10), used commonly for the investigation of taste blindness (12), is perhaps the most common; here the judge has to sort eight items into two groups of four. Again the appropriate statistical treatment for such tests is the Binomial test. For the Triangle Test, the probabilities are adjusted because the probability of picking the correct sample by chance is one in three rather than one in two (1, 31). Compared to the Paired Comparison procedure, the disadvantage of the Duo-Trio and Triangle Tests, and even more so of the Dual Standard and Multiple Comparison Tests, is that more samples need to be tasted. This involves more experimental time, especially if there are many replications.
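The effect of the chance probability on the Binomial test can be shown with a small calculation. This is a sketch under assumptions: the counts (9 correct out of 12) are invented, the test is taken as one-tailed because there is a single correct answer, and the function name `binomial_tail` is hypothetical.

```python
from math import comb

def binomial_tail(correct, trials, p_chance):
    """Probability of at least `correct` successes in `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_chance ** k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Invented result: 9 correct choices in 12 trials.
paired = binomial_tail(9, 12, 1 / 2)    # Paired Comparison or Duo-Trio: chance is 1 in 2
triangle = binomial_tail(9, 12, 1 / 3)  # Triangle Test: chance is 1 in 3
print(round(paired, 4), round(triangle, 4))
```

With these invented counts the same score falls short of the 5% level when the chance probability is one in two but is well beyond it when the chance probability is one in three, which is why the probabilities must be adjusted for the Triangle Test.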
Further, the Paired Comparison Test often appears more sensitive to differences and less prone to "fatigue"; both these effects may have something to do with residual amounts of samples from prior tastings remaining in the saliva and desensitizing the taste receptors (19, 20, 21), but this is a matter for investigation. A new development in difference testing is the use of Signal Detection Measures; these have been used for examining rich flavor in ice
cream (34). The measures were developed in the sixties as a method of getting around the criterion problem (6). The approach of Signal Detection is to allow the judge to perform a series of difference judgments, each one at a different criterion level, and from the results to compute a measure of the degree of difference. Essentially, the judge merely states whether there is a difference and whether he is sure of his judgment. Such a simple response from the judge can yield a measure of degree of difference, called P(A). This is equivalent to a probability, the probability of correctly distinguishing two products should they be given in a Paired Comparison. Such a numerical value has the advantage of being susceptible to analysis by the more powerful parametric statistics (t-tests, Analysis of Variance). A disadvantage of this Signal Detection measure, however, is the large number of replications that are required for its determination. This difficulty has been met in psychophysics with an index of difference called the R-index (3) which is equivalent to the P(A) measure and can be applied readily to the sensory analysis of foods. Furthermore, it can be calculated from a whole array of responses from judges (23, 25) and can be used to measure simultaneously the degree of difference of several products from a standard (24).
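The article gives no formula for the R-index. The sketch below uses the pairwise-count form in which the index is usually described: every response to the test product is compared with every response to the standard, and ties count as half. The four response categories, the counts, and the function name `r_index` are assumptions made for illustration, not details taken from the cited papers.

```python
def r_index(test_counts, standard_counts):
    """Estimate the probability of telling the test product from the standard.

    Both lists hold response frequencies in the same order, from
    'different, sure' down to 'same, sure' (assumed category order).
    """
    n_test, n_standard = sum(test_counts), sum(standard_counts)
    wins = ties = 0.0
    for i, count in enumerate(test_counts):
        # The test product wins against every standard response rated less "different".
        wins += count * sum(standard_counts[i + 1:])
        ties += count * standard_counts[i]
    return (wins + 0.5 * ties) / (n_test * n_standard)

# Invented frequencies for ten tastings of each product.
# Order: different-sure, different-unsure, same-unsure, same-sure.
test_product = [6, 3, 1, 0]
standard = [1, 2, 3, 4]
print(r_index(test_product, standard))
```

A value near 0.5 indicates that the two products are not distinguished, and a value near 1.0 indicates that they are distinguished almost every time.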
DESCRIPTIVE TESTS
Often it is useful to have an accurate description of the flavor of a food product. When direct comparison of products is impossible, direct comparison of standardized descriptions is the next best thing. The principal problem with description of flavors is a lack of vocabulary. English is a language based on visual experience. Whereas we have a rich vocabulary for visual effects like color, we have a poor language for other senses like taste or smell. Even preschool children are much more capable of identifying common colors than common tastes (22). This is probably a result of the common habit of parents of spending a great deal of time teaching their children color names. In our culture it appears that parents spend little time teaching children to fit common taste descriptions to the appropriate stimuli. Even the terms "sour" and "bitter" commonly are confused in English (7, 15, 16, 29), and recent research
shows this is a function of not having learned to label the appropriate sensation with the appropriate description (8). Interestingly enough, other cultures have different taste confusions, apparently as a function of their dietary habits (4, 18). Given the absence of a satisfactory language for flavor terms for dairy products, let alone any food product, one has to be invented (30). The most natural way to learn a language for flavors is to use the same strategy that is used for color: repeated practice in matching the appropriate descriptive term to a standard sensation (9). The usual technique is to train judges to use descriptive terms appropriate to the food under consideration, and to provide standards appropriate to those terms so that they should all agree on the definition of these terms. The trained judges then taste the food and assess the flavor intensity for each of the various descriptive terms. A record of the descriptions or "profile" of the food product then is kept for reference. There are various modifications possible for this procedure, some of which are available from consultancy companies like the Tragon Corporation in Palo Alto, Arthur D. Little Co., Boston, and MPi Sensory Testing, New York. Thus, descriptive terms used to describe oxidized flavors in milk like "papery", "cardboard", or "oily" all need standards to define the sensation they represent. If this is not done, the descriptions from various panels cannot be compared.
SCORE CARDS
It is common among dairy scientists to assess the flavor of dairy products by flavor scorecards. The ADSA scorecard (2) is an example (see Table 1). A dairy product like milk is scored for a range of off-flavors: "astringent", "barny", "cooked", "cowy", "oxidized", "metallic", "rancid", etc. The intensity of each off-flavor is scored as "slight", "definite", or "pronounced". This is an ordinal scale, so any statistical comparison of the sensory properties of milk samples by these scales would necessitate a nonparametric analysis. The scales have been developed a stage further. A body of experts has assessed the relative importance of these off-flavors in terms
TABLE 1. Excerpt from the ADSA scorecard for sensory evaluation of milk.
                   Slight   Definite   Pronounced
Astringent           8         7           5
Barny                5         3           1
Bitter               5         3           1
Cooked               9         8           6
Cowy                 6         4           1
Feed                 9         8           5
Flat                 9         8           7
Foreign              5         3           1
Garlic/Onion         5         3           1
High acid            3         1          ...a
Lacks freshness      8         7           6
Malty                5         3           1
Metallic             5         3           1
Oxidized             6         4           1
Rancid               4         1          ...a
Salty                8         6           4
Unclean              3         1          ...a
a ... Denotes unsalable.
of the quality of the milk. The experts' opinion of what constitutes a good quality or a bad quality milk may be based on knowledge of sales, consumer complaints, or their own preferences within that culture. Despite possible arbitrariness, it is worth remembering that expert opinion is a common measure of quality in our society; it is accepted as a measure of the quality for wines, spirits, gourmet meals, etc. To represent the importance for milk quality of these off-flavors, the degree of off-flavor is given an arbitrary score depending on whether it is "slight", "definite", or "pronounced". These scores are in Table 1; the lower the score, the more pronounced the off-flavor. Thus, in the opinion of the experts a "slight metallic" flavor is as detrimental to milk quality as a "pronounced astringent" flavor because they both have the same score of five. Milk with pronounced high acid, rancid, or unclean flavor is considered unsalable. The actual scores given are arbitrary, ranging from 10, for milk that is not perceived as having an off-flavor, to 1, the lowest score. A typical score may be "score 6 due to a slight oxidized flavor". This procedure is an attempt to convert a score for the degree of a given off-flavor to a judgment of the milk's quality based on subjective expert opinion. Information is used from only one scale and so the procedure is less informative regarding off-flavors than would be a regular flavor profiling
technique. Statistically it still has the status of a score on one specific ordinal scale and should be treated as such. To subject such scores to any form of parametric statistical analysis, even finding a mean, is unjustified. Certainly to perform a statistical operation on scores all derived from different scales ("cooked", "metallic", "rancid", etc.) is nonsensical; "pronounced cooked" cannot be averaged with "slight metallic". Unfortunately such analyses are sometimes performed.
SUMMARY
Although there is still a great need for methodological research and still some unsolved controversies about the status of data, there is a range of techniques available for the sensory analysis of dairy products. These can be divided into three main procedures: intensity scaling, difference testing, and descriptive analysis. They all have their particular limitations, but if these are borne in mind, the tests can be used effectively. However, all too often such tests have been a source of misinformation in dairy research. Inappropriate tests have been used, they have been applied in such a way that psychological bias has been introduced, and they have been analyzed without due regard to the statistical limitations imposed by the procedure for obtaining the data. Greater care should be taken in selecting appropriate testing procedures and statistical analyses for use in the sensory analysis of dairy products.
REFERENCES
1 Amerine, M. A., R. M. Pangborn, and E. G. Roessler. 1965. Principles of sensory evaluation of food. Academic Press, New York.
2 Anonymous. 1976. Suggested flavor, body and texture, and appearance and color scores with designated intensities of flavor defects for 1976 Intercollegiate Dairy Products Evaluation Contest, Mimeo, 3 pp.
3 Brown, J. 1974. Recognition assessed by rating and ranking. Brit. J. Psychol. 65:13.
4 Chamberlain, A. F. 1903. Primitive taste words. Amer. J. Psychol. 14:146.
5 D'Amato, M. R. 1970. Experimental psychology; Methodology, psychophysics and learning. McGraw-Hill, New York.
6 Green, D. M., and J. A. Swets. 1966. Signal detection theory and psychophysics. Wiley, New York.
7 Gregson, R.A.M., and A.F.H. Baker. 1973. Sourness and bitterness: confusions over sequences of taste judgements. Brit. J. Psychol. 64:71.
8 Goldenberg, M. 1978. Linguistic confusion in flavor analysis. MS thesis, University of California, Davis.
9 Hammond, E. G., and R. G. Seals. 1972. Oxidized flavor in milk and its simulation. J. Dairy Sci. 55:1567.
10 Harris, H., and H. Kalmus. 1949. The measurement of taste sensitivity to phenylthiourea (PTC). Ann. Eugenics 15:24.
11 Hollander, M., and D. A. Wolfe. 1973. Nonparametric statistical methods. Wiley, New York.
12 Kalmus, H. 1971. Genetics of taste. Pages 165 to 179 in Handbook of sensory physiology. Vol. 4. Chemical senses, part 2: Taste. L. M. Beidler, ed. Springer-Verlag, Berlin.
13 Keppel, G. 1973. Design and analysis: A researcher's handbook. Prentice-Hall, Englewood Cliffs.
14 Kling, J. W., and L. A. Riggs. 1971. Woodworth and Schlosberg's experimental psychology. 3rd ed. Holt, Rinehart, and Winston, New York.
15 McAuliffe, W. K., and H. L. Meiselman. 1974. The roles of practice and correction in the categorization of sour and bitter taste qualities. Perception and Psychophysics 16:242.
16 Meiselman, H. L., and E. Dzendolet. 1967. Variability in gustatory quality identification. Perception and Psychophysics 2:496.
17 Miller, G. A. 1956. The magical number seven, plus or minus two, or some limits on our capacity for processing information. Psychol. Rev. 63:81.
18 Myers, C. S. 1904. The taste-names of primitive peoples. Brit. J. Psychol. 1:117.
19 O'Mahony, M. 1974. Taste adaptation: The case of the wandering zero. J. Food Technol. 9:1.
20 O'Mahony, M., and L. Godman. 1974. The effect of interstimulus procedures on salt taste thresholds. Perception and Psychophysics 16:459.
21 O'Mahony, M., and P. Wingate. 1974. The effect of interstimulus procedures on salt taste intensity functions. Perception and Psychophysics 16:494.
22 O'Mahony, M., J. Autio, C. Heintz, and M. Goldenberg. 1978. Taste naming by preschool children compared to colour naming: preliminary examination. IRCS Med. Sci. 6:208.
23 O'Mahony, M., and M. Davies. 1978. A signal detection approach to taste difference testing between two levels of alcohol in a flowise presented sherry stimulus. IRCS Med. Sci. 6:189.
24 O'Mahony, M., C. Heintz, and J. Autio. 1978. Signal detection difference testing of colas using a modified R-index approach. IRCS Med. Sci. 6:222.
25 O'Mahony, M. 1979. Short cut signal detection measurements for sensory analysis. J. Food Sci. 44:302.
26 Pangborn, R. M., and W. L. Dunkley. 1964. Laboratory procedures for evaluating the sensory properties of milk. J. Dairy Sci. 26:55.
27 Pangborn, R. M. 1979. A critical analysis of sensory responses to sweetness. Proc. NJF Sympos. on Carbohydrate Sweeteners. P. Koivistoinen, ed. Academic Press, New York.
28 Prell, P. A. 1976. Preparation of reports and manuscripts which include sensory evaluation data. Food Technol. 30:40.
29 Robinson, J. O. 1970. The misuse of taste names by untrained observers. Brit. J. Psychol. 6:375.
30 Shipe, W. F., R. Bassette, D. D. Deane, W. L. Dunkley, E. G. Hammond, W. J. Harper, D. H. Kleyn, M. E. Morgan, J. H. Nelson, and R. A. Scanlan. 1978. Off flavors in milk: Nomenclature, standards and bibliography. J. Dairy Sci. 61:855.
31 Siegel, S. 1956. Nonparametric statistics for the behavioral sciences. McGraw-Hill, New York.
32 Stevens, S. S. 1957. On the psychophysical law. Psychol. Rev. 64:153.
33 Stevens, S. S. 1961. The psychophysics of sensory function. Sensory communication. W. A. Rosenblith, ed. Wiley, New York.
34 Stull, J. W., R. C. Angus, R. R. Taylor, A. N. Swartz, and T. C. Daniel. 1974. Rich flavor discrimination in ice cream by theory of signal detection. J. Dairy Sci. 57:1423.
35 Winer, B. J. 1971. Statistical principles in experimental design. 2nd ed. McGraw-Hill, Kogakusha, Tokyo.

Journal of Dairy Science Vol. 62, No. 12, 1979