Evaluation of R-indices for preference testing of apple juices

Food Quality and Preference Vol. 8, No. 3, pp. 241-246, 1997. Crown Copyright © 1997 Elsevier Science Ltd. All rights reserved. Printed in Great Britain

0950-3293/97 $17.00+.00

PII: S0950-3293(96)00056-0

ELSEVIER

EVALUATION OF R-INDICES FOR PREFERENCE TESTING OF APPLE JUICES

M. A. Cliff (a,*), M. C. King (a), C. Scaman (b) & B. J. Edwards (c)

(a) Agriculture and Agri-Food Canada, Pacific Agri-Food Research Centre, Summerland, BC V0H 1Z0, Canada
(b) Department of Food Science, University of British Columbia, 6650 North West Marine Drive, Vancouver, BC V6T 1X2, Canada
(c) Sun-Rype Products Ltd, 1165 Ethel Street, Kelowna, BC V1Y 2W4, Canada

(Received 28 October 1996; accepted 18 November 1996)

*To whom correspondence should be addressed.

ABSTRACT

New hedonic consumer evaluations, based on signal-detection theory, were examined using three sensory panels [consumer (n = 256), ‘experienced taster’ (n = 36), ‘proprietary’ (n = 36)] and four apple juices. For each panel, half the panellists rated and the other half ranked the apple juices. Scale and panel performance were evaluated by comparing Friedman’s rank-sum preference scores and by calculating R-indices. For the consumer panel, results from the rated and ranked analyses were similar, with juices having the same relative order of preference. Although the children as a group (n = 78) were non-discriminating, the adults (n = 178) significantly differentiated the juices, disliking one juice relative to the others. Small panel size as well as proprietary bias markedly affected the stability of the R-index. Results from ‘experienced’ and ‘proprietary’ panels were not effective indicators of consumer response. Differences in juice preferences between the consumer and other panels were attributed to proprietary bias, expectation error, and differences in sample size (n). R-indices and mean preference scores for the consumer panel were highly correlated (r > 0.98), for both ranking and rating data, and were believed to monitor the same underlying variation. The R-index, however, provided alternate information to conventional preference methodologies and was believed to be an important complement to the existing sensory methodologies.

Keywords: preference testing; consumer testing; rating scales; R-index.

INTRODUCTION

Signal detection theory (Green and Swets, 1966) was developed to provide a measure of the sensory input ‘signal’ required to detect differences between samples. Typically, a judge is asked if a sample is the same or different than a control and if they are sure or unsure of their judgement. After multiple evaluations, an R-index is calculated which is the probability of correctly identifying the sample (signal) from the control (noise), when they are presented in a paired difference test (O’Mahony, 1988). Unlike other sensory measures it is independent of the response bias or criterion level for individual judges. Response bias is the magnitude of signal necessary before a judge will differentiate the ‘signal’ from the background ‘noise’ and decide that two samples are different. It is the cognitive aspect of differentiation and is not related to the sensitivity of the sensory system (O’Mahony, 1992). Signal detection theory has been used to explain the underlying basis of difference testing, but implementation of signal detection procedures is rather time consuming. Therefore, short-cut signal detection procedures were developed allowing R-indices to be calculated from either rating or ranking data on a variety of food products (O’Mahony et al., 1979, 1983, 1985).

Hedonic or ‘liking’ measurements have generally used category scales (5, 7, or 9 points) and uni- or bipolar magnitude estimation scales to give numerical estimations of liking (Pearce et al., 1986). More recently, signal detection procedures have been modified for hedonic evaluation (Vie et al., 1991), allowing the calculation of an R-index that gives the probability of preferring a test product over the control rather than the degree of difference from the control. Vie et al. (1991) used rating and ranking data collection procedures to calculate R-indices for potato chip preferences and to determine the likelihood-to-buy. Swanson and Lewis (1992) used ranked data to calculate an R-index indicating honey preferences and a willingness-to-buy.

A base size of at least 100 panellists has been suggested as a typical consumer panel to be used by market researchers with smaller panels used for preliminary preference testing (Basker, 1988). The relationships between the results from larger consumer panels and smaller proprietary (‘in-house’) panels of experienced tasters have not been fully defined. According to Meiselman (1993) there has not been enough published research to determine how, or if, trained panels differ from consumer panels. Shepherd et al. (1988) suggested that consumer and trained panels agree in the direction of acceptability but not necessarily the magnitude. Horsfield and Taylor (1976) showed that meat acceptability, as measured by an independent panel of consumers, could be predicted from three components derived from a principal component analysis of sensory data collected from a trained eight-member panel. Much of the research comparing results from different sized panels has used category or just-right scales to collect the hedonic measures. Little is known about the effect of panel size on the R-index hedonic measure.

This research was intended to: a) examine the usefulness of the R-index in the hedonic assessment of apple juice, b) compare ranking and rating data collection procedures and c) assess the effect of panel size on R-index estimates for consumer preference.
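As a minimal sketch of the R-index described above, the Python function below estimates the probability that a test product receives a higher hedonic rating than the control, counting ties as one half, from category frequency counts. It uses the common pooled-frequency form of the calculation and is an assumption about the arithmetic, not the exact procedure of O’Mahony (1986); the function name `r_index` and the counts in the usage note are hypothetical.

```python
def r_index(test_counts, control_counts):
    """Estimate P(test rated above control), with ties counted as one half.

    Both arguments are response counts per category, listed from the
    lowest liking category to the highest (hypothetical input layout).
    """
    n_test = sum(test_counts)
    n_control = sum(control_counts)
    wins = 0.0
    control_below = 0  # control responses in categories strictly below the current one
    for t, c in zip(test_counts, control_counts):
        # Each test response beats every control response in a lower category
        # and ties with the control responses in the same category.
        wins += t * (control_below + 0.5 * c)
        control_below += c
    return wins / (n_test * n_control)
```

With hypothetical counts of [1, 2, 3, 4] for a test juice and [4, 3, 2, 1] for the control over four categories from ‘dislike’ to ‘like very much’, `r_index` returns 0.75; exchanging the two products returns 0.25. This complement is why the choice of ‘noise’ sample decides whether R-indices fall above or below 0.5.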

MATERIALS AND METHODS

Apple juices

Four apple juices produced by different processes were used in the tastings. Two were commercially available juices, one produced from fresh apples (hereafter referred to as the control) and the other produced from apple concentrate (hereafter referred to as juice-from-concentrate). The other two were experimental apple juices (experimental juice A and B) prepared according to a novel process and then frozen in 1 L plastic containers. Sensory bench screening indicated that the juices were different in colour (clear to golden), apple flavour (fresh to cooked) and perceived sweetness and acidity. These sensory differences were believed to be large enough to produce differences in consumer preference. Prior to sensory evaluation, the four juices were transferred into white plastic jugs and held at 5-7°C. Each juice was presented as a 25 mL sample in a 35 mL plastic cup labelled with a three-digit random number. The samples were arranged in random order and served on white polyethylene trays. They were evaluated using one of two scorecards, rating or ranking, which were randomly assigned to the trays.
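The coding and randomisation described above could be set up along the following lines. This is only an illustrative sketch: the function `serving_plan`, the juice labels, the use of a fresh set of codes on every tray and the seed are assumptions, not details reported for the study.

```python
import random


def serving_plan(juices, n_trays, seed=None):
    """Return one dict per tray: a scorecard type and coded cups in serving order."""
    rng = random.Random(seed)
    # Half the trays carry a rating scorecard and half a ranking scorecard,
    # shuffled so that the assignment to trays is random.
    cards = ["rating", "ranking"] * (n_trays // 2) + ["rating"] * (n_trays % 2)
    rng.shuffle(cards)
    plan = []
    for card in cards:
        codes = rng.sample(range(100, 1000), len(juices))  # distinct three-digit codes
        order = rng.sample(juices, len(juices))            # random serving order
        plan.append({"scorecard": card, "cups": list(zip(codes, order))})
    return plan


JUICES = ["control", "juice-from-concentrate", "experimental A", "experimental B"]
plan = serving_plan(JUICES, n_trays=256, seed=1)
```

Each entry of `plan` pairs a scorecard type with four (code, juice) cups, so a tray can be laid out directly from it while the code-to-juice key stays with the experimenter.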

Sensory evaluation

The consumer evaluation consisted of 256 panellists (128 female, 128 male; 178 adults, 78 children). Children were between the ages of 9 and 15 years. Panellists were recruited from visitors to the University of British Columbia Open House who: 1) liked apple juice and 2) were interested in evaluating new and existing apple juices. Evaluations were conducted in portable taste panel booths, between 10:00 and 15:00 on each of three days. After the evaluation, anecdotal comments were collected on the nature of the preferences. Panellists were split equally into two gender-balanced groups for the ranking and rating evaluations. For the ranking evaluation, panellists tasted the four juices and ordered the juices from ‘least liked’ (left) to ‘most liked’ (right). For the rating evaluation, panellists tasted the juices according to a randomised design and checked the appropriate box according to their degree of liking: ‘dislike’, ‘neutral’, ‘moderately like’ or ‘like very much’. Both ranking and rating were in accord with those proposed by Vie et al. (1991) for hedonic R-index evaluations.

‘Experienced’ (n = 36) and ‘proprietary’ (n = 36) panels were conducted at Agriculture and Agri-Food Canada (Summerland, BC) and Sun-Rype Ltd (Kelowna, BC), respectively. Panellists (age 24-60) were gender balanced. The ‘experienced’ panel consisted of individuals who had previously participated in sensory evaluations but had no direct product knowledge. The ‘proprietary’ panellists consisted of individuals who were not directly involved with preparation of the experimental juices.

STATISTICAL ANALYSIS

For the rated data, scores of 1 through 4 were assigned to the ‘dislike’, ‘neutral’, ‘moderately like’, and ‘like very much’ categories, respectively. For all consumer panellists (89 adults and 39 children), ‘experienced’ and ‘proprietary’ panels (n = 18), juice ratings were analyzed using a multiple comparison version of the Friedman test. Differences among the Friedman’s rank sums were compared to published critical values (Newell and MacFarlane, 1987; Basker, 1988) or compared to critical values calculated according to Hollander and Wolfe (1973). For comparison of rating and ranking data, mean preference scores rather than Friedman’s rank sums were tabulated. R-indices were calculated using the control juice as the ‘noise’, using the formula outlined by O’Mahony (1986). For the consumer panel, mean preference scores were correlated with R-indices and simple linear regression was calculated. Scorecard usage was examined for the adults and children by examining frequency distributions for the preference categories.

For the ranked data, scores of 1 through 4 were assigned from the ‘least liked’ to ‘most liked’ ranks of the juices. Data were analyzed as indicated above.

To evaluate the data from both scales together, rated scores were converted to ranks. For each evaluation, the sum of the converted scores was adjusted to 10, the same as the sum of the ranked scores. When ties were encountered, numerical mid-points were assigned. Panel sizes for the consumer, ‘experienced’ and ‘proprietary’ panels were 178 (adults only), 36 and 36, respectively. The data were analyzed using Friedman’s rank sum test.

RESULTS

Consumer panel

Preliminary calculations showed no significant differences between the responses of males and females; therefore, the data were combined for further analysis. Results from the ranked analysis for consumers (n = 128) were highly significant (χ²r = 9.23, p = 0.026) and identical to the rated analyses (Table 1). The statistical analysis (data not shown) found that, as a group, children (n = 39) were non-discriminating (χ²r = 1.49, p = 0.68) compared to the adults (n = 89) (χ²r = 14.1, p = 0.003) when evaluating juices by either rating or ranking procedures. When the rated and ranked data were combined (n = 178 = 89 + 89), the juice-from-concentrate was least preferred and was differentiated from all other juices (χ²r = 35.86, p = 0.0001).

For the consumer panel, R-indices showed an increased probability (0.57-0.63) of liking the experimental juices and a decreased probability (0.41) of liking the juice-from-concentrate when compared to the control (Table 2). The juice-from-concentrate was described as ‘watery, bland, and dilute’ and the experimental juice A was described as ‘like a fresh apple’. This R-index can be thought of as equivalent to the preference (probability) in a paired-comparison test between the juice and the control. In a ‘hypothetical’ paired-comparison test (n = 100) between the control and the experimental juice A (R-index = 0.63), 63 people would prefer juice A, whereas 37 people would prefer the control.

‘Experienced taster’ and ‘proprietary’ panels

Using the rating scale, neither the ‘experienced’ (n = 18) nor the ‘proprietary’ (n = 18) panels differentiated among
the juices (χ²r = 7.08, χ²r = 9.22), although there was a tendency for the juice-from-concentrate to be least liked (Table 1). Results from the ranked data collection showed a significant difference in juice preference (‘experienced’ χ²r = 11.07, p = 0.011 and ‘proprietary’ χ²r = 19.6, p = 0.0001). While the juice-from-concentrate was least liked by both panels, the ‘proprietary’ panel preferred the control juice, while the ‘experienced taster’ panel preferred juice B. When the rated and ranked data were combined, results were identical to the ranked analyses for the ‘experienced’ (χ²r = 11.17, p = 0.011) and ‘proprietary’ (χ²r = 34.4, p = 0.0001) panels (Table 1). R-indices for the rated and ranked data were for the most part less than 0.50 with the exception of experimental juice B. Correlations between the R-indices and mean scores for the consumer evaluations were highly significant (p

TABLE 1. Mean preference scores for apple juices using rating and ranking scales, for the consumer, ‘experienced’ and ‘proprietary’ sensory panels (a)

                           Consumer panel                           Experienced   Proprietary
Juice                      All          Adults        Children      panel         panel
                                        (age ≥ 15)

Rating (b)                 (n = 128)    (n = 89)      (n = 39)      (n = 18)      (n = 18)
  Experimental Juice A     2.9a         3.0a          2.8a          2.7a          2.4a
  Experimental Juice B     2.9a         2.9a          2.8a          2.4a          3.0a
  Control                  2.7ab        2.7ab         2.8a          3.0a          2.8a
  Juice-from-concentrate   2.4b         2.3b          2.5a          2.3a          2.0a

Ranking (c)                (n = 128)    (n = 89)      (n = 39)      (n = 18)      (n = 18)
  Experimental Juice A     2.6a         2.8a          2.4a          2.0ab         2.2bc
  Experimental Juice B     2.6a         2.6a          2.4a          3.3a          2.8ab
  Control                  2.5ab        2.4ab         2.7a          2.4ab         3.3a
  Juice-from-concentrate   2.2b         2.1b          2.4a          2.0b          1.5bc

Both (d)                   -            (n = 178)     -             (n = 36)      (n = 36)
  Experimental Juice A     -            2.7a          -             2.4ab         2.2bc
  Experimental Juice B     -            2.6a          -             2.8a          2.9ab
  Control                  -            2.4ab         -             2.7ab         3.1a
  Juice-from-concentrate   -            2.1b          -             2.1b          1.7bc

(a) Statistical analyses were conducted on rank-sums. For each panel and data collection method, means within a column are significantly different at p ≤ 0.05.
(b) Category rating scale: 1 = dislike, 2 = neutral, 3 = moderately like, 4 = like very much.
(c) Ranking scale: 1 = least liked, 4 = most liked.
(d) Rated analyses were converted to ranked analysis.

TABLE 2. R-indices (a) from the consumer (adults only), ‘experienced’ and ‘proprietary’ panels for four apple juices for the rating and ranking scales

                           Consumer                 Experienced              Proprietary
Juice                      Rated       Ranked       Rated       Ranked       Rated       Ranked
                           (n = 89)    (n = 89)     (n = 18)    (n = 18)     (n = 18)    (n = 18)

Experimental Juice A       0.63        0.60         0.44        0.49         0.34        0.17
Experimental Juice B       0.58        0.57         0.33        0.74         0.54        0.39
Control                    control     control      control     control      control     control
Juice-from-concentrate     0.41        0.41         0.19        0.38         0.29        0.11

(a) R-indices reflect the probability of preference compared to the control.

DISCUSSION

Consumer

The R-indices provided useful information regarding consumer preferences (Table 2) among the apple juices. R-indices less than 0.5 are seldom reported in the literature since the least liked sample is arbitrarily assigned as the ‘noise’ (Vie et al., 1991) to which all comparisons are made. However in this study, a proprietary commercial juice was selected as the ‘noise’ before data collection so that panel size comparisons could be made using the same control.

Unlike the adults, children as a group showed no discrimination among the juices. This was attributed to greater variation in individual preference for children rather than a lack of juice preference or difficulties in scoring. Similar score distributions for both adults and children (Fig. 1) suggested that the children used the scorecard in much the same way as the adults to differentiate among the juices. This was consistent with Kimmel et al. (1994) who found that children over the age of four can reliably and reproducibly use preference ranking or hedonic rating evaluation scales.

For the consumer panel, the rating and ranking data collection procedures showed the same relative juice preferences with almost identical R-indices. Anecdotal observations during data collection suggested that ranked analysis required greater consumer attention and memory. For the smaller ‘experienced’ and ‘proprietary’ panels, rating data collection showed no differences in juice preference while ranking data collection showed significant differences (p < 0.05). This was believed to be due partly to panel-to-panel variation which was more evident with the smaller panels. As well, it was speculated the forced choice associated with ranking may have created an ‘artificial’ spread for the scores resulting in significant differences for the smaller sample size but not the larger consumer sample.

FIG. 1. Correlation between R-indices and mean scores for apple juices (consumer panel) for the ranked and rated data, and data from Vie et al. (1991). Horizontal axis: mean preference score.
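The Friedman analysis referred to in the text, including the conversion of rated scores to mid-ranks that sum to 10 for four juices, can be sketched as follows. This is a simplified illustration under stated assumptions: it computes only the overall χ²r statistic without a correction for ties and uses the large-sample chi-square approximation with 3 degrees of freedom (four juices), whereas the study compared rank-sum differences against published critical values. The function names are hypothetical.

```python
import math


def to_midranks(scores):
    """Rank one panellist's scores (1 = lowest); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mid = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = mid
        pos = end + 1
    return ranks


def friedman_statistic(score_rows):
    """Return (rank_sums, chi2_r) for rows of per-panellist scores, one column per juice."""
    ranked = [to_midranks(row) for row in score_rows]
    n, k = len(ranked), len(ranked[0])
    rank_sums = [sum(col) for col in zip(*ranked)]
    chi2_r = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return rank_sums, chi2_r


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
```

For example, `to_midranks([3, 3, 2, 1])` gives `[3.5, 3.5, 2.0, 1.0]`, which sums to 10 as described for the converted ratings. Under the 3 degrees of freedom assumption, `chi2_sf_df3(14.1)` is close to the p = 0.003 reported for the adult consumers.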

According to O’Mahony (1986), the R-index for ranked data would be expected to be higher than that from rated data, due to the forced spreading of scores over the entire matrix. This has been documented by Vie et al. (1991) and Ishii et al. (1992). In this research, the R-indices for the consumer panel were similar for both rating and ranking (Table 2); however, the R-indices were higher for ranked data for a given mean score (Fig. 2). The R-indices varied more with panel size (Table 2) than with rating or ranking data collection methods.

Contrary to research which has suggested that trained panels may indicate a preliminary direction of consumer response (Shepherd et al., 1988; Basker, 1988), our work showed that smaller sample sizes were not good predictors of consumer preferences. For the smaller panels, there were differences in relative juice preferences between the two methods of data collection as well as differences between the ‘proprietary’ panel who had some product knowledge and the ‘experienced’ panel who were unfamiliar with the juices. The effect of product knowledge was especially evident in the ranking R-indices for the ‘proprietary’ panel. The juice that looked and tasted most like the control had a greater probability of being liked by the proprietary panel. This was consistent with anecdotal comments suggesting that panellists recognized and favoured their company’s product. The consumer panel (conducted in a city with a diverse population) tended to prefer the experimental juices, while the other panels (conducted in a more rural orchard area) tended to prefer the control juice.

Vie et al. (1991) clearly indicated that R-indices and scaled hedonic scores are not ‘equivalent values’. The R-index provides a probability of sample preference from a control and the hedonic score provides an estimate of the magnitude of the degree of sample preference. While the two methods were reported and interpreted very differently, the results from the two analyses were highly correlated (r = 0.99, n = 8) (calculated from Vie et al., 1991). This is consistent not only with correlations reported here for the rated (r = 0.98, n = 4) and ranked (r = 1.0, n = 4) data but also with correlations calculated for honey (Swanson and Lewis, 1992) (r = -0.99, n = 5). Correlations from this particular research were negative since the most preferred sample was given the lowest (1) rather than the highest rank score. These extremely high correlations imply that the two measures are indeed assessing the same underlying variation and that the preference R-index expresses this variation in an alternate way which may be more meaningful to market researchers.

The R-index gives the probability that a particular product will be preferred to a control product; however, without replication there is no measure of significance. For example, the rating ‘experienced’ and ‘proprietary’ panels showed no significant differences in mean preference scores, yet the R-indices ranged from 0.19 to 0.44 and 0.29 to 0.54, respectively. Since this numerical spread is often associated with statistical significance, it is important that the R-index be interpreted within the context of sample size and data collection method.
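The correlation between mean preference scores and R-indices for the adult consumer panel can be checked approximately from the rounded values in Tables 1 and 2. The sketch below assumes that the control enters the calculation with an R-index of 0.50 (a sample compared with itself), which the tables do not state explicitly; `pearson_r` is a hypothetical helper written for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Juice order: experimental A, experimental B, control, juice-from-concentrate.
rated_means = [3.0, 2.9, 2.7, 2.3]         # Table 1, rating, adults (n = 89)
rated_r_index = [0.63, 0.58, 0.50, 0.41]   # Table 2; control set to 0.50 (assumption)
ranked_means = [2.8, 2.6, 2.4, 2.1]        # Table 1, ranking, adults (n = 89)
ranked_r_index = [0.60, 0.57, 0.50, 0.41]  # Table 2; control set to 0.50 (assumption)

r_rated = pearson_r(rated_means, rated_r_index)
r_ranked = pearson_r(ranked_means, ranked_r_index)
```

With these rounded inputs the rated data give a correlation of about 0.98 and the ranked data about 0.99, in line with the correlations reported for the consumer panel.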

CONCLUSION

This research showed that the new hedonic sensory methodology was successful in differentiating consumer preferences among apple juices. For consumer panels it provided information consistent with conventional sensory methodologies (mean scores) from either rating or ranking scorecards. The R-index procedure appeared to be less useful with smaller panel sizes, particularly when there was proprietary knowledge of the product. Thus for these juices, ‘experienced’ and ‘proprietary’ panels could not be used as preliminary indicators of consumer preference. High correlations between the R-index and mean scores suggested that the two methodologies were monitoring the same underlying variation. Because the R-index is a probability rather than a relative preference score, it provides the food researcher with an alternate interpretation for market research.

FIG. 2. Frequency of category use with the 4-point rating scale for children and adults (age ≥ 15).

ACKNOWLEDGEMENTS

The authors would like to thank J. Hall for the calculation of critical values for the Friedman analysis and L. Fukumoto and B. Girard for supplying the experimental juices.


REFERENCES

Basker, D. (1988) Critical values of differences among rank sums for multiple comparisons. Journal of Food Technology 42(2), 79-84.
Green, D. M. and Swets, J. A. (1966) Signal Detection Theory and Psychophysics. John Wiley, New York.
Hollander, M. and Wolfe, D. A. (1973) Nonparametric Statistical Methods. p. 151. Wiley, New York.
Horsfield, S. and Taylor, L. J. (1976) Exploring the relationship between sensory data and acceptability of meat. Journal of Science, Food and Agriculture 27, 1044-1056.
Ishii, R., Vie, A. and O’Mahony, M. (1992) Sensory difference testing: Ranking R-indices are greater than rating R-indices. Journal of Sensory Studies 7, 57-61.
Kimmel, S. A., Sigman-Grant, M. and Guinard, J.-X. (1994) Sensory testing with young children. Journal of Food Technology 48(3), 92-99.
Meiselman, H. L. (1993) Critical evaluation of sensory techniques. Food Quality and Preference 4, 33-40.
Newell, G. J. and MacFarlane, J. D. (1987) Expanded tables for multiple comparison procedures in the analysis of ranked data. Journal of Food Science 52, 1721-1725.
O’Mahony, M., Kulp, J. and Wheeler, L. (1979) Sensory detection of off-flavors in milk incorporating short-cut signal detection measures. Journal of Dairy Science 62, 1857-1864.
O’Mahony, M., Buteau, L., Klapman-Baker, K., Stavros, I., Alford, A., Leonard, S. J., Heil, J. R. and Wolcott, T. K. (1983) Sensory evaluation of high vacuum flame sterilized clingstone peaches, using ranking and signal detection measures with minimal cross-sensory interference. Journal of Food Science 48, 1626-1631.
O’Mahony, M., Wong, S. Y. and Obbert, N. (1985) Sensory evaluation of navel oranges treated with low doses of gamma-radiation. Journal of Food Science 50, 639-646.
O’Mahony, M. (1986) Sensory Evaluation of Food: Statistical Methods and Procedures. Marcel Dekker, New York.
O’Mahony, M. (1988) Sensory difference and preference testing: the use of signal detection measures. In Applied Sensory Analysis of Foods, ed. H. Moskowitz, pp. 145-175. CRC Press Inc., Boca Raton, Florida.
O’Mahony, M. (1992) Understanding discrimination tests: A user-friendly treatment of response bias, rating and ranking R-index tests and their relationship to signal detection. Journal of Sensory Studies 7, 1-47.
Pearce, J. H., Korth, B. and Warren, C. B. (1986) Evaluation of three scaling methods for hedonics. Journal of Sensory Studies 1, 27-46.
Shepherd, R., Griffiths, N. M. and Smith, K. (1988) The relationship between consumer preferences and trained panel responses. Journal of Sensory Studies 3, 19-35.
Swanson, R. B. and Lewis, C. E. (1992) Premium honeys: Response of sensory panellists. Food Quality and Preference 3, 215-221.
Vie, A., Gulli, D. and O’Mahony, M. (1991) Alternative hedonic measures. Journal of Food Science 56(1), 1-46.