Food Quality and Preference xxx (2013) xxx–xxx
Food Quality and Preference journal homepage: www.elsevier.com/locate/foodqual
An alternative way to uncover drivers of coffee liking: Preference mapping based on consumers’ preference ranking and open comments

Paula Varela ⇑, Julián Beltrán, Susana Fiszman

Instituto de Agroquímica y Tecnología de Alimentos, Agustín Escardino 7, 46980 Paterna, Valencia, Spain
Article info

Article history: Received 4 December 2012; received in revised form 6 March 2013; accepted 6 March 2013; available online xxxx

Keywords: Coffee; Preference mapping; Open comments; 9-Point hedonic scale; Preference ranking; R-index
Abstract

In classic consumer science, liking has generally been measured with the 9-point hedonic scale. In recent years, signal detection procedures in which consumers rank products in terms of preference have been used, together with an R-index that measures the distance in preference. Ranking has been found to be friendlier for consumers, being a more “natural” exercise than scaling. However, scaling has the advantage of quantifying liking, resulting in data sets that can be treated further, for example through preference mapping, together with sensory data from a trained panel or from consumers. Preference mapping is very useful for product development and as a communication tool. This study compared two preference mapping approaches, one using a data set from hedonic scaling plus intensity questions and the other using preference ranking data coupled with open comments. Preference ranking plus open comments by consumers proved a very promising method, as it produced internal preference map results very similar to those of “traditional” preference mapping from liking scales. This method, which is quicker and easier to implement, has the added advantage of eliciting drivers of liking and disliking directly from consumers, as these cannot be obtained through attribute intensity assessment or by using a trained panel.

© 2013 Elsevier Ltd. All rights reserved.
1. Introduction

The increasing quantity and variety of food products appearing on the market is making consumers more demanding about their purchases (Clemons, 2008). It is not enough to find out how much consumers like a product; their opinions and the variety of their needs must be carefully studied (Chrea et al., 2011; Onwezen et al., 2012). There are no universally liked odors or tastes: one person may dislike what another person likes (Moskowitz & Bernstein, 2000). Consumers may present differentiated preference patterns for some products because of their different hedonic responses, forming groups with shared hedonic patterns. This is known as consumer segmentation. In some foods, considerable variations in taste, intensity of flavor or sensory profile can lead to a segmentation of consumers. These variations can be intrinsic to the product, such as sharp, crunchy apples versus sweet, mealy ones (Jaeger, Andani, Wakeling, & MacFie, 1998), or may be due to changes in formulation that modify the sensory properties, as in milk desserts that vary in flavor and texture (Ares, Giménez, Barreiro, & Gámbaro, 2010) or low and high intensity chocolates (Januszewska & Viaene, 2001).
⇑ Corresponding author. Tel.: +34 963 900 022; fax: +34 963 636 301.
Coffee is a product that can be drunk on its own or mixed with other ingredients (milk, sugar, condensed milk, etc.), making it a typical segmented product for which groups with well-differentiated consumption patterns can be identified. Bitterness is generally considered a negative attribute in food, yet many individuals enjoy a certain amount of bitterness in products such as coffee, beer, or dark chocolate (Harwood, Ziegler, & Hayes, 2012). Cristovam, Russell, Paterson, and Reid (2000) found differences between men’s and women’s preferences for six blends of different coffee bean varieties in cappuccino coffees.

Internal or external preference mapping approaches can be applied to understand these kinds of consumer preference patterns. Preference mapping is a group of methods for investigating consumers’ hedonic responses to a set of products through multivariate statistical mapping methods (Næs, Brockhoff, & Tomic, 2010). In internal preference mapping, the sensory profile of the products is related to liking ratings from a representative sample of consumers; only the consumer data are used to determine the preference patterns and to build a map representing the preference space. Afterwards, the sensory description is linked by regressing it onto the consumer map (Ares, Varela, Rado, & Giménez, 2011). Internal preference mapping has been identified as advantageous for marketing actionability and new product creativity, as the preference space is created on the basis of consumers’ responses alone (MacFie, 2007; van Kleef, van Trijp, & Luning, 2006).
E-mail address: [email protected] (P. Varela).
0950-3293/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.foodqual.2013.03.004
Please cite this article in press as: Varela, P., et al. An alternative way to uncover drivers of coffee liking: Preference mapping based on consumers’ preference ranking and open comments. Food Quality and Preference (2013), http://dx.doi.org/10.1016/j.foodqual.2013.03.004
This technique reveals the main factors underlying the consumers’ liking, providing the main preference directions that separate consumers with different liking patterns (consumer segments) and linking them to product characteristics (Ares, Gimenez, & Gambaro, 2006; Lê & Ledauphin, 2006; Parente, Manzoni, & Ares, 2011; Thompson, Drake, Lopetcharat, & Yates, 2004).

The most common way to collect data for preference maps is to measure consumer liking through acceptance tests, using 9-point hedonic scales and presenting the samples in a monadic sequence. This type of test makes it possible to apply parametric statistical analysis and to compare data directly across studies. Although these scales quantify the degree of liking, they are not very intuitive for consumers: the values on the scale are the same for everyone, but consumers do not necessarily attach the same meaning to them when they work out a scoring pattern. The pattern varies with the previous experience of each panelist, who will intuitively tend to compare samples, and the distance between the points responds to each consumer’s “internal scale” (Lawless & Heymann, 2010).

The multidimensional representation of products and consumers in an internal preference map is generally obtained via principal component analysis (PCA) of a products × consumers matrix, the data being the hedonic scores derived from the scaling exercise. The resulting map shows the samples that received the highest hedonic scores together with the consumers that preferred them; the vectors indicate the liking direction of each consumer. Afterwards, the sensory description is linked by regressing it onto the consumer map (Varela, in press).

Another option for building a preference map is to use a different preference test, such as a preference ranking, to measure hedonic appreciation. These tests present all the samples together and the consumers rank them by preference. The data obtained are ordinal.
The disadvantage of this is that the preference ranking of the samples is not comparable from one experiment to another. The advantage is that ranking tests are simpler and more natural for consumers (Hein, Jaeger, Carr, & Delahunty, 2008; Lawless & Heymann, 2010). Conceptually, ranking samples by preference has been recognized as a very simple task (Meilgaard, Civille, & Carr, 2006). However, some authors have identified it as a complicated one if there are many samples and many parameters to be assessed, as this leads to fatigue and to the last samples not being assessed in the same way as the first ones (Moskowitz, 2005). Apart from being more natural and in general easier for consumers, the ranking procedure has a practical advantage: the test is shorter. Setting up a ranking test is quicker and easier than setting up a rating test for the same number of samples, because a single tray with all the samples is given to each consumer instead of following a presentation design and changing trays between tastings. The limitation of this technique is that it gives no indication of the size of the differences. This can be solved in part by analyzing the data with the R-index, which expresses the probability of choosing one sample rather than another (Brown, 1974; Hye Seong & van Hout, 2009). Ordinal data can be analyzed via multiple factor analysis (MFA) to obtain a preference map. MFA makes it possible to analyze several tables of variables, which can be of different types, and yields maps for studying the relationships between the observations, the variables and the tables (Escofier & Pagès, 1984).

Generally, for preference mapping purposes, consumers are only asked about their liking for the products, and the description of the products is obtained with a trained panel (Parente et al., 2011).
However, some authors have realized that confining the consumers to assessing acceptability, without allowing them to describe and express what they feel about the product, wastes information that could make a trained panel unnecessary. One of the options most often used for product description by consumers is to measure the intensity of attributes, defined by using structured
or unstructured scales labeled from “low” to “high”. This method has been much criticized. Kim and O’Mahony (1998) and Meilgaard et al. (2006) have described various scaling effects and lack-of-discrimination issues. Other ways to measure consumer perception include that suggested by Parente et al. (2011), who proposed building a preference map based on consumers’ responses to a check-all-that-apply (CATA) question on commercial antiaging cosmetic creams. Ten Kleij and Musters (2003) proposed analyzing open-ended questions to complement preference mapping in a study of mayonnaises, asking the consumers to volunteer words about the sensations the product aroused in them. Ares et al. (2010) conducted a similar study with milk-based desserts. However, none of these studies asked whether the terms employed were related to liking or disliking, thus losing relevant information that can give important clues to buying decisions. Symoneaux, Galmarini, and Mehinagic (2012) conducted a study in which untrained consumer panelists, after rating apples on a 7-point hedonic scale, were allowed to write words freely to indicate their “likes” or “dislikes”. Further discussion of the use of open-ended questions or word association techniques for product profiling can be found in Varela and Ares (2012).

The objectives of the present work were to study consumers’ liking patterns for a segmented product (coffee, varying the intensity of the coffee in the samples by adding different amounts of milk and sugar) through internal preference mapping, comparing two methods. The study proposes a quick, simple approach, collecting data through preference ranking coupled with open comments, and compares it with a more classic approach using a 9-point hedonic scale coupled with intensity questions.

2. Materials and methods

2.1. Sample preparation

Six samples were prepared with different proportions of instant coffee (Nescafé®, Nestlé España S.A., Barcelona, Spain), white sugar (Azucarera Ebro S.L., Valladolid, Spain), whole milk (UHT Cremosita®, Leite Rio S.L., Lugo, Spain) and tap water (Table 1). The quantity of instant coffee remained constant at 3.5 g (2 spoonfuls) per 0.20 L cup, the quantity recommended by the manufacturer. The quantities varied were those of the sugar, water and milk. The concentrations of the six samples finally used were chosen so that they would be quite different from each other, covering a wide range from “little milky flavor and not very sweet” to “strong milky flavor and very sweet”. Samples were selected in a bench-top tasting by the researchers and members of a descriptive panel (with sensory training, but not trained in coffee evaluation), in order to obtain samples that differed sufficiently in coffee intensity, milkiness/creaminess and sweetness. A consumption temperature of around 60 °C was chosen (Lee & O’Mahony, 2002). Thermos flasks with a capacity of 1.9 L (Valira, Reus, Spain) were used to keep the samples at this temperature until the test. Water was heated with an electric kettle and milk in a microwave oven. Before the consumer test, the preparation was optimized in order to obtain the final sample mix at 60 °C
Table 1. Sample formulation.

Sample   Soluble coffee (g/L)   Sugar (g/L)   Water:whole milk
A        14                     0             10:1
B        14                     24            10:1
C        14                     24            2:1
D        14                     64            2:1
E        14                     24            1:2
F        14                     64            1:2
(different heating times depending on the amount of milk). Flasks were pre-heated with hot water. Temperature variation between flasks and the maximum time they kept the temperature constant were determined in a laboratory test before the consumer test. It was verified that samples could be kept in the flasks for up to 2 h; however, a new batch was prepared every hour. The consumer tests lasted about 4 h, so four batches of coffee samples were prepared for each session.

2.2. Consumer study

All of the 161 consumers who took part in the study were students or staff from the Instituto de Agroquímica y Tecnología de Alimentos (Paterna, Spain) who consume coffee with milk. The six samples were prepared and kept in thermos flasks for 1 h. The samples (25 mL) were served in expanded polystyrene beakers with plastic lids to maintain the temperature throughout the evaluation session. Still mineral water at ambient temperature was used to rinse the mouth. The samples were presented to the consumers in random order, labeled with randomly generated three-digit codes. Sensory testing was carried out in the sensory laboratory of the Institute of Agrochemistry and Food Technology, equipped with individual booths and built following ISO norms (ISO 8589, 1988). All sensory testing in our institution is ethically approved beforehand.

2.2.1. Test 1 – 9-point hedonic scales and intensity scales

Of the 65 consumers who took part in this test (all aged between 21 and 66), 22 were men and 43 women. The samples were presented monadically, labeled with three-digit random codes, in a balanced rotated sequence (Williams’ presentation design). For each sample, the consumers rated their overall liking on 9-box hedonic scales (from “1-dislike extremely” to “9-like extremely”), and evaluated the intensity of the following seven attributes on 9-box intensity scales (from “1-low” to “9-high”): color, coffee aroma, coffee flavor, milky flavor, sweetness, bitterness and body.
Attributes were taken from previous consumer tests on similar samples and are in common use in the beverage industry for consumer tests. Data acquisition was carried out with Compusense five release 5.0 software (Compusense Inc., Guelph, Ont., Canada).

2.2.2. Test 2 – preference ranking and open comments

The 96 consumers taking part in this test were 31 men and 65 women, from 20 to 67 years of age. They received all six samples at the same time and were allowed to taste them as many times as they needed. The test was self-administered using paper ballots. The consumers first had to rank the six samples in order of preference, from most liked to least liked, with ties allowed. They were also asked to write as many words or phrases as they needed to explain why they liked some of the samples more, or less. The open comments were not linked to any sample in particular, but were intended rather as “general associations” with likes or dislikes. For this part of the test, two rectangular areas were provided at the two ends of the space for writing in the codes for the preference ranking. It was explained to the consumers that they did not have to describe any sample in particular but rather to give a general explanation. Nevertheless, if they wanted to say something about a particular sample in the open comment space they could do so, referring to it by its code.

2.3. Data analysis

2.3.1. 9-Point hedonic scales and intensity scales

Analysis of variance (ANOVA) was performed on the consumers’ overall liking scores, considering consumer and sample as sources
of variation. Mean ratings were calculated and significant differences were checked using Fisher’s LSD test (p < 0.05). The consumer assessment results were analyzed with SPSS software (SPSS for Windows release 13.0, Copyright SPSS Inc.).

2.3.1.1. Preference mapping – principal component analysis

Internal preference mapping was carried out using principal component analysis (PCA) of the correlation matrix of individual consumer liking data, using XLSTAT (XLSTAT version 2008.6.8, Addinsoft). To relate the preference patterns to the drivers, the intensities of the attributes measured were mapped as supplementary variables.

2.3.2. Preference ranking and open comments

R-index: The order in which each consumer ranked the samples was converted into scores, from one for the least-liked to six for the most-liked sample. When consumers indicated an equal liking for more than one sample, an intermediate score was assigned (for instance, if two samples tied for the top liking slot they were each assigned a score of 5.5). The R-index was calculated from the sum of the ranking scores for each sample, using the response matrix method (O’Mahony, 1992) with the Excel spreadsheet program (Microsoft® Office 2003).

The terms associated with like or dislike that the consumers had written in the open comments boxes were separated and analyzed independently. The connectors were removed and the synonyms grouped together. The grouping procedure was performed independently by the three researchers who authored this study, considering personal interpretation of the meaning of the words and word synonymy as determined by a Spanish dictionary. After evaluating the data individually, the researchers met to check the agreement between their classifications. The frequency of each term for each sample was then counted. Terms with less than 1% of mentions were considered to appear with a low frequency and were not included in the study.

2.3.3. Preference mapping – MFA

Internal preference mapping of the preference data was carried out using multiple factor analysis (MFA) with XLSTAT software in order to study the relations between the consumers’ comments and their preferences for the samples. The data were arranged in three tables: sample preferences by consumer, frequency of like comments by sample and frequency of dislike comments by sample. The consumers’ preferences were taken as the main active variable and the frequency of like comments and frequency of dislike comments as supplementary variables.

2.3.4. Cluster analysis and comparison of MFA and PCA clusters

Hierarchical cluster analysis (HCA) was performed to identify groups of consumers with different preference patterns for each study. This analysis was performed on standardized liking scores, using Euclidean distances and Ward’s method as the agglomeration criterion. The decision to adopt a three-cluster solution was taken on observing the liking or preference scores for the three groups in order to find meaningful liking patterns that discriminated between groups. To confirm that the clusters differed from each other, the mean liking or mean preference per sample was calculated, a one-way ANOVA was carried out and the Friedman test for preference ranking and Fisher’s LSD for liking ratings were applied as post hoc tests. After obtaining the clusters, the individual consumers belonging to the different groups were identified and represented with different symbols on the already plotted preference maps, for better graphical visualization of the segmentation.
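The segmentation step of Section 2.3.4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the XLSTAT procedure used in the study: the liking matrix is synthetic, the helper name `segment_consumers` is hypothetical, and “standardized” is assumed to mean that each consumer's scores are centred and scaled across the six samples.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def segment_consumers(liking, n_clusters=3):
    """Group consumers (rows) by liking pattern over samples (columns)."""
    liking = np.asarray(liking, dtype=float)
    # Standardize each consumer's scores across samples (assumed meaning of
    # "standardized"); consumers with identical scores everywhere are not handled.
    centred = liking - liking.mean(axis=1, keepdims=True)
    z = centred / liking.std(axis=1, keepdims=True)
    # Ward's agglomeration criterion on Euclidean distances.
    tree = linkage(z, method="ward", metric="euclidean")
    # Cut the dendrogram so that at most n_clusters groups are formed.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic data: three hypothetical liking patterns over samples A-F.
rng = np.random.default_rng(0)
patterns = np.array([
    [2, 3, 5, 7, 6, 7],   # likes sweet, milky coffees
    [2, 3, 3, 6, 3, 4],   # likes one sweet coffee in particular
    [3, 5, 6, 3, 3, 2],   # likes stronger coffees
], dtype=float)
sizes = [40, 15, 10]
liking = np.vstack([
    np.clip(p + rng.normal(0, 0.5, size=(n, 6)), 1, 9)
    for p, n in zip(patterns, sizes)
])

labels = segment_consumers(liking)
# Mean liking per sample within each cluster, the kind of summary shown in Table 5.
cluster_means = {c: liking[labels == c].mean(axis=0) for c in np.unique(labels)}
```

The number of clusters is passed in rather than derived, mirroring the text, where the three-cluster solution was chosen by inspecting the resulting liking patterns.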
The “body” attribute presented fewer statistically significant inter-sample differences but proved a conceptually complex parameter, as it was interpreted positively in some cases as “intense flavor” and negatively in others as “watery”.
3. Results

3.1. Overall liking and attribute intensity

The ANOVA showed that there were significant differences between samples, both in overall liking and in the intensities of the different attributes assessed (Table 2). The samples were perceived as quite different in the intensity of all the attributes, showing that those selected represented a fairly wide flavor space that provoked different reactions in consumers.

Sample A was the least liked, followed by sample B (Table 2). These coffees were prepared with little or no sugar and with little milk (Table 1). The consumers gave these two samples medium to high intensity ratings for color, coffee aroma, coffee flavor and bitterness and low intensity ratings for milky flavor, sweetness and body.

Sample D was the best-liked overall. This sample was prepared with a large quantity of sugar and a medium milk content (Table 1). The consumer assessment of attribute intensity scored coffee aroma, coffee flavor, milky flavor and body as having intermediate intensity, sweetness as high intensity and bitterness as low intensity compared with the other samples. This sample was preferred to others with a greater quantity of milk (E and F).

Interestingly, the consumers perceived a more intense milky flavor in sample D than in C, although both contained the same quantity of milk and differed only in that D contained more sugar. In the same way, sample E was perceived as significantly sweeter than C, although both contained the same amount of sugar and E was prepared with more milk. This shows that, even when attributes are rated individually, panelists with no previous training unconsciously assess a set of factors globally (Lawless & Heymann, 2010). The different ingredients also interacted in the flavor profile, as well as providing different textures that influenced overall perception. Samples C, E and F did not present significant differences in overall liking (Table 2).
Table 2. Mean liking for the samples evaluated (hedonic scale 1–9) and assessment of attribute intensity (intensity scale 1: low to 9: high).

Attribute                  Sample
                           A        B        C        D        E        F
Overall liking             2.3a     3.1b     4.1c     5.3d     3.8c     4.3c
Color intensity            7.4c     7.0c     5.2b     5.2b     3.3a     3.4a
Coffee aroma intensity     5.3e     5.0d     4.4cd    4.1bc    3.5ab    3.4a
Coffee flavor intensity    6.2c     5.9c     5.1b     4.7b     3.6a     3.5a
Milky flavor intensity     2.0a     2.2a     4.4b     5.2c     6.0d     6.0d
Sweetness intensity        1.7a     2.8b     3.5c     6.4e     4.3d     6.7e
Bitterness intensity       7.3e     6.0d     5.0c     3.4a     4.1b     2.8a
Body intensity             3.0a     3.5ab    4.1bc    4.7cd    4.4cde   4.4cde

Identical letters indicate no significant difference according to Fisher’s LSD test.
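The letters in Table 2 come from Fisher's LSD applied after an ANOVA with consumer and sample as sources of variation (Section 2.3.1). A minimal sketch of that calculation is given below, assuming a balanced design with one rating per consumer and sample. The study used SPSS; the function name `anova_lsd` and the simulated ratings are illustrative assumptions, and only the six mean liking values are taken from Table 2.

```python
import numpy as np
from scipy import stats


def anova_lsd(scores, alpha=0.05):
    """Two-way ANOVA (consumer, sample) without replication plus Fisher's LSD.

    scores: array of shape (n_consumers, n_samples), one rating per cell.
    Returns the F statistic and p-value for the sample effect, and the LSD.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    ss_total = ((scores - grand) ** 2).sum()
    ss_sample = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_consumer = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_error = ss_total - ss_sample - ss_consumer
    df_sample, df_error = k - 1, (n - 1) * (k - 1)
    mse = ss_error / df_error
    f_stat = (ss_sample / df_sample) / mse
    p_value = stats.f.sf(f_stat, df_sample, df_error)
    # Two sample means differ at level alpha if their gap exceeds the LSD.
    lsd = stats.t.ppf(1 - alpha / 2, df_error) * np.sqrt(2 * mse / n)
    return f_stat, p_value, lsd


# Simulated (not real) continuous ratings for 65 consumers, centred on the
# Table 2 mean liking values for samples A-F.
rng = np.random.default_rng(1)
table2_liking = np.array([2.3, 3.1, 4.1, 5.3, 3.8, 4.3])
consumer_shift = rng.normal(0, 1.0, size=(65, 1))   # hypothetical consumer effect
noise = rng.normal(0, 1.5, size=(65, 6))            # hypothetical residual noise
scores = table2_liking + consumer_shift + noise

f_stat, p_value, lsd = anova_lsd(scores)
means = scores.mean(axis=0)
a_differs_from_d = abs(means[0] - means[3]) > lsd
```

Pairs of samples whose mean difference stays below `lsd` would share a letter in a table laid out like Table 2.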
3.2. Preference ranking and open comments

The R-index showed the probability of one coffee being chosen rather than another (Table 3). Table 4 shows the terms given in the comments for each sample and the percentage of mentions for each term within the total for that category (like or dislike). Comments that appeared with a frequency lower than 1% were excluded from the analysis. The like comments excluded were dark color, natural flavor, typical, appearance, creamy flavor, texture, comfortable, winter, breakfast, and delicate flavor. The dislike comments excluded were dark color, artificial flavor, toasted flavor, artificial aroma, “I do not drink coffee like this”, untypical, creamy flavor, intense color, artificial texture, soluble coffee, sour, diluted coffee, unpleasant, milky aroma, strange color, sweetener flavor, gritty and fatty.

The probability of sample A being preferred to any of the others was always below 30% (sample A in Table 3); it had the lowest preference percentages and was not preferred to any other sample. This coffee was prepared with no sugar and little milk. Most of the consumers’ comments on it were negative and it was the sample that attracted the greatest number of negative comments, particularly “poor body”, “bitter” and “little sugar”. However, this least-preferred sample also attracted some “like” comments, such as “coffee flavor” and “less sweet”, showing that there were consumers who liked it.

The only sample that B was preferred to was A (R-index 70.8%). The sole difference between the two was that B was prepared with slightly more sugar, so on comparing these two samples, the consumers opted for the coffee that they perceived as being sweeter. As with sample A, “dislike” comments such as “poor body” and “bitter” were predominant but there were also some “like” comments such as “coffee flavor” and “less sweet”.
At the opposite extreme, sample D was preferred to all the others (all the preference percentages lay between 54.1% and 89.8%). This coffee was prepared with more sugar than A or B and an intermediate level of milk. This was the sample with the greatest number of “like” comments, particularly “right sweetness”, “milky flavor”, “foamy” and “coffee flavor”. Its main “dislike” comment was “too sweet”. Second only to D in the preference ranking was F. This coffee was prepared with a high level of sugar and a high level of milk (Table 1). D and F contained the same amount of sugar but different amounts of milk. In their preferences for these two coffees, the consumers showed that for the same level of sweetness they preferred the sample with a less milky flavor. D and F were the samples with the highest probability of being preferred and were the
Table 3. R-index (%) calculated from the preference ranking for each sample. The percentage indicates the probability of choosing the sample in the row rather than the sample in the column.

Sample   vs A    vs B    vs C    vs D    vs E    vs F
A        -       29.2    15.1    10.2    18.8    13.4
B        70.8    -       28.5    16.9    31.2    22.1
C        84.9    71.5    -       26.3    48.8    33.4
D        89.8    83.1    73.7    -       70.6    54.1
E        81.2    68.8    51.2    29.4    -       35.4
F        86.6    77.9    66.6    45.9    64.6    -
Table 4. Like and dislike comments mentioned by the consumers for each sample. Number of mentions, proportion of each (%) and total mentions per sample.

Comments                      Coffee sample                              %a
                              A      B      C      D      E      F
Like comments
Right sweetness               1      2      13     37     13     35     26.9
Coffee flavor                 3      4      6      10     7      6      9.6
Milky flavor                  0      0      3      13     3      14     8.8
Soft flavor                   1      2      3      9      6      11     8.5
Good flavor                   0      1      4      9      4      12     8.0
Well-balanced                 0      1      2      8      5      8      6.4
Strong body                   0      1      4      5      4      4      4.8
Right bitterness              1      1      3      3      5      0      3.5
Foamy                         0      0      0      11     0      1      3.2
Less sweet                    3      5      2      0      0      0      2.7
Coffee aroma                  1      2      0      2      1      2      2.1
Pleasing                      0      1      2      2      0      1      1.6
Light color                   0      0      0      2      1      3      1.6
Not bitter                    0      0      1      2      1      2      1.6
Good aroma                    0      0      1      2      2      1      1.6
Intense                       1      1      0      2      0      1      1.3
The way I drink it            0      0      1      2      1      0      1.1
Total                         11     21     45     119    53     101    93.3

Dislike comments
Poor body                     33     24     3      0      4      1      20.8
Bitter                        38     20     2      1      1      0      19.8
Little sugar                  13     9      2      0      6      0      9.6
Flavorless                    9      6      2      0      2      0      6.1
Too sweet                     0      0      0      7      1      11     6.1
Too much coffee flavor        7      6      1      0      0      0      4.5
Strong flavor                 8      5      1      0      0      0      4.5
Does not taste like coffee    5      3      2      0      3      0      4.2
Weird or strange flavor       1      1      0      0      4      5      3.5
Milky flavor                  0      0      1      1      6      3      3.5
Little milk                   5      4      0      0      0      0      2.9
Smelly                        1      2      0      0      1      0      1.3
Irritating                    1      1      0      1      0      1      1.3
Burnt flavor                  1      0      0      0      2      0      1.0
Acidic                        1      1      1      0      0      0      1.0
Total                         123    82     15     10     30     21     90.1

a Percentage of mentions in relation to the total number of like or dislike comments respectively.
ones to receive the most “like” comments, the main ones being “right sweetness” and “milky flavor”.

Samples C and E showed intermediate preference levels. The R-index for the C–E pair was close to 50%, although there was a slight preference for coffee E rather than C (E vs C = 51.2%). Both were prepared with an intermediate quantity of sugar, but coffee E contained a greater quantity of milk. Comparison between the pairs D–F (more sugar) and C–E (intermediate sugar level) showed that preferences for the samples were determined by sweetness, as D and F were preferred to C and E. When the samples within the pairs C–D (intermediate quantity of milk) and E–F (high quantity of milk) were compared, the sweeter sample of each pair was preferred (D vs C = 73.7%, F vs E = 64.6%). The samples that were furthest apart in terms of the probability of one being chosen rather than the other were A and D (D vs A = 89.8%).

The more general comments, hedonic rather than descriptive, such as “good flavor”, “well-balanced”, “pleasing”, “good aroma”, “the way I drink it” or “smelly”, indicate that consumers often rate the samples on a complex sum of attributes.
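The pairwise percentages discussed in this section can be illustrated with a short Python sketch. It assumes the usual pairwise reading of the R-index (the share of consumers who scored one sample above the other, with ties counted as half), which may differ in detail from the response matrix spreadsheet used in the study. The function names and the four example rankings are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def ranking_scores(positions):
    """Convert rank positions (1 = most liked) into scores (1 = least liked).

    Tied samples share the mean of the scores they span, e.g. 5.5 each for
    a tie between the top two of six samples.
    """
    positions = np.asarray(positions, dtype=float)
    return np.array([rankdata(-row, method="average") for row in positions])


def r_index(scores):
    """Pairwise R-index (%): share of consumers scoring the row sample above
    the column sample, counting ties as half."""
    scores = np.asarray(scores, dtype=float)
    n_consumers, n_samples = scores.shape
    r = np.full((n_samples, n_samples), np.nan)
    for i in range(n_samples):
        for j in range(n_samples):
            if i != j:
                wins = (scores[:, i] > scores[:, j]).sum()
                ties = (scores[:, i] == scores[:, j]).sum()
                r[i, j] = 100.0 * (wins + 0.5 * ties) / n_consumers
    return r


# Hypothetical rankings from four consumers for samples A-F (1 = most liked).
positions = [
    [6, 5, 3, 1, 4, 2],
    [6, 5, 4, 1, 3, 2],
    [5, 6, 3, 2, 4, 1],
    [6, 5, 3, 1, 3, 1],   # ties: D and F share first place, C and E share third
]
scores = ranking_scores(positions)
r = r_index(scores)   # r[3, 5] is "D vs F": 62.5 for these made-up rankings
```

Rows are the sample being chosen and columns the sample it is compared with, so `r[3, 0]` plays the role of the “D vs A” value, and each pair of mirrored cells sums to 100, as in Table 3.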
3.3. Internal preference map based on hedonic scaling data

In the internal preference map obtained from principal component analysis of the liking data (Fig. 1), the first two principal components explained 62.71% of the variance. Larger groupings of consumers were observed towards the negative values of the first component and the positive values of the second. The lower right
quadrant of the PCA map was empty of consumers. Sample D was the best-liked by the largest group and sample A was rejected by most of the consumers. The other samples exhibited intermediate behavior, with a certain degree of segmentation. To study this segmentation, given that the groups were not obviously separate on the PCA map, HCA was used to group the consumers by liking pattern. Three clusters with different liking patterns were obtained (Fig. 1, Table 5). Cluster 1 (n = 41) preferred samples F, D and E, with “body”, “sweetness intensity” and “milky flavor intensity” as the drivers, and disliked an intense coffee flavor and aroma and a darker color. Cluster 2 (n = 13) showed a marked preference for sample D, somewhat less preference for sample F, and disliked the other coffees. This preference pattern was particularly driven by sweetness. The third cluster (n = 11) was made up of consumers who preferred the coffees they perceived as being stronger (“bitterness intensity”, “coffee flavor intensity”, “color intensity” and “aroma intensity”). In particular, this group preferred sample C, followed by B. Although the clusters obtained were small in terms of the number of consumers, the aim of this analysis was not to infer an actionable market strategy (for that, clusters of around 50 consumers would be needed) but rather to help in understanding the preference landscape. For this purpose, the HCA was indeed helpful.

3.4. Internal preference map based on preference ranking and open comments

Fig. 2 is the preference map based on the preference ranking and the positive and negative comments. It shows the MFA-generated distribution of the samples, consumers and “like” and “dislike” terms.
In the MFA, the first two factors accounted for 74.48% of the variability in the data; the samples with most sugar (D and F) were correlated with positive values of the first factor, those with least sugar (A and B) clustered in the upper right quadrant of the map and samples C and E, with an intermediate sugar content, presented intermediate behavior. The consumers were again segmented into three groups (Table 5): the first and most numerous (n = 62) preferred samples F and D, the smaller second group (n = 21) preferred D and the third group (n = 13) preferred the more intense samples (B and E). These three groups were similar to those found on the scale-based internal preference map. Most of the liking comments were grouped on the left side of the preference map (upper and lower quadrants), close to samples D and F. As with the R-index, this shows that the consumer population as a whole displayed a greater preference for these samples than for the others. The dislike comments clustered in the lower and right part of the map, where the least-preferred samples were located.

4. Discussion

4.1. Overall liking and attribute intensity

In general, the overall liking scores were quite low (2.3–5.3), which was attributable to the formulation of the samples (Table 2). Soluble coffee was chosen for this study because it is easier to prepare repeatedly. However, although many of the consumers drank soluble coffee for breakfast, most of them expected to evaluate machine coffees because they were drinking them away from home (as many of the participants commented), and for this reason they penalized the samples. Nevertheless, the interest of this work is not really to predict how products would actually perform in the market but rather to compare two methodologies, so the focus of this
Please cite this article in press as: Varela, P., et al. An alternative way to uncover drivers of coffee liking: Preference mapping based on consumers’ preference ranking and open comments. Food Quality and Preference (2013), http://dx.doi.org/10.1016/j.foodqual.2013.03.004
P. Varela et al. / Food Quality and Preference xxx (2013) xxx–xxx
Fig. 1. Consumer internal preference map obtained from liking ratings and intensity ratings. (A) PCA map of the consumers’ liking ratings and grouping as determined by HCA. (B) Map of the samples and sensory attributes. The attributes were used as supplementary variables in the construction of the map.
Table 5
Preference patterns of the three clusters identified by HCA for each of the preference mapping methods. Average ranking and mean liking score of each sample across consumers in each cluster (Friedman test for preference ranking and Fisher’s LSD for liking ratings).

Sample                A        B        C        D        E        F
Liking rating (1: least liked, 9: most liked)
Cluster 1 (n = 41)    3.1a     3.1a     5.0b     6.6c     6.0bc    7.1c
Cluster 2 (n = 13)    2.0a     2.8b     3.4b     5.6d     3.2c     4.1c
Cluster 3 (n = 11)    2.4ab    4.3cd    5.5d     2.6ab    3.4bc    1.6a
Preference ranking (1: least liked, 6: most liked)
Cluster 1 (n = 62)    1.3a     1.9a     3.5b     5.0c     4.1bc    5.0c
Cluster 2 (n = 21)    2.5ab    3.6b     4.0bc    5.3c     1.4a     4.0bc
Cluster 3 (n = 13)    3.5abc   4.8c     3.5abc   2.3ab    4.4bc    2.2a
work was liking patterns rather than ‘‘absolute’’ liking rating values. Looking at the relationship between mean liking for the samples and the intensity of their attributes, the samples with more sugar and a medium to high milk content were generally liked the most, and the more bitter ones, with less sugar, were liked the least. The ‘‘body’’ attribute was rated as least intense in the samples that were liked the least, which is consistent with their formulation: they contained the least milk, the ingredient that imparts most body to these beverages. There was no correlation between intensity of color and overall liking, which could mean that, as these products do not have an intense ‘‘coffee color’’, this attribute is less important in defining the liking for them. However, the least liked products were rated highest in color intensity, in line with the fact that the general population preferred milder coffees. Overall liking and attribute intensity tests have the advantage of producing results that are comparable between experiments. Their main disadvantage is that they average out the liking of all the consumers and do not take into account that some consumers prefer the samples that are generally liked the least, and vice versa.
4.2. Preference ranking and open comments – comparison with the overall liking data
Comparing the results of the preference ranking test and the open comments with those of the acceptability test using the hedonic scale (liking data), the order of liking of the samples was the same: D was the best-liked sample, followed by a group made up of F, E and C, and the least-liked coffees were B and A. The R-index results were very similar to those of the hedonic scale but discriminated better, as seen for samples F, E and C. This agrees with the point made by O’Mahony (1992) that data obtained by ranking (R-index) ought to show greater dispersion than those obtained with an assessment scale. Cliff, King, Scaman, and Edwards (1997) compared data obtained by ranking (R-index) with data from a hedonic scale. Although the consumers interpreted the two methods very differently, the results of the two analyses were highly correlated (r = 0.98). While the ranking method is useful for observing consumer preferences for the samples, the R-index on its own cannot identify the presence of segments, as its value shows the preference probability across all the consumers. Inviting the consumers to comment on what they like and dislike about the samples contributes relevant information, making it possible to find out which attributes are more important for consumer preferences and even to study which attributes may be appreciated by some consumers but not by others.
4.3. Internal preference maps
The sample groupings in the preference spaces were very similar with both data collection methods, and in both cases the HCA segmented the consumers into three groups.
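The segmentation step can be illustrated with a short Python sketch using SciPy. It is a sketch under stated assumptions rather than the procedure used in this study: the liking scores are invented, and both the Ward linkage and the centering of each consumer's scores are illustrative choices, since neither is specified in this section.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical liking scores: rows are consumers, columns are samples A-F.
liking = np.array([
    [3, 3, 5, 7, 6, 7],   # likes the sweeter, milkier samples
    [2, 3, 5, 6, 6, 8],
    [2, 3, 3, 6, 3, 4],   # strongly prefers D
    [2, 2, 4, 7, 3, 5],
    [2, 4, 6, 3, 3, 2],   # prefers the stronger coffees
    [3, 5, 6, 2, 4, 1],
], dtype=float)

# Center each consumer's scores (an assumption) so that clustering reflects
# the liking pattern rather than how high or low the person scores overall.
centered = liking - liking.mean(axis=1, keepdims=True)

# Ward linkage on Euclidean distances, then cut the tree into three clusters.
tree = linkage(centered, method="ward")
labels = fcluster(tree, t=3, criterion="maxclust")

# Report the size and mean liking profile of each cluster.
for cluster in np.unique(labels):
    members = liking[labels == cluster]
    print(cluster, len(members), members.mean(axis=0).round(1))
```

The per-cluster mean profiles printed at the end play the same role as the rows of Table 5: they summarize which samples each consumer segment prefers.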
In addition to the advantage already mentioned, that ranking tests are easy for consumers to carry out, the ranking test combined with open comments gave more information than the intensity scales, and possibly even more than a trained panel would have given, since the consumers generated the attributes they thought important for the samples evaluated. These comments made it possible to see whether the same characteristics were desirable or undesirable for different consumers. Although
Fig. 2. Consumer preference map obtained from ranking and open comments. (A) MFA map of the consumers’ preference rankings with segmentation by HCA. (B) Map of the samples with open ‘‘like’’ and ‘‘dislike’’ comments. The comments were used as a supplementary variable in the construction of the map.
not mentioned as often, the presence of liking comments close to the coffees that were less liked by the general population, and of dislike comments close to those that were generally most liked, clearly shows that what constituted a negative attribute for some consumers was what others with different preference patterns liked. For example, for sample B, ‘‘strong flavor’’ appeared as a like comment, while ‘‘too much coffee flavor’’ was mentioned as a dislike comment; also, what some consider the ‘‘right bitterness’’ can be a ‘‘weird flavor’’ for others. Consumers also elicited attributes such as ‘‘foamy’’, among others, which had not been included in the attribute intensity scales but turned out to be of interest to the consumers, and may well be of interest to manufacturers too. In the study performed by Symoneaux et al. (2012), scale-based preference mapping with sample descriptions provided by a trained panel gave very similar results to preference mapping based on the consumers’ comments, with the extra advantage that the consumers gave their own drivers of liking. Allowing consumers to comment freely has the advantage over CATA (check-all-that-apply) (Parente et al., 2011) that it does not omit drivers of liking that may be important to consumers but that the researcher did not take into account. Hein et al. (2008) compared three consumer acceptance methods (9-point hedonic, labeled affective magnitude and unstructured line scales) and two consumer preference methods (best–worst scaling and preference ranking), concluding that they were comparable in terms of implementation and ease of use for consumers, and that they led to similar conclusions.
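To make the handling of open comments more concrete, the following minimal Python sketch shows how coded ‘‘like’’ and ‘‘dislike’’ mentions could be tabulated into a sample-by-term frequency table of the kind that can be projected onto a preference map. The individual mentions and their counts are hypothetical, as are the terms ‘‘sweet’’ and ‘‘bitter’’; only ‘‘strong flavor’’, ‘‘too much coffee flavor’’ and ‘‘foamy’’ are taken from the comments reported above. The coding of raw comments into terms is assumed to have been done beforehand.

```python
from collections import Counter

# Hypothetical coded comments: (sample, "like" or "dislike", term).
comments = [
    ("B", "like", "strong flavor"),
    ("B", "dislike", "too much coffee flavor"),
    ("B", "dislike", "too much coffee flavor"),
    ("D", "like", "sweet"),
    ("D", "like", "sweet"),
    ("D", "like", "foamy"),
    ("A", "dislike", "bitter"),
]

# Count how often each (sample, valence, term) combination is mentioned.
counts = Counter(comments)
samples = sorted({sample for sample, _, _ in comments})
terms = sorted({(valence, term) for _, valence, term in comments})

# Build the sample-by-term frequency table, one row per sample.
table = {
    sample: [counts[(sample, valence, term)] for valence, term in terms]
    for sample in samples
}

print("sample", [f"{valence}: {term}" for valence, term in terms])
for sample, row in table.items():
    print(sample, row)
```

Keeping like and dislike mentions as separate columns preserves the observation that the same characteristic can be a driver of liking for some consumers and of disliking for others.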
The method presented here for better understanding consumer segmentation and drivers of liking gave results in line with these previous studies, offering ‘‘the best of both worlds’’: it gathers the consumers’ hedonic response through a more natural test, preference ranking, and obtains rich descriptive information by allowing consumers to comment freely on their likes and dislikes.
5. Conclusions
Choosing a product is a complex action that may on occasion even be driven by factors external to the item itself (price, culture, tradition, family
habits, etc.). Attribute rating scales ask the consumer about characteristics that may interest the researcher but may not interest the consumer as much, if at all. A preference ranking test plus open comments is not just quick and easy to administer: because the consumers are allowed to say openly which characteristics drive their likes and dislikes, with no restrictions or preconceived ideas about attributes, they unconsciously evaluate all the characteristics that lead them to choose one product rather than another, in a more intuitive and global way. This study shows that results very similar to those obtained with ‘‘traditional’’ internal preference mapping can be obtained by using preference ranking accompanied by open comments. The limitation of this work is the low number of consumers, as it was intended as an initial exploration in order to propose this method. Further research with other products would be desirable to validate its use.
Acknowledgments
The authors are grateful to the Spanish Ministry of Economy and Competitiveness for financial support (AGL2009-12785-C0201), for the Juan de la Cierva contract awarded to author Paula Varela and for the Geronimo Forteza contract awarded to author Julián Beltrán. The authors would also like to thank Mary Georgina Hardinge for assistance with translating and editing the English version of this paper.
References
Ares, G., Gimenez, A., & Gambaro, A. (2006). Preference mapping of texture of dulce de leche. Journal of Sensory Studies, 21, 553–571.
Ares, G., Giménez, A., Barreiro, C., & Gámbaro, A. (2010). Use of an open-ended question to identify drivers of liking of milk desserts. Comparison with preference mapping techniques. Food Quality and Preference, 21, 286–294.
Ares, G., Varela, P., Rado, G., & Giménez, A. (2011). Identifying ideal products using three different consumer profiling methodologies. Comparison with external preference mapping. Food Quality and Preference, 22, 581–591.
Brown, J. (1974).
Recognition assessed by rating and ranking. British Journal of Psychology, 65, 13–22.
Chrea, C., Melo, L., Evans, G., Forde, C., Delahunty, C., & Cox, D. N. (2011). An investigation using three approaches to understand the influence of extrinsic product cues on consumer behavior: An example of Australian wines. Journal of Sensory Studies, 26, 13–24.
Clemons, E. K. (2008). How information changes consumer behavior and how consumer behavior determines corporate strategy. Journal of Management Information Systems, 25, 13–40.
Cliff, M. A., King, M. C., Scaman, C., & Edwards, B. J. (1997). Evaluation of R-indices for preference testing of apple juices. Food Quality and Preference, 8, 241–246.
Cristovam, E., Russell, C., Paterson, A., & Reid, E. (2000). Gender preference in hedonic ratings for espresso and espresso-milk coffees. Food Quality and Preference, 11, 437–444.
Escofier, B., & Pagès, J. (1984). L’analyse factorielle multiple: Une méthode de comparaison de groupes de variables. In R. R. Sokal, E. Diday, Y. Escoufier, L. Lebart, & J. Pagès (Eds.), Data analysis and informatics III (pp. 41–55). Amsterdam: North-Holland.
Harwood, M. L., Ziegler, G. R., & Hayes, J. E. (2012). Rejection thresholds in chocolate milk: Evidence for segmentation. Food Quality and Preference, 26, 128–133.
Hein, K. A., Jaeger, S. R., Carr, B. T., & Delahunty, C. M. (2008). Comparison of five common acceptance and preference methods. Food Quality and Preference, 19, 651–661.
Hye Seong, L., & van Hout, D. (2009). Quantification of sensory and food quality: The R-index analysis. Journal of Food Science, 74, 57–64.
Jaeger, S. R., Andani, Z., Wakeling, I. N., & MacFie, H. J. H. (1998). Consumer preferences for fresh and aged apples: A cross-cultural comparison. Food Quality and Preference, 9, 355–366.
Januszewska, R., & Viaene, J. (2001). Sensory segments in preference for plain chocolate across Belgium and Poland. Food Quality and Preference, 12, 97–107.
Kim, K., & O’Mahony, M. (1998). A new approach to category scales of intensity I: Traditional versus rank-rating. Journal of Sensory Studies, 13, 241–249.
Lawless, H. T., & Heymann, H. (2010). Sensory evaluation of food: Principles and practices (2nd ed.). New York: Springer (Chapter 9).
Lee, H. S., & O’Mahony, M. (2002). At what temperatures do consumers like to drink coffee?: Mixing methods. Journal of Food Science, 67, 2774–2777.
MacFie, H. J. H. (2007). Preference mapping and food product development. In H. J. H. MacFie (Ed.), Consumer-led food product development (pp. 407–433). Cambridge: Woodhead Publishing Ltd.
Meilgaard, M. C., Civille, G. V., & Carr, B. T. (2006). Sensory evaluation techniques. Boca Raton: CRC Press.
Moskowitz, H. R., & Bernstein, R. (2000). Variability in hedonics: Indications of world-wide sensory and cognitive preference segmentation. Journal of Sensory Studies, 15, 263–284.
Moskowitz, H. R. (2005). Thoughts on subjective measurement, sensory metrics and usefulness of outcomes. Journal of Sensory Studies, 20, 347–362.
Næs, T., Brockhoff, P. B., & Tomic, O. (2010). Statistics for sensory and consumer science. Chichester: John Wiley & Sons Ltd.
O’Mahony, M. (1992). Understanding discrimination tests: A user-friendly treatment of response bias, rating and ranking R-index tests and their relationship to signal detection. Journal of Sensory Studies, 7, 1–47.
Onwezen, M. C., Reinders, M. J., van der Lans, I. A., Sijtsema, S. J., Jasiulewicz, A., Guardia, M. L., et al. (2012). A cross-national consumer segmentation based on food benefits: The link with consumption situations and food perceptions. Food Quality and Preference, 24, 276–286.
Parente, M. E., Manzoni, A. V., & Ares, G. (2011). External preference mapping of commercial antiaging creams based on consumers’ responses to a check-all-that-apply question. Journal of Sensory Studies, 26, 158–166.
Symoneaux, R., Galmarini, M. V., & Mehinagic, E. (2012). Comment analysis of consumer’s likes and dislikes as an alternative tool to preference mapping. A case study on apples. Food Quality and Preference, 24, 59–66.
ten Kleij, F., & Musters, P. A. D. (2003). Text analysis of open-ended survey responses: A complementary method to preference mapping. Food Quality and Preference, 14, 43–52.
Thompson, J. L., Drake, M. A., Lopetcharat, K., & Yates, M. D. (2004). Preference mapping of commercial chocolate milks. Journal of Food Science, 69, 406–413.
van Kleef, E., van Trijp, H. C. M., & Luning, P. (2006). Internal versus external preference analysis: An exploratory study on end-user evaluation. Food Quality and Preference, 17, 387–399.
Varela, P., & Ares, G. (2012). Sensory profiling, the blurred line between sensory and consumer science. A review of novel methods for product characterization. Food Research International, 48, 893–908.
Varela, P. (2013). Application of multivariate statistical methods during new product development. In: D. Granato & G. Ares (Eds.), Mathematical and statistical methods in food science and technology. Hoboken: Wiley-Blackwell (in press).