Food Quality and Preference 32 (2014) 83–92
Food Quality and Preference journal homepage: www.elsevier.com/locate/foodqual
External preference segmentation with additional information on consumers: A case study on apples
E. Vigneau a,b,⇑, M. Charles a,c, M. Chen a,b
a LUNAM University, ONIRIS, Sensometrics and Chemometrics Laboratory, Nantes, France
b INRA, Nantes, France
c LUNAM University, Groupe ESA, UPSP GRAPPE, Angers, France
Article info
Article history: Received 7 September 2012; received in revised form 3 April 2013; accepted 14 May 2013; available online 29 May 2013.
Keywords: Hedonic study; Segmentation; Consumer attributes; L-shaped data; Apples
Abstract
We consider hedonic studies in which, in addition to liking scores, external information is available on the products (i.e. sensory descriptors) as well as on the consumers (demographic, usage and attitude attributes). The classification around latent variables (CLV) methodology may be used for segmentation purposes in such situations. Two alternative strategies, which differ in whether the consumer attributes are used a priori or a posteriori, have been compared on the basis of a case study on 31 apple varieties. The direct approach, L-CLV, which involves the three blocks of information (product hedonic scores, product sensory descriptors and consumer attributes) simultaneously, has demonstrated its ability to reveal a segmentation of consumers associated with a large number of sociological and behavioral parameters, in relation to the key sensory drivers. In contrast, with a two-step procedure that first performs an external preference segmentation using only the external information on products, the subsequent use of the consumer attributes yielded no relevant information. For a better investigation of consumer preferences from a marketing research point of view, it appears much more relevant to introduce both types of external information simultaneously, and L-CLV is suitable for this purpose. © 2013 Elsevier Ltd. All rights reserved.
⇑ Corresponding author at: LUNAM University, ONIRIS, Sensometrics and Chemometrics Laboratory, Nantes, France. Tel.: +33 2 51 78 54 40; fax: +33 2 51 78 54 38. E-mail address: [email protected] (E. Vigneau).
0950-3293/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.foodqual.2013.05.007

1. Introduction

External preference mapping is a very popular methodology which aims to provide information about the main "drivers of liking" of consumers regarding the sensory (or physico-chemical) properties of the products of interest (Danzart, Sieffermann, & Delarue, 2004; Greenhoff & MacFie, 1994; Meullenet, Xiong, & Findlay, 2007; Naes, Brockhoff, & Tomic, 2010; Van Kleef, van Trijp, & Luning, 2006). It attempts to relate the sensory profile data to consumer liking scores using various standard statistical methods. The first step of the methodology is to create latent variables based on product sensory attributes. Usually, the first two principal components of the sensory data are considered but other strategies have also been proposed (Faber, Mojet, & Poelman, 2002; Plaehn, 2009; Verdun, Cariou, & Qannari, 2012). Thereafter, these sensory latent variables are used to model the individual consumer likings by means of linear models of varying complexity (vectorial, circular, elliptical or quadratic models). Alternatively, instead of modeling each individual separately, segments of consumers with relatively homogeneous acceptance patterns may be considered.
This makes it possible to summarize the hedonic data by the average in each segment. The segment models are finally fitted separately on the sensory latent variables. However, in this process, segmentation and modeling are achieved separately. It seems more relevant to merge consumers who have similar drivers of liking, rather than to identify segments of consumers having similar acceptance patterns and, afterwards, interpret these patterns in the light of the sensory attributes of the products. In order to define groups of consumers and, simultaneously, in each group, the prediction model of the liking scores as a function of the sensory attributes, a segmentation approach which takes account of external data on the products was proposed by Vigneau and Qannari (2002). In practice, this was achieved using the clustering around latent variables (CLV) approach (Vigneau & Qannari, 2003; Vigneau, Qannari, Sahmer, & Ladiray, 2006). This methodology of variables clustering offers the possibility of defining ‘‘directional’’ or ‘‘local’’ groups, of introducing co-variables measured on the same samples and/or additional information on the variables to be clustered themselves. In preference mapping, the L-CLV (an extension of the CLV for L-shaped data) may be applied in order to identify the drivers of liking in segments of consumers and also characterize these segments in terms of demographic, usage and attitude variables collected on the consumers (Vigneau, Endrizzi, & Qannari, 2011).
It should be noted that, using a rather different framework, latent class vector models (De Soete & Winsberg, 1993; Courcoux & Chavanne, 2001) may be adapted for the inclusion of covariates, such as the sensory attributes of the samples (Poulsen, Brockhoff, & Erichsen, 1997). However, the estimation of the parameters of these models requires the implementation of relatively complex EM-algorithms. With algorithms somewhat comparable to those of the CLV, another approach has also been developed in order to merge consumers who have similar drivers of liking. This is based on the fuzzy C-means (FCM) methodology and uses the residual distance between the linear combination of the sensory descriptors and the individual likings (Berget, Mevik, & Naes, 2008; Bolling Johansen, Hersleth, & Naes, 2010; Menichelli, Olsen, Meyer, & Naes, 2012; Wedel & Steenkamp, 1991). In addition to fuzziness, with the membership values of each consumer ranging between 0 and 1, this latter approach has the advantage of allowing an incomplete design in which not every consumer has tested all the samples. Nevertheless, additional information on the consumers cannot be directly introduced into the optimized criterion. Usually (Delgado & Guinard, 2012; Helgesen, Solheim, & Naes, 1997; Naes, Kubberod, & Sivertsen, 2001; Sveinsdóttir et al., 2009; among others), the segments are related a posteriori to demographics or other consumer attributes by some type of linear regression analysis, PLS or PCR regression or factorial discriminant analysis. In the conjoint analysis context, Naes et al. (2010) have compared a simultaneous analysis, combining experimental factors and consumer attributes, with two-step approaches based on the PLS or PCR regression.
However, the proposed simultaneous approach, using conventional linear models, requires unfolding the data matrices and selecting a limited number of consumer attributes (typically age group, gender or a categorical variable associated with some groups of consumers defined beforehand). Until now, only the L-PLS regression (Martens et al., 2005) or the L-CLV (Vigneau et al., 2011) have enabled the three blocks of data available (liking scores, product attributes and consumer attributes) to be analyzed in a single step, without unfolding and regardless of the amount of additional information collected on the consumers. The L-PLS regression results in graphical representations where the factorial components are defined according to the relationships between all types of variables whereas the L-CLV is oriented towards the segmentation of the panel. We will focus on the advantages or disadvantages of performing the segmentation of a panel of consumers using the L-CLV method compared to a two-step approach. In fact, our objective is to understand better how slightly different data analysis strategies impact the interpretation of the data. The L-CLV represents a direct approach simultaneously involving the liking scores and the personality information on the consumers as well as the additional variables measured on the samples. The two-step approach consists of an external preference segmentation in which the consumer attributes are not part of the primary step of data analysis but are related afterwards to the results of the segmentation. The primary step is performed using the CLV algorithm by taking into account only the additional data on the products. Both of these strategies will be compared on the basis of a recent experiment conducted on apples. A large set of apple varieties was described, in parallel, by a trained sensory panel and rated in terms of liking by consumers. In addition, consumers were asked to fill in a questionnaire.
2. The case study

A study on 31 batches of apple varieties produced in France (the Loire valley) was conducted at the beginning of 2011. The products were chosen in order to cover as much variability as possible in the texture, taste and aroma that can be found on the apple market. The product set was composed of well-known commercial apple varieties, new varieties and more rustic ones. These are listed in Table 1.

Table 1 List of the 31 apple varieties.
Code    Variety
ARI     Ariane
ARI2    Ariane2
BC      Belchard Chantecler
CHA     Chailleux
CM      Caméo
COX     Cox’s Orange Pippin
CRI     Crimson Cripps
DJU     Delbard Jubilé®
DLC     Dalincot
DLS     Dalinsweet
DLT     Dalitron
FJ      Fuji
GD      Golden Delicious
GR      Goldrush
GR2     Goldrush2
GS      Granny Smith
GSA     Golden de Savoie
HC      Honey Crunch®
HC2     Honey Crunch®2
JAZ     Jazz™
JON     Jonagored
JUL     Juliet
PIL     Pink Lady®
PIN     Corail® Pinova
RA      Reinette d’Armorique
RBR     Reinette de Brive
RCLO    Reinette Clocharde
RG      Royal Gala
RGC     Reinette grise du Canada
SW      Schneywell
TT      Tentation®

A trained panel of 15 assessors was selected and trained according to ISO standards 8586 (ISO, 1993) and 11035 (ISO, 1995). They agreed on a list of 30 attributes including 19 descriptors for aroma. After a statistical analysis based on the citation frequencies and the redundancy among the attributes in each category (texture, flavor, aroma), 15 attributes were finally retained: three for the texture (crunchy, juicy, fondant), two for the flavor (sweet, acid), the overall odor intensity, the overall aroma intensity and eight descriptors for specific aromatic notes (A_Pineapple/Banana, A_Sweet/Rose, A_Woody/Earthy, A_Rustic, A_Lemon, A_Whiteflowers, A_Ripe fruit, A_Green). All attributes were evaluated on a 10 cm unstructured scale anchored from 0, not perceived, to 10, extremely intense. Products were presented in a monadic way according to a balanced design to avoid order and carry-over effects. Five eighths of each apple variety were served to each judge. Products were evaluated in duplicate. Profile measurements were carried out in sensory computerized booths according to NF ISO 8589 standards. Scores were collected with FIZZ (version 2.10; Biosystems, Courtenon, France). The sensory room was kept at 21 ± 1 °C, red lights were used and rinsing with mineral water between samples was mandatory.

During the same period of time, 224 regular apple consumers were recruited locally. The panel was balanced for gender, active/inactive people and for four age categories (18–25, 26–40, 41–55 and 56 years old and over). Products were presented monadically, in a random order, at room temperature. A blind warm-up sample was presented at the beginning of each session to avoid an effect of the first product (Wakeling & MacFie, 1995).
For the evaluation of each product, two eighths of peeled apple were served under white light. The test took place over four consecutive weeks. As far as possible, each consumer participated in one session per week.
Overall liking was evaluated with a standard 9-point hedonic scale from ‘extremely unpleasant’ to ‘extremely pleasant’. At the end of the last session, the consumers were asked to fill in a questionnaire about their socio-demographics: age, gender, professional activity (not retained here). They were also asked about their usage of and attitude towards apples. This included their frequency of consumption of apples (every day, 1–2 times per week, 2–3 times per month, less often) and how they usually consumed apples (peeled/unpeeled, whole/in quarters, at home/outside, during a meal/as a snack). In order to assess their knowledge of apple varieties, they had to indicate the varieties they knew from a list of 18 items. The number of items selected was retained. They also had to select out of sweetness, crunchiness, perfume, juiciness and sourness, the most important (or the two most important) sensory characteristics concerning apples, from their point of view. In the same way, they had to select up to three items to indicate their purchase criteria. The list of choices included appearance, packaging, freshness, smell, color, variety, information on the origin, French origin, price and label (PGI, organic). The supply locations were requested (hyper/supermarket, fresh vegetable shop, local market, producer, private garden). Finally, two blocks of agreement questions using a Likert-type scale were presented. The first one was about their opinion of apples: a good fruit, a food, a dessert, a snack, for every day, for the children. The second block included 14 items on consumption and preference, mainly related to fruits and vegetables. These items were, for instance, "I love acidic fruit", "I eat a lot of vegetables", "I love all fruits", "I am interested in health foods".

3. Methods

3.1. Notations

Let us consider that p consumers have given a liking score to n products. These scores are arranged in an n × p matrix, Y.
Additional sensory information has also been observed on the products and arranged in the X-matrix. X is an n × q matrix, q being the number of external descriptors measured on the products. From the additional information available on the consumers, usually collected by means of a questionnaire, the matrix Z is constructed. As most of the items in a questionnaire are categorical variables, each of them is transformed into dummy variables (0/1). Overall, if m represents the number of dummy and numerical variables, the Z-matrix is of size p × m. Based on these data, our purpose is to reveal a meaningful segmentation of the panel of consumers. Nevertheless, various partitions may be obtained according to the clustering algorithm used and the number of clusters retained. Then, in the following, each solution will be identified by two figures: the data sets involved when applying the CLV and the size of the partition. The first figure will be "Y", "YX" or "L" which means, respectively, that the segmentation is based on the liking scores Y only, that the external X-matrix is considered in addition to Y, or that the three data sets (Y, X and Z) are considered simultaneously. For instance, "L3" will identify the partition into three segments obtained with the L-CLV. Next, each segment within a partition will be given a number, so that L3–1 will refer to the first segment for the partition "L3".

3.2. Pre-processing

More precisely, the matrices used for data analysis are not the original ones but are submitted to simple pre-processing steps. Thus, the Y-matrix is (column-) centered in order to remove the differences between the consumers related to the level of the scale they used. Moreover, it is possible to standardize the vector of the scores of each consumer. In this manner, the focus is fixed on the directions of preference, and no longer on the differences in the range of the scores between consumers. However, this standardization is open to discussion and the user has to make a choice according to his/her own data. The X-matrix is also centered. In the case of sensory variables measured on the same scale, the standardization is not usually advised. Nevertheless, when aroma attributes are rated in addition to other sensory attributes, marked differences can often be observed in the frequency distributions. Specifically, the distribution of the values for the aroma attributes often shows a central parameter located near the lower end of the scale, with a small dispersion and some skewness. The standardization of the sensory descriptors may be advised in such situations. Regarding the matrix, Z, of the external data on the consumers, numerical variables are centered and standardized by their standard deviation. Each set of dummy variables associated with the same categorical variable is globally scaled in order to have the same weight as the numerical variables.

3.3. Clustering among latent variables

Let us consider studies in which the aim is to identify segments of consumers with similar drivers of liking (or disliking) and to relate these segments of consumers to demographics and/or other behavioral characteristics. Two strategies will be compared in the following. The first one is (a) to define groups of consumers associated as closely as possible with a linear combination of the sensory attributes of the products, by means of the CLV algorithm on the hedonic data, Y, with the sensory attributes, X, as external variables and (b) to characterize the partition in relation to the consumer attributes, Z. The second strategy was recently proposed under the acronym of L-CLV.
It consists in determining simultaneously the groups of consumers and two latent variables associated with each group, the first one being defined as a function of the sensory attributes of the products, the second one depending on the consumer attributes. For both clustering approaches, the partition of the consumers into K groups is the result of an optimization problem. See Vigneau and Qannari (2003), Vigneau et al. (2006) or Vigneau et al. (2011) for detailed presentations of the various criteria investigated in the area of the CLV methodology. In practice, all the procedures developed to date within the CLV framework can be undertaken using programs implemented in R language, available upon request from the corresponding author. In the case of the CLV on Y with X as external data (situation (a), denoted CLVY/X in the following), the aim is to maximize:
\[
S_X = \sum_{k=1}^{K} \sum_{j=1}^{p} \delta_{kj}\, \operatorname{cov}(y_j, t_k) \quad \text{with } t_k = X a_k \text{ and } a_k^{t} a_k = 1 \tag{1}
\]
where δkj is the group indicator of consumer j (δkj = 1 if consumer j belongs to group k, 0 otherwise), yj, the vector of the liking scores of consumer j and tk, the latent variable associated with group k. In this case, each latent variable tk is constrained to be a linear combination of the external variables. It can be shown that, for a given partition, the latent variable tk, in the kth group, is simply the first PLS regression component of the average liking scores of the consumers in group k, ȳk, on X. Moreover, each loading associated with a given sensory variable in ak is proportional to the covariance between ȳk and this variable. The relative importance of a sensory attribute as a driver of liking or of disliking can thus be easily assessed by the square of its loading in ak. The maximization of the criterion SX in (1) is actually a very similar problem compared to the minimization of a criterion based on the sum of the squared residual distances between the vectors
yj and the model fitted in a group. This second criterion has been proposed as an alternative to S within the CLV methodology (Vigneau et al., 2006), for a crisp clustering algorithm, or within the fuzzy C-means (FCM) methodology (Berget et al., 2008; Wedel & Steenkamp, 1991). Nevertheless, in the case of high colinearity between the X-variables, our opinion is that the problem as defined in Eq. (1) is preferable for a better interpretability of the loading coefficients, even if the quality of the model adjusted in each cluster is not optimal. As the objective is to identify and understand the central tendencies in the hedonic space, this trade-off seems reasonable. In fact, if the fuzzifier parameter in the FCM is chosen close to 1 (the value often used is 2 but the optimal value in Bolling Johansen et al. (2010) was 1.1), and if the matrix X is not too ill-conditioned, these different approaches will lead to very similar results. Once the segmentation of the consumers is achieved, herein using the CLVY/X method, the last step of the data analysis is to explain the partition with respect to the additional information available on the consumers. The simplest way to proceed is to test the discriminability of each variable in Z regarding the categorical variable defined by the partition. A chi-square test was performed for each dummy 0/1 variable, whereas an ANOVA was conducted on each numerical variable. The p-values derived from these models were transformed into q-values (or valeur-test according to Lebart, Morineau, & Piron, 2006). This is simply a non-linear transformation of the scale, the q-value being the quantile for a Gaussian distribution associated with this upper-tailed probability. Levels of 2 or 3 are commonly used for highlighting significant or very significant effects. In the case of the L-CLV (Vigneau et al., 2011), all the blocks of data, Y, X and Z, play an active role in the segmentation. The criterion to be maximized is now:
\[
S_{ZX} = \sum_{k=1}^{K} \operatorname{cov}(c_k, t_k) \quad \text{with } t_k = X a_k,\ a_k^{t} a_k = 1 \text{ and } c_k = P_k u_k,\ u_k^{t} u_k = 1 \tag{2}
\]
In this equation, Pk = Yk Zk, where Yk is taken from Y and Zk is a part of Z, but whose columns are associated only with the consumers belonging to the group k. Then the problem consists in maximizing the covariance between a pair of latent variables defined in each cluster k. tk is a linear combination of the sensory attributes highlighting the drivers of preference in the group k whereas ck depends on the interaction between the likings and the attributes of the consumers belonging to this group. The algorithm updates iteratively (i) the vectors of loadings ak and uk taking the first right and left eigenvectors of Z_k^t Y_k^t X and (ii) the group memberships of the consumers which leads to the revision of the Pk matrix. As shown in Eq. (2), the L-CLV bears many similarities to the L-PLS approach (Martens et al., 2005) but here a specific triplet Z_k^t Y_k^t X is diagonalized and updated in each cluster k. One of the advantages of the L-CLV over the CLVY/X is that no further step is needed in order to characterize the cluster by means of the demographic or usage attributes of the consumers. The vector of loadings uk makes it possible to reveal the main characteristics in each segment. Nevertheless, in order to compare the outputs of both methodologies, the same test procedures (chi-square tests or ANOVA) as those conducted after the segmentation with the CLVY/X were performed.
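As an illustration of the computations described in Section 3.3, the following is a minimal NumPy sketch of the within-group updates for criteria (1) and (2). It is not the authors' R implementation. The function names (`clv_yx_latent`, `assign_groups_yx`, `lclv_latent`) are hypothetical, and several conventions are assumptions: Y and X are taken as column-centered, Z_k is taken as the p_k × m block of Z whose rows correspond to the consumers of group k (so that P_k = Y_k Z_k is conformable), covariances are handled up to a constant factor, and the "first right and left eigenvectors" are obtained from a singular value decomposition.

```python
import numpy as np


def clv_yx_latent(Y_k, X):
    """Latent variable of one group under criterion (1).

    Y_k : (n, p_k) centred liking scores of the consumers in the group.
    X   : (n, q) centred sensory descriptors.
    Returns the unit-norm loadings a_k and the latent variable t_k = X a_k.
    """
    y_bar = Y_k.mean(axis=1)           # average liking score per product
    a = X.T @ y_bar                    # proportional to cov(x_l, y_bar)
    a = a / np.linalg.norm(a)          # assumes X.T @ y_bar is not all zeros
    return a, X @ a


def assign_groups_yx(Y, latents):
    """For fixed latent variables, put each consumer in the group whose
    latent variable has the largest covariance with his/her scores."""
    T = np.column_stack(latents)       # (n, K)
    return np.argmax(Y.T @ T, axis=1)  # (p,) group index of each consumer


def lclv_latent(Y_k, Z_k, X):
    """Pair of latent variables of one group under criterion (2).

    Z_k : (p_k, m) consumer attributes, rows = consumers of the group
          (an assumption of this sketch).
    Returns a_k, u_k, t_k = X a_k and c_k = P_k u_k.
    """
    P = Y_k @ Z_k                      # (n, m), P_k = Y_k Z_k
    U, s, Vt = np.linalg.svd(P.T @ X, full_matrices=False)
    u, a = U[:, 0], Vt[0]              # first left / right singular vectors
    return a, u, X @ a, P @ u
```

Alternating `clv_yx_latent` (one call per group) with `assign_groups_yx` until the labels stop changing gives a crisp partitioning loop for criterion (1). For criterion (2), only the loading update is sketched here; the membership update that revises P_k is left out because its exact rule is not spelled out in this section.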
4. Results and discussion

The results obtained by using the L-CLV are first described. They are then compared to the two-step procedure involving the CLVY/X.
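The q-values used in this comparison are, as described in Section 3.3, Gaussian quantiles associated with upper-tailed probabilities. A minimal sketch of that transformation, using only the Python standard library, is given below. The function name `q_value` is hypothetical, and rejecting p-values equal to 0 or 1 is a choice of this sketch, not something stated in the text.

```python
from statistics import NormalDist


def q_value(p_value: float) -> float:
    """Return the standard Gaussian quantile whose upper-tail
    probability equals p_value."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p_value)
```

With this definition, a p-value of about 0.023 corresponds to a q-value of 2 and a p-value of about 0.0013 to a q-value of 3, the two levels mentioned for highlighting significant or very significant effects.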
Fig. 1. Dendrogram and evolution of the aggregation criterion for the L-CLV.
Fig. 2. Representation of the consumers in the internal preference mapping with the group memberships and the latent variables, tk and ck, obtained in each segment k with the L-CLV added (on the left: dimension 1 × dimension 2, on the right: dimension 2 × dimension 3).
For segmentation purposes, the first parameter to be chosen is the number of clusters, K, of the partition. In the CLV methodology, a hierarchical clustering algorithm, using the criterion to be optimized, is performed before the final consolidation step. The dendrogram and the graph showing the evolution of the aggregation criterion over the course of the algorithm are given and help users choose K. From Fig. 1, in the case of the L-CLV, three clusters appear clearly. They represent 37%, 43% and 20% of the panel, with 82, 96 and 46 consumers, respectively. The group membership was used to label the consumers represented in the factorial space derived from an internal preference analysis (PCA on Y). Fig. 2 gives the graphical display in dimension 1 × dimension 2, on the left-hand side, and dimension 2 × dimension 3, on the right-hand side. The latent variables tk and ck, which represent the central tendencies in each segment k, are also projected on the graphs in Fig. 2. It is easily observed that the latent variables in each segment (tk, which depends on the sensory descriptors, and ck, which depends on the consumer attributes) are highly correlated. Moreover, the first and second segments, L3–1 and L3–2, are not very distinguishable along the first dimension but are separated along the third dimension. Looking at the configuration of the products in the internal preference factorial space, where the sensory descriptors are projected as supplementary variables (Fig. 3), all these consumers seem to appreciate sweet apples, with a high aroma intensity, specifically the pineapple/banana aroma, but the segments L3–1 and L3–2 disagree on the acidity and the lemon aroma. It can also be seen that the consumers of the third group, L3–3, are well
Fig. 3. Representation of the products in the internal preference mapping with the sensory attributes projected as supplementary variables (on the left: dimension 1 × dimension 2, on the right: dimension 2 × dimension 3).
Table 2 Loadings associated with the sensory descriptors in each of the latent variables tk (k = 1, 2 or 3) in the partition into three segments obtained with the L-CLV. These loadings refer to linear effects on the left-hand part, and quadratic effects (_sq) on the right-hand part.
                      Linear effects                      Quadratic effects (_sq)
Descriptor            L3–1 (t1)  L3–2 (t2)  L3–3 (t3)     L3–1 (t1)  L3–2 (t2)  L3–3 (t3)
Crunchy               0.31       0.20       0.21          0.14       0.10       0.10
Juicy                 0.31       0.15       0.07          0.15       0.08       0.05
Fondant               0.22       0.15       0.30          0.12       0.13       0.02
Sweet                 0.33       0.38       0.28          0.03       0.03       0.15
Acid                  0.07       0.06       0.30          0.12       0.08       0.24
Odour int.            0.20       0.19       0.04          0.01       0.02       0.19
Aroma int.            0.28       0.34       0.07          0.09       0.04       0.02
A_Pineapple/Banana    0.32       0.39       0.07          0.17       0.21       0.18
A_Sweet/Rose          0.32       0.32       0.29          0.13       0.14       0.10
A_Woody/Earthy        0.23       0.27       0.14          0.08       0.09       0.08
A_Rustic              0.23       0.29       0.06          0.01       0.06       0.08
A_Lemon               0.07       0.08       0.24          0.21       0.12       0.21
A_White flowers       0.05       0.03       0.10          0.04       0.01       0.04
A_Ripe fruit          0.02       0.09       0.26          0.00       0.02       0.17
A_Green               0.15       0.20       0.35          0.09       0.11       0.25

represented along the second dimension. They do not appreciate apples with a crunchy and juicy texture but prefer a fondant texture. These findings may also be derived from the loadings, ak, associated with each sensory descriptor in the latent variable tk in each segment. More precisely, the prediction models of the preference given by the latent variables tk have been fitted by using both a linear effect and a quadratic effect for each of the 15 sensory attributes. The coefficients in each of the three groups are shown in Table 2. As it is quite difficult to understand the overall effect resulting from the linear and quadratic terms, for each sensory descriptor, partial components have been reconstructed (Fig. 4). Once again, it is observed that the key sensory drivers for segments L3–1 and L3–2 are very similar except for the acidity and the lemon aroma. The consumers in segment L3–3 have a different pattern of sensory drivers: they do not reject a fondant texture, appreciate more rustic and ripe-fruit aromas than a pineapple/banana aroma, and clearly reject acidity and a green aroma. As for the loadings of the consumer attributes in the latent components ck, they enable the hedonic segmentation to be
understood in the light of socio-demographic, usage and attitude parameters. Regarding the categorical attributes, the loadings of the dummy variables are given in Fig. 5 and the most discriminant ones are highlighted. It turns out that even if consumers in segments L3–1 and L3–2 are very similar according to their sensory drivers, they are different groups of people. Consumers in segment L3–1 are specifically the youngest, whereas segment L3–2 contains a large number of people up to 56 years old. The former listed fewer apple varieties than the latter. They stated that crunchiness is an important sensory characteristic while for people in segment L3–2, it is the acidity. It can be noted that these claims are in agreement with the observed preferences. Moreover, the consumers in L3–1 buy apples mainly in supermarkets, pay attention to the appearance, the color and the packaging and eat apples unpeeled. This is not surprising as most of the students recruited belong to this segment (18/29 = 62%). For the other consumers, the purchase criteria are mainly the variety or the French origin; the apples they consume may come directly from a producer or from a garden, not only from the supermarket or market. In Fig. 6, which shows
Fig. 4. Partial latent components reconstructed from the loadings associated with the sensory attributes (ak) for the partition into three segments of the L-CLV ("L3"). Legend: L3–1; L3–2; L3–3.
Fig. 5. Loadings associated with the categorical consumer attributes for the partition into three segments of the L-CLV ("L3"). L3–1: black; L3–2: dark gray; L3–3: light gray. * Discriminant dummy variables.
Fig. 6. Loadings associated with the agreement questions for the partition into three segments of the L-CLV ("L3"). L3–1: black; L3–2: dark gray; L3–3: light gray. * Discriminant variables.
the loadings associated with the agreement questions (opinion of apples and consumption/preference style), the opposition between
L3–1 and L3–2 is also clearly uncovered. The young consumers, in L3–1, do not have a very good opinion of apples unlike the
Fig. 7. Dendrogram and evolution of the aggregation criterion for the CLVY/X.
Fig. 8. Representation of the consumers in the internal preference mapping with the group memberships and the latent variables obtained with the CLVY/X added (on the left: dimension 1 × dimension 2, on the right: dimension 2 × dimension 3).
consumers in the L3–2 segment who rated apples, fruits, vegetables or organic foods positively. The consumers in segment L3–3 appear to be not very fond of apples and rated acidic fruits very negatively. Once again, this is in agreement with the poor liking scores that they gave to the varieties with a high acidity and a green aroma (i.e. GS, RA or COX). Compared with the segmentation obtained by using the L-CLV, the external preference segmentation (CLVY/X), in which the external information collected by means of the questionnaire is not taken into account in the clustering criterion, led to a solution
consistent with the previous ones regarding the sensory drivers but not for the socio-demographic or behavioral attributes of the consumers, as is now shown. First of all, the size of the partition to be retained by using the CLVY/X may be three or four, as depicted in Fig. 7. We chose a partition into four groups because of its better coherence with the previous partition resulting from the L-CLV. The segments YX4–1, –2, –3 and –4 consisted of 67, 45, 72 and 40 consumers, respectively. In the same manner as in Fig. 2, the group memberships and the latent variables tk in each segment were added to the internal preference mapping (Fig. 8). Segments
YX4–1 and YX4–2 came out as rather similar to L3–1 and L3–2, respectively. Segments YX4–4 and L3–3 were almost identical with the same consumers, except six. The segment YX4–3 was in an intermediate position between the previous segments. The consumers in this segment seem to like aromatic varieties of apple, with a pineapple/banana aroma or a sweet/rose aroma; they appreciate sweetness and do not like a green aroma. The last step of the analysis in the external preference segmentation (i.e. CLVY/X) is to relate the partition retained to the additional information available on the consumers. The discriminant levels for each of the consumer attributes are listed in the last column of Table 3, and the most discriminant parameters are highlighted. It seems that the YX4 partition can easily be related to the important sensory attributes stated. It can be observed that crunchiness is often cited as an important sensory characteristic of apples by the consumers in segment YX4–1 and that acidity, when chosen, is mostly cited by the consumers of YX4–2, which emphasizes the good internal consistency of the consumers. Nevertheless, it turns out that the partition YX4 cannot be
Table 3. Discriminant levels (expressed as q-values) for the consumer attributes in relation to the segmentation of the consumers obtained with the L-CLV or CLVY/X approach. q-Values 2 are displayed in bold type, while q-values >3 are given in bold and underlined type.
Columns: Consumer attributes; q-values (L-CLV); q-values (CLVY/X)
Male
4.11
0.00
Female
4.11
0.00
Age category
A18–25
4.69 1.74 2.86
1.19
Gender
Consumption
A41–55 A56+ Every D
Frequency
1–2 T p W
Nb varieties
2–3 T p M Nb5–9
Known
Nb10–15
Important Sensory
Nb16+ Sweet Crunchy
Characteristics
Perfume
How
Acid Peel_no
Consuming Purchase
Whole/quarter Meal_out Appearance
Criteria
Packaging Smell Color Variety Fr origin
Supply Locations
5.64
0.80 0.24 0.00 0.00
5.18 1.65 2.31
0.63 2.38
4.11
5.16
3.79 2.61
3.48 2.38 0.82
3.23 1.65 1.94 3.43 3.12 1.68 4.14 2.34 4.71 2.63 2.89 1.96 2.90 1.67
0.00 0.00 0.01 1.89 0.00 0.43 0.00 0.39
4.45
0.98 0.00 1.81 0.00 0.00 0.48
For every day
4.65
0.00
For children
4.93
0.00
Consumption/
Acid fruit
5.29
Preference
Fruit essential
Style
Add sugar Vegetable
3.23 1.54
5.41 0.00
Opinion On apples
Hyper Producer Garden Good fruit Dessert Delicacy
5.59 3.81 1.77
0.73 0.07 0.97
4.27
2.07 0.00
Apples all form
3.89
1.89
All fruits
3.46 2.65
0.31
Organic food Rustic fruit veg
3.32
0.00 0.00
efficiently related to the socio-demographics, the usages and the attitudes of the consumers. The results shown in Table 3 reveal the fundamental difference between the partition derived from the external preference segmentation with the CLVY/X and the partition obtained by using the L-CLV. The external information collected on the consumers was not taken into account in the external preference segmentation, so these results are not surprising. Nevertheless, although the main preference tendencies and their associated key drivers have been uncovered in both approaches, the L-CLV method has demonstrated, in this case study, its effectiveness in revealing groups of consumers rather well associated with the socio-demographic and behavioral typology of the panel.
More specifically, with respect to the French panel of 224 consumers recruited in the Angers area, three consumer segments have been identified. The first segment (37% of the panel) contains a majority (67%) of the young adults, who appreciate the most aromatic apples and look for crunchy varieties. They are attentive to the appearance and the packaging when they buy apples, do not know many apple variety names, do not think that apples are particularly “good” fruits but consume them unpeeled, quite frequently, perhaps for their convenience aspect.
The second segment (43% of the panel), rather similar to the previous one according to the sensory drivers, was also identified. It differs from the first segment because these consumers seem to accept varieties with some acidic taste and a green aroma. In this group, 72% of the consumers are more than 41 years old and 36% are retired. Generally speaking, they appreciate apples, consume them very frequently, know a large number of varieties and choose the apples they buy according to their origin and the cultivar.
The last segment (20% of the panel) is very particular in terms of its drivers of preference. These consumers do not look for apples with an exotic fruit aroma, but for ripe fruit and rustic notes. Unlike the other consumers, they accept apples with a fondant texture, perhaps because these apples are also the least acidic varieties. Moreover, they are attentive to the sweetness of an apple and do not like acidic fruits at all. Except for this specificity, they cannot be easily depicted by any other socio-demographic or attitudinal attributes.
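As an illustration of what relating a partition to a consumer attribute involves, the following minimal sketch computes a chi-square statistic and Cramér's V between a crisp segmentation and a categorical attribute. It is a generic association measure chosen for illustration and is not the q-value reported in Table 3; the function name `association`, the segment labels and the toy panel of eight consumers are all hypothetical and do not come from the apple study.

```python
from collections import Counter
from math import sqrt


def association(segments, attribute):
    """Chi-square statistic and Cramér's V between two categorical labelings.

    Generic illustration only: this is NOT the q-value used in the paper.
    """
    if len(segments) != len(attribute):
        raise ValueError("both labelings must cover the same consumers")
    n = len(segments)
    seg_counts = Counter(segments)                   # segment sizes (row margins)
    att_counts = Counter(attribute)                  # attribute level counts (column margins)
    cell_counts = Counter(zip(segments, attribute))  # observed cell counts

    chi2 = 0.0
    for seg, n_seg in seg_counts.items():
        for level, n_level in att_counts.items():
            expected = n_seg * n_level / n           # count expected under independence
            observed = cell_counts[(seg, level)]     # Counter returns 0 for empty cells
            chi2 += (observed - expected) ** 2 / expected

    min_dim = min(len(seg_counts), len(att_counts)) - 1
    cramers_v = sqrt(chi2 / (n * min_dim)) if min_dim > 0 else 0.0
    return chi2, cramers_v


# Hypothetical toy panel of 8 consumers (not data from the study).
segments = ["S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2"]
age_related = ["young", "young", "young", "young",
               "older", "older", "older", "older"]
age_unrelated = ["young", "young", "older", "older",
                 "young", "young", "older", "older"]

chi2_rel, v_rel = association(segments, age_related)
chi2_unrel, v_unrel = association(segments, age_unrelated)
print(f"related attribute:   chi2={chi2_rel:.2f}, V={v_rel:.2f}")
print(f"unrelated attribute: chi2={chi2_unrel:.2f}, V={v_unrel:.2f}")
```

In this toy case the first attribute is perfectly aligned with the segments (V = 1), whereas the second is evenly spread across them (V = 0). The second situation resembles what is observed for most consumer attributes with the CLVY/X partition, where the discriminant levels are close to zero.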
5. Conclusion
Statistical clustering approaches are useful tools for identifying segmentation among a panel of consumers according to their liking scores. The objective is often to determine not only the groups of consumers who have the same acceptance patterns but who also have similar sensory drivers of preference. The additional goal of relating the segmentation to consumer external information, collected through a questionnaire, is frequently undertaken. For this purpose, the Clustering of variables around Latent Variables (CLV) methodology can be applied. Its rationale is to determine the central tendencies in the explored space. Moreover, in order to take account of external information either on the products, the consumers or both, constraints can be imposed on the latent variables. This rationale turns out to be valuable in the context of hedonic studies as it is often shown that the clusters, obtained by means of any “statistical” crisp clustering algorithm, have rather poor criteria of separability and consistency. In fact, each consumer belongs to one, and only one, group but not all consumers are well represented by their group mean, or some consumers are almost between two groups. Consequently, this can lead to difficulties in the interpretation of the solution, especially when we attempt to add external information about the consumers. A possible relevant approach is to work at the segment level by means of group latent variables. On the basis of the apple case study presented here, the loadings associated with these latent variables
make it possible to emphasize the main characteristics in each segment.
The usual way to proceed when three types of information have been collected (i.e. the liking scores, additional information on the consumers and the product attributes) is to work in two steps: the first one is an external preference segmentation involving only the external data related to the products (CLVY/X), the last step being to describe the partition obtained by means of the consumer attributes. The expectation is that the additional information on the consumers will add new, constructive knowledge about the segmentation of the panel. It was shown in our case study that this was not especially the case: no relevant information was gained with the subsequent use of the consumer attributes. In contrast, taking into account external information on the product attributes and consumer attributes simultaneously made it possible to reveal a segmentation of consumers that could be understood in terms of sociological and behavioral parameters in relation to the key sensory drivers. In the apple study, the extension of the CLV method in order to manage L-shaped data proved to be suitable for this purpose. Even though the segmentation of the consumers was less detailed according to the sensory drivers with the L-CLV than with the CLVY/X (the first three groups with CLVY/X were aggregated into two groups with L-CLV), the consumer characteristics enabled the solution to be turned towards relevant socio-demographic, usage and attitude parameters.
Nevertheless, although the value of a simultaneous approach using the L-CLV, compared to a two-step procedure, has been demonstrated in this case study on apples, it needs to be further investigated in other studies. Another aspect to bear in mind in the choice of an analytical strategy is the real objective of the end-user, as reported elsewhere, in particular by Van Kleef et al. (2006).
If the study is mainly aimed at a food technological task, external preference mapping and/or external preference segmentation are probably more actionable. The L-CLV approach (like the L-PLS method) appears to have a clear advantage for marketing actionability.
Acknowledgments
This study was carried out under the COSIVEG program with financial support from the Pays de la Loire area in France. Sensory panellists and consumers are sincerely thanked for their involvement and motivation during the sessions.
References
Berget, I., Mevik, B. H., & Naes, T. (2008). New modifications and applications of the fuzzy C-means methodology. Computational Statistics & Data Analysis, 52, 2403–2418.
Bolling Johansen, S., Hersleth, M., & Naes, T. (2010). A new approach to product set selection and segmentation in preference mapping. Food Quality and Preference, 21, 188–196.
Courcoux, P., & Chavanne, C. (2001). Preference mapping using latent class vector model. Food Quality and Preference, 12, 369–372.
Danzart, M., Sieffermann, J.-M., & Delarue, J. (2004). New developments in preference mapping techniques: Finding out a consumer optimal product, its sensory profile and the key sensory attributes. In: 6th Sensometric meeting, Davis, CA.
De Soete, G., & Winsberg, S. (1993). A latent class vector model for preference ratings. Journal of Classification, 10, 195–218.
Delgado, C., & Guinard, J.-X. (2012). How do consumer hedonic ratings for extra virgin olive oil relate to quality ratings by experts and descriptive analysis ratings? Food Quality and Preference, 22, 213–225.
Faber, N. M., Mojet, J., & Poelman, A. A. M. (2002). Simple improvement of consumer fit in external preference mapping. Food Quality and Preference, 14, 455–461.
Greenhoff, K., & MacFie, H. J. H. (1994). Preference mapping in practice. In H. J. H. MacFie & D. M. H. Thomson (Eds.), Measurement of food preferences (pp. 137–166). London: Blackie Academic & Professional.
Helgesen, H., Solheim, R., & Naes, T. (1997). Consumer preference mapping of dry fermented lamb sausages. Food Quality and Preference, 8, 97–109.
ISO (1993). Sensory analysis – General guidance for the selection, training and monitoring of assessors – Part 1: Selected assessors, 8586-1. Geneva, Switzerland: International Organization for Standardization.
ISO (1995). Sensory analysis – Identification and selection of descriptors for establishing a sensory profile by a multidimensional approach, 11035. Geneva, Switzerland: International Organization for Standardization.
Lebart, L., Morineau, A., & Piron, M. (2006). Statistique exploratoire multidimensionnelle (4th ed.). Paris: Dunod.
Martens, H., Anderssen, E., Flatberg, A., Gidskehaug, L. H., Hoy, M., & Westad, F. (2005). Regression of a matrix on descriptors of both its rows and its columns via latent variables: L-PLSR. Computational Statistics and Data Analysis, 48, 103–123.
Menichelli, E., Olsen, N. V., Meyer, C., & Naes, T. (2012). Combining extrinsic and intrinsic information in consumer acceptance studies. Food Quality and Preference, 23, 148–159.
Meullenet, J.-F., Xiong, R., & Findlay, C. J. (2007). Multivariate and probabilistic analyses of sensory science problems. Oxford, UK: IFT Press & Blackwell Publishing.
Næs, T., Brockhoff, P. B., & Tomic, O. (2010). Statistics for sensory and consumer science. Chichester, UK: John Wiley & Sons.
Naes, T., Kubberod, E., & Sivertsen, H. (2001). Identifying and interpreting market segments using conjoint analysis. Food Quality and Preference, 12, 133–143.
Plaehn, D. (2009). A variation on external preference mapping. Food Quality and Preference, 20, 427–439.
Poulsen, C. S., Brockhoff, P. M. B., & Erichsen, L. (1997). Heterogeneity in consumer preference data - A combined approach. Food Quality and Preference, 8, 409–417.
Sveinsdóttir, K., Martinsdóttir, E., Green-Petersen, D., Hyldig, G., Schelvis, R., & Delahunty, C. (2009). Sensory characteristics of different cod products related to consumer preferences and attitudes. Food Quality and Preference, 20, 120–132.
Van Kleef, E., van Trijp, H. C. M., & Luning, P. (2006). Internal versus external preference analysis: An exploratory study on end-user evaluation. Food Quality and Preference, 17, 387–399.
Verdun, S., Cariou, V., & Qannari, E. M. (2012). Quadratic PLS applied to external preference mapping. In: 11th Sensometric meeting, Rennes, France.
Vigneau, E., Endrizzi, I., & Qannari, E. M. (2011). Finding and explaining clusters of consumers using CLV approach. Food Quality and Preference, 22, 705–713.
Vigneau, E., & Qannari, E. M. (2002). Segmentation of consumers taking account of external data: A clustering of variables approach. Food Quality and Preference, 13, 515–521.
Vigneau, E., & Qannari, E. M. (2003). Clustering of variables around latent component. Communications in Statistics – Simulation and Computation, 32, 1131–1150.
Vigneau, E., Qannari, E. M., Sahmer, K., & Ladiray, D. (2006). Classification de variables autour de composantes latentes. Revue de Statistique Appliquée, LIV, 27–45.
Wakeling, I. N., & MacFie, H. J. H. (1995). Designing consumer trials balanced for first and higher orders of carry-over effect when only a subset of k samples from t may be tested. Food Quality and Preference, 6, 299–308.
Wedel, M., & Steenkamp, J.-B. E. M. (1991). A clusterwise regression method for simultaneous fuzzy market structuring and benefit segmentation. Journal of Marketing Research, 28, 385–396.