Journal Pre-proofs

What does it mean to check-all-that-apply? Four case studies with beverages

Sara R. Jaeger, Michelle K. Beresford, Kim R. Lo, Denise C. Hunter, Sok L. Chheang, Gastón Ares

PII: S0950-3293(19)30381-7
DOI: https://doi.org/10.1016/j.foodqual.2019.103794
Reference: FQAP 103794

To appear in: Food Quality and Preference

Received Date: 14 May 2019
Revised Date: 2 September 2019
Accepted Date: 11 September 2019
Please cite this article as: Jaeger, S.R., Beresford, M.K., Lo, K.R., Hunter, D.C., Chheang, S.L., Ares, G., What does it mean to check-all-that-apply? Four case studies with beverages, Food Quality and Preference (2019), doi: https://doi.org/10.1016/j.foodqual.2019.103794
© 2019 Published by Elsevier Ltd.
Manuscript for submission to: Food Quality and Preference
What does it mean to check-all-that-apply? Four case studies with beverages
Sara R. Jaeger1*, Michelle K. Beresford1, Kim R. Lo1, Denise C. Hunter1, Sok L. Chheang1, Gastón Ares2
1 The New Zealand Institute for Plant & Food Research Limited, Mt Albert Research Centre, Private Bag 92169, Victoria Street West, Auckland 1142, New Zealand.
2 Sensometrics & Consumer Science, Facultad de Química, Universidad de la República. By Pass de Rutas 8 y 101 s/n. CP 91000. Pando, Canelones, Uruguay.

* Corresponding author: [email protected]
Research highlights
CATA questions are popular in product-focused consumer research on foods and beverages
This research investigated CATA questions for sensory product characterisation by consumers
Open-ended questions provided insights into consumers’ decisions to select, or not select, a CATA term
Evidence was obtained of consumers’ ability to accurately perform sensory characterisation tasks
Increases in stimulus intensity were appropriately captured through higher frequency of term use
Abstract

The use of check-all-that-apply (CATA) questions in product-focused consumer research on foods and beverages is now common, and the method is known to provide valid sensory product characterisations. Extensive methodological research has been conducted and has supported uptake, but understanding of how consumers complete CATA questions is incomplete, particularly with regard to their decision to select, or not select, a term to describe the sensory properties of products. The present research addressed this gap: using open-ended questions, participants (n=636) were asked to describe how they perceived a pair of samples with regard to an attribute and to link this to CATA term selection. The results, obtained for taste (‘sweet’ and ‘sour/acidic’) and flavour (‘cinnamon’ and ‘smoky’), confirmed consumers’ ability to accurately perform sensory characterisation tasks. In particular, it was found that: i) the great majority of the consumers accurately used the CATA terms for describing the sensory characteristics they perceived in a sample, ii) when a term was not selected for describing samples, the majority of the consumers indicated that the corresponding sensory attribute was not perceived, and iii) when a term was selected for describing only one of the samples in a pair, consumers reported having perceived a difference in attribute intensity between the samples. Thus, CATA questions remain a desirable option for sensory product characterisation tasks with consumers, but should be selected with thought, as the binary nature of the responses means they may not always achieve the desired sample discrimination.

Keywords: CATA; sensory characterisation; consumer research; research methods
1. Introduction

Over the last decade, one of the central dogmas of sensory evaluation has been successfully challenged, and it is now generally accepted that analytical tests can be conducted with consumers, who have been shown to be capable of providing accurate and reliable information (Ares & Varela, 2017b). The application of analytical tests with consumers requires the use of appropriate methods that take into account the lack of training (Ares & Varela, 2017a). For this purpose, several methodologies have been developed in the last decade, of which check-all-that-apply (CATA) questions are the most popular (Ares & Varela, 2018).

When using CATA questions, participants are presented with a list of terms that can be used to describe a sample, and their task is to select those that apply (Ares & Jaeger, 2015). The question format, which is easy to implement, has been extensively used in marketing research (Rasinski, Mingay, & Bradburn, 1994), and Adams, Williams, Lancaster, and Foley (2007) brought awareness of the potential application of CATA questions for sensory product characterisation with consumers. CATA questions were soon adopted, and uptake is likely to have been further boosted by extensive methodological research to identify pros/cons and develop guidelines for implementation (Ares & Jaeger, 2015; Ares & Varela, 2018).

The simplicity of CATA questions is perhaps their greatest advantage, but the binary response format does not allow for a direct measurement of the intensity of the evaluated sensory attributes. This could hinder detailed sample descriptions and discrimination (Ares, Bruzzone, et al., 2014), and, indeed, if participants were to select all the attributes they perceive in a sample, regardless of their intensity, discrimination between samples would not occur.
However, there appears to be a positive linear relationship between attribute intensity as rated by trained assessors and the percentage of consumers who select the corresponding CATA term (Ares, Antúnez, et al., 2015), and this underpinned a suggestion by Vidal, Ares, Hedderley, Meyners, and Jaeger (2018) that participants only select a term as applicable if the intensity of the attribute exceeds some kind of person-specific threshold. This notion has, however, not been empirically investigated, and a clear understanding is lacking of how participants decide to select or not select a term when using a CATA question for sensory product characterisation. Such knowledge regarding the decisions that underlie CATA term selection could enhance interpretation of the results obtained with CATA questions, and further inform suitability for specific applications. It was against this background that the primary aim of the present work was to obtain qualitative insights on how consumers use CATA questions in a sensory characterisation task, with a particular focus on decisions to select or not select a term relating to their perception of the sensory attribute in question.
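The threshold idea attributed above to Vidal et al. (2018) can be made concrete with a small sketch. It is an illustration only, not a model proposed or tested in this work: the thresholds, the 0-10 intensity scale, and the sample names are all hypothetical. It shows why person-specific thresholds would make the selection frequency of a term rise with attribute intensity, whereas ticking a term whenever the attribute is perceptible at all would give no discrimination between samples.

```python
def selection_rate(intensity, thresholds):
    """Fraction of participants whose personal threshold is met or exceeded."""
    selected = [intensity >= t for t in thresholds]
    return sum(selected) / len(selected)


# Hypothetical person-specific thresholds on an arbitrary 0-10 intensity scale.
thresholds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Hypothetical perceived intensities of one attribute in three samples.
samples = {"control": 3.0, "plus": 5.0, "plus_plus": 7.0}

# With person-specific thresholds, the selection frequency tracks intensity.
rates = {name: selection_rate(x, thresholds) for name, x in samples.items()}

# If everyone ticked the term whenever the attribute was perceptible
# (threshold 0 for all), every sample would receive the same frequency.
no_threshold = {
    name: selection_rate(x, [0.0] * len(thresholds)) for name, x in samples.items()
}

print(rates)
print(no_threshold)
```

Under these assumed values the selection frequency increases from the control to the most intense sample, while the no-threshold case is flat at 100% for all three samples.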
2. Materials and Methods

Four studies were conducted involving beverage samples systematically manipulated in specific sensory attributes (Table 1). Consumers were asked to taste the samples in sequential monadic order and to complete a regular CATA task. Afterwards, they were presented with specific pairs of samples and were asked to describe the differences in terms of a specific attribute using an open-ended question.

Take in Table 1.
2.1. Participants

Four consumer studies were conducted with 139 to 209 consumers per study (33-54% female, 19-68 years old; see Appendix 1 for full details). Consumers from the Auckland area (New Zealand) who were registered on a database maintained by a professional recruitment agency were invited to take part. At the time of recruitment, participants confirmed willingness to consume products in the target categories (apple juice and iced tea). Participants attended research sessions at the Plant & Food Research (PFR) Product Insights Facility in Auckland. All studies were covered by a general approval for sensory and consumer research obtained from the PFR Human Research Ethics Committee. Written consent was obtained and participants received compensation in cash.

2.2. Samples

The sample sets (beverages) were similarly constructed in each study: one control (C), and four systematically manipulated stimuli. In Studies 1 to 3, apple juice (AJ) served as the product category, with the samples used in Studies 1 and 2 being identical. In two samples, sweetness was manipulated by addition to the control sample (C-AJ) of sucrose (20 g/L and 40 g/L) to create the S+ and S++ stimuli, respectively. The A+ sample was made by adding citric acid (CAS 5949-29-1) to the control sample (3 g/L), while Sample D was made by diluting the control sample with water (2:1 dilution). Dilution served to lessen the intensity of most sensory attributes, including ‘sour/acidic’, relative to Sample C-AJ.

Study 3 differed by using samples where the differences between the control and the manipulated samples were larger. Thus, samples SS+ and SS++ were created by addition to the control sample (C-AJ) of sucrose (30 g/L and 60 g/L, respectively). Sample AA+ was made by adding citric acid (CAS 5949-29-1) to the control sample (5 g/L), while Sample D was identical to the one used in Study 2 (2:1 dilution of the control sample).
In Study 4, a different beverage, iced tea (IT), was used, and relative to the control sample (C-IT), which was made from base tea and water in a 1:1 ratio, cinnamon and smoky flavours were systematically manipulated. In two samples, a cinnamon flavour was introduced by addition of a cinnamon stock solution (22 g/L and 490 g/L) and water (478 g/L and 10 g/L) to the base tea to create the CN+ and CN++ stimuli, respectively. Similarly, the SM+ and SM++ samples were created by adding a smoky stock solution (76 g/L and 490 g/L, respectively) and water (424 g/L and 10 g/L, respectively) to the base tea. Sample tastings by sensory and consumer researchers working at PFR (n=5) supported the samples being different in intensity of cinnamon and smoky flavour. Samples were prepared on the day of testing, with the exception of the cinnamon stock solution, which was made in advance and kept under refrigeration for the duration of the study. Full details of samples and preparation are given in Appendix 2. The apple juice and iced tea used as base stimuli were commercially available and purchased in Auckland (NZ) supermarkets. Sample presentation was in 60 mL transparent and odour-free plastic cups containing 20 mL aliquots. Random 3-digit codes were used to identify samples, which were served at ~15°C.

2.3. Empirical procedures

Data collection proceeded in two steps in all studies. Briefly explained, all participants within a study first completed a sensory characterisation task for each sample using CATA questions (Part 1). In the second step, pairs of samples were compared, and an open-ended question was used to explore the ways that consumers describe the perceived similarities or differences with regard to a specific sensory term (Part 2). Full details pertaining to Part 1 and Part 2 of data collection are given below. Consumers were seated in standard sensory testing booths during data collection (white lighting, positive air flow, 20-22 °C). Water and plain crackers were freely available for palate cleansing (encouraged, but not enforced). Full verbal instructions with supporting visuals of ballots were given in an adjoining briefing room immediately prior to study commencement.

2.3.1. Sensory product characterisation by CATA questions

The five samples within a study were presented in sequential monadic order according to a design based on Williams Latin Squares. To complete the sensory product characterisation task (Part 1), consumers were asked to taste the sample and select all of the terms that applied. In Studies 1–3 (apple juice) the following 10 sensory terms were used: ‘bland’, ‘cooked apple’, ‘flavoursome’, ‘fresh apple’, ‘fruity’, ‘green apple’, ‘red apple’, ‘sour/acidic’, ‘strong’, and ‘sweet’. For Study 4 (iced tea), the sensory terms were: ‘chai/spices’, ‘cinnamon’, ‘fruity’, ‘mouth drying’, ‘smoky’, ‘strong’, ‘sweet’, ‘tea flavour’, ‘weak’, and ‘woody’.
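For readers unfamiliar with Williams Latin Squares, the sketch below generates one such design for five samples. It is a minimal illustration using the standard cyclic construction and is not taken from the study materials: the function names are hypothetical, and the text does not state how the presentation orders were actually generated or allocated to participants. For an odd number of samples the construction needs two squares (the cyclic square plus its mirror image), giving 10 serving orders in which every sample appears equally often in each position and every sample follows every other sample equally often.

```python
from collections import Counter


def williams_design(n):
    """Return serving orders (lists of sample indices) for n samples."""
    # First row interleaves low and high indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    # Remaining rows are cyclic shifts of the first row.
    design = [[(x + shift) % n for x in first] for shift in range(n)]
    # For odd n, the mirrored square is added to balance first-order carry-over.
    if n % 2 == 1:
        design += [list(reversed(row)) for row in design]
    return design


def adjacent_pair_counts(design):
    """Count how often each ordered pair (a served immediately before b) occurs."""
    counts = Counter()
    for row in design:
        for a, b in zip(row, row[1:]):
            counts[(a, b)] += 1
    return counts


# Sample labels from Studies 1 and 2; the mapping of labels to indices is arbitrary.
samples = ["C-AJ", "S+", "S++", "A+", "D"]
design = williams_design(len(samples))
orders = [[samples[i] for i in row] for row in design]

for order in orders:
    print(" -> ".join(order))
```

In practice the serving orders would be allocated to participants in rotation, so that each block of 10 participants completes one full replicate of the design.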
In all studies, sensory and consumer science researchers working at PFR (n=6) took part in pilot work to develop the terms (sample tastings and group discussion), and the use of composite terms (‘sour/acidic’) was based on Jaeger et al. (2019). The order of CATA terms varied across participants in accordance with Ares, Reis, et al. (2015).

2.3.2. Pairwise sample comparisons for selected sensory terms

For Part 2 of data collection, participants were randomly assigned to one of two groups, each of which compared pairs of samples on different target sensory terms (Table 1). The task was similar in all studies, and for each pair of samples involved: (2a) a comparison of CATA responses for a specific sensory term, and (2b) an open-ended question to describe the ways in which they perceived the two samples as similar or different with regard to the sensory term. Using one of the pairs of samples from Study 1 as an example, the completed CATA ballots for samples C-AJ and S+ were presented again, and in (2a) consumers indicated whether they had selected the target sensory term (i.e., ‘sweet’) for one sample (C-AJ or S+), both of the samples (C-AJ and S+) or neither of the two samples (neither C-AJ nor S+). Next, in (2b), participants were asked to describe, in their own words, the CATA responses for the two samples (C-AJ and S+) with regard to the target sensory term (i.e., ‘sweet’). If the same CATA response was given, a prompt was given to explain why they were evaluated in the same way. Conversely, if the two samples were perceived as different with regard to ‘sweet’, consumers were prompted to explain why. New samples were available to consumers on request when answering (2b), and were presented at the same time. Appendix 3 shows the ballot for (2a) and (2b) corresponding to the above example. Both parts, (2a) and (2b), were completed for one pair of samples before the second pair of samples was presented. In the above example, the second pair of samples was C-AJ and S++ (Table 1), and across participants, the presentation order of sample pairs was counterbalanced.

2.4. Data analysis

The data from each study were analysed separately, using the same set of procedures. The significance level was 5% and all statistical analyses were performed in the R language (R Core Team, 2017). The sensory product characterisation data obtained by CATA questions (Part 1) were analysed with Cochran’s Q test to determine sample effects for each sensory term, with pairwise comparisons (sign tests) being performed for significant terms. The open-ended responses (Part 2b) were analysed using content analysis (Krippendorff, 2004), following an inductive process where semantic categories were determined by the researchers as they read the raw data (Thomas, 2006). The coding was performed by one of the authors (MKB or KRL) and reviewed by another (GA or SRJ).
For example, the semantic category response ‘cinnamon in manipulated sample only’ (Study 4) was used for: “Distinct cinnamon taste in 905 [CN++]”, “Sample 905 [CN++] was pleasant with a very subtle cinnamon taste, sample 792 [C-IT] did not give me a cinnamon taste at all” and “I felt like I could really taste the cinnamon flavour in 905 [CN++]. if I did not add/tick the cinnamon box for 792 [C-IT], the flavour wasn't strong enough.” Where necessary, multiple semantic categories were used, as illustrated by: “476 [SM++] was smoky and woody. It had a rustic smell and tasted, well, like it had been smoked. 792 [C-IT] wasn't either,” which was coded as ‘smoky in manipulated sample only’ and ‘smoky-like in manipulated sample only.’ Unlike the response options in many closed question formats, the categories were not all mutually exclusive. The percentage of participants who mentioned each category was calculated separately for the different response options of the CATA question: target term selected for neither sample, target term selected for both samples, target term selected for the control sample only, and target term selected for the manipulated sample only.
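The Part 1 analysis described in Section 2.4 (Cochran's Q test per sensory term, followed by pairwise sign tests) can be sketched as follows. The analyses reported here were run in R; this standard-library Python version is offered only as an illustration, with hypothetical function names and made-up data. The chi-square tail probability is implemented in closed form for even degrees of freedom only, which covers five samples (4 degrees of freedom) and the three-sample toy example below.

```python
from math import comb, exp, factorial


def cochran_q(rows):
    """Cochran's Q for one CATA term.

    rows: one list per participant with a 0/1 entry per sample
    (1 = term ticked). Returns (Q, degrees of freedom).
    """
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    denom = k * n - sum(t * t for t in row_totals)
    if denom == 0:
        raise ValueError("Q is undefined: every participant ticked all or no samples")
    q = (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
    return q, k - 1


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable; even df only."""
    if df % 2 != 0:
        raise ValueError("closed form implemented for even df only")
    half = x / 2
    return exp(-half) * sum(half**i / factorial(i) for i in range(df // 2))


def sign_test(a, b):
    """Exact two-sided sign test on paired 0/1 responses for two samples."""
    only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n = only_a + only_b  # concordant pairs carry no information
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(only_a, only_b) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Made-up responses of six participants to one term across three samples.
data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 0], [0, 0, 0], [1, 1, 0]]

q, df = cochran_q(data)
p_q = chi2_sf_even_df(q, df)
first = [r[0] for r in data]
third = [r[2] for r in data]
p_pair = sign_test(first, third)
print(q, df, p_q, p_pair)
```

Note that the sign test discards participants who responded identically to both samples, so with CATA data its power depends on the number of participants who ticked the term for exactly one sample of the pair.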
3. Results

3.1. Studies 1 and 2

Studies 1 and 2 used the same set of apple juice samples manipulated in sweetness and sourness, and the results are presented together for the sake of parsimony. The difference between the two studies was in Part 2 where the combination of target sensory terms and sample pairs changed (Table 1).

3.1.1. CATA term use in Studies 1–2

In Part 1 of data collection, sensory product characterisations were obtained using CATA questions. In both studies, significant differences among samples were found in the frequency of use of all 10 sensory terms (full details can be found in Appendices 4 and 5). Specifically, for the target sensory terms ‘sweet’ and ‘sour/acidic’ the results were as follows. In both studies there was a significantly higher frequency of use of the term ‘sweet’ for the samples manipulated in sweetness (S+ and S++) relative to the control sample (C-AJ), and the frequency of use for ‘sweet’ was also significantly higher for S++ than S+ in both studies. Differences in the frequency of use of the term ‘sour/acidic’ were found between S++ and C-AJ in both studies, whereas S+ only significantly differed from C-AJ in Study 1. For Samples A+ and D, differences in the frequency of use of the term ‘sour/acidic’ were found between C-AJ and A+ in both studies, whereas D only significantly differed from C-AJ in Study 1. For the term ‘sweet’, significant differences in the frequency of use were found between C-AJ and both D and A+ in both studies.

Table 2 provides further information about CATA term use for the target sensory terms, with a focus on the pairwise sample comparison in Part 2 of data collection. Specifically, for the samples manipulated by the addition of sucrose (S+ and S++), about half of the participants in Study 1 selected the term ‘sweet’ for both the manipulated samples and the control sample, whereas approximately one third selected ‘sweet’ for the manipulated sample only (Table 2). In Study 2, most of the participants (>80%) did not select the term ‘sour/acidic’ for describing these three samples.
For the comparison of the control sample and the sample manipulated by dilution (D), most of the participants in Study 1 (72%) did not select the term ‘sour/acidic’ for describing C-AJ and D, whereas 44% selected the term ‘sweet’ for describing the control sample only in Study 2. In the case of the comparison of C-AJ and A+, the term ‘sour/acidic’ was mainly selected in Study 1 for describing the latter sample only (48%).

Take in Table 2.

3.1.2. Open-ended descriptions of sample pairs C-AJ vs. S+ and C-AJ vs. S++ in Studies 1–2

In Part 2 of data collection, open-ended descriptions of the sample pairs were obtained and coded to create semantic categories (SC) (see Appendix for full details). Retaining the focus on the target sensory terms and pairwise sample comparisons, Figures 1 and 2 show the frequency of SC use according to how participants used the target sensory term when completing the CATA question in Part 1 of sample evaluation.

Figure 1 shows the results for two sample pairs: C-AJ vs. S+ and C-AJ vs. S++. When these were compared in Study 1 with respect to the sensory term ‘sweet’, similar results were obtained. It can be seen in Figure 1a and 1b that the majority of the participants who selected the term ‘sweet’ for describing the manipulated sample only (S+ or S++) indicated that they perceived a difference in sweetness between the samples (SC2, 68% and 73%, respectively). For both sample pairs it was also seen that approximately 30% of the participants characterised both samples as ‘sweet’ (SC1). When the responses of participants who selected the term ‘sweet’ for describing both samples are taken into account, the frequency of mention of the category ‘both sweet’ reaches approximately 80% (SC1). However, it is worth noting that 49-56% of the participants who used the term ‘sweet’ for describing both samples stated that they perceived differences in sweetness (SC2), whereas 32-45% stated that they did not perceive differences in sweetness (SC5). When these samples were compared in sourness in Study 2, most of the participants did not select the term ‘sour/acidic’ to describe the samples and 55–58% stated that none of the samples were sour (Figure 1c, SC1). Instead, they referred to other differences (SC2), including sweetness (SC3) and intensity (SC4).

Take in Figure 1.

3.1.3. Open-ended descriptions of sample pairs C-AJ vs. D and C-AJ vs. A+ in Studies 1–2

For the sample pair C-AJ vs. D in Study 1, the majority of the participants did not select the term ‘sour/acidic’ to describe these samples. According to Figure 2a, 59% of participants mentioned that none of the samples were sour (SC2), and 20% said that neither sample was intense (SC6). Similarly, when the term ‘sweet’ was not selected to describe these two samples in Study 2, participants indicated that they were not sweet (36%, SC3) or that neither of them was intense (36%, SC2) (Figure 2b). On the contrary, those participants who selected the term ‘sweet’ only for describing C-AJ indicated that they perceived a difference in sweetness (80%, SC4).

With regard to the sample pair C-AJ vs. A+, Figure 2c shows the frequency of use of five semantic categories according to how participants in Study 1 used the term ‘sour/acidic’ to describe the two samples.
The majority of participants who did not select ‘sour/acidic’ to describe the samples stated that none of them was sour (50%, SC2) or that they perceived differences that were not specific to sourness (67%, SC1). On the contrary, participants who selected the term ‘sour/acidic’ for describing both samples stated that they were both sour (47%, SC5). In this case, 80% of the participants stated that they perceived differences in sourness (SC3), even though they selected the term for describing both samples. Finally, the great majority of participants who selected the term ‘sour/acidic’ for describing the manipulated sample only (A+) stated that they perceived differences in sourness (81%, SC3).

Take in Figure 2.

3.2. Study 3

3.2.1. CATA term use in Study 3

Samples of apple juice were also used in Study 3, but relative to Studies 1–2, the differences between the control and manipulated samples were larger (Table 1). The five samples were C-AJ, SS+, SS++, D, AA+, and when product characterisations were obtained using CATA questions, significant differences were established for every term (p<0.001).
The full details are given in the Appendix, with focus below directed to the two target sensory terms: ‘sweet’ and ‘sour/acidic.’ For the former, the samples with added sucrose were more frequently characterised as ‘sweet’ than the control sample, and both were associated with a similarly high frequency of use of ‘sweet’ (87% and 93% for SS+ and SS++, respectively). Sweetness (or lack thereof) was similarly perceived in D and AA+ (12% and 14%, respectively). Fitting expectations, the sample with added citric acid was frequently associated with ‘sour/acidic’ (88%), and comparatively all other samples were not very ‘sour/acidic’ (≤15%), although SS+ and SS++ were least ‘sour/acidic’ (1-2%).

Table 2 provides summary information about the frequency of use of the target sensory terms for the selected comparisons between pairs of samples. For the pairs C-AJ vs. SS+ and C-AJ vs. SS++, while 40-42% of the participants selected the term ‘sweet’ for both samples, a slightly larger proportion of participants (44-53%) only selected this term for the manipulated sample. On the contrary, few participants selected ‘sweet’ to describe only the control sample. In terms of ‘sour/acidic’, 79% of participants used ‘sour/acidic’ for neither the control sample nor the diluted sample (D). Regarding the sample manipulated by the addition of citric acid (AA+), 75% of the participants selected ‘sour/acidic’ to describe the manipulated sample only, whereas none of the participants used this term to only describe the control sample.

3.2.2. Open-ended descriptions of sample pairs in Study 3

The full frequency table for semantic category use (Part 2 of data collection) is shown in Appendix 9, while below the results with most relevance for understanding the pairwise sample comparisons are presented. There was considerable similarity in results for pairs C-AJ vs. SS+ (Figure 3a) and C-AJ vs. SS++ (Figure 3b), which showed that the majority of participants who selected ‘sweet’ for describing the manipulated sample only indicated that they perceived a difference in sweetness (73-89%, SC2) or sweetness intensity (76-88%, SC3). Among participants who selected ‘sweet’ for both samples, the majority described these as sweet (66% and 89%, SC1), with 59% stating that differences in sweetness were not perceived (SC4) and 36-37% stating that differences in intensity were not perceived (SC5).

Regarding the comparison of C-AJ vs. AA+, most participants only used the term ‘sour/acidic’ to describe the manipulated sample (75%) (Table 2). In the open-ended comments (Figure 3c), these participants stated that they perceived differences in sourness (SC8) or differences in sourness intensity (SC1), in addition to differences in other sensory characteristics (SC3). On the contrary, for the comparison of C-AJ vs. D, 79% of the participants did not select ‘sour/acidic’ to describe either sample (Table 2). In this case, the great majority of those participants stated that they did not perceive differences in sourness (SC9), and stated that neither sample was sour (SC5). However, they described differences between the samples in other sensory characteristics not related to sourness (SC3, SC4), in agreement with results from the CATA questions completed in Part 1 of sample evaluation (see Appendix).

Take in Figure 3.
3.3. Study 4

3.3.1. CATA term use in Study 4

Samples of iced tea were used in Study 4 (Table 1), which, relative to the control sample C-IT, were manipulated in two sensory characteristics: cinnamon flavour (CN+ and CN++) and smoky flavour (SM+ and SM++). Significant differences among the five samples were found for 8 of the 10 sensory CATA terms in Part 1 of data collection (see Appendix for full details including sign test results). Of specific relevance to the two target sensory terms, the frequency of use of the term ‘cinnamon’ was significantly different between the control sample and samples with added cinnamon flavour, and between the two cinnamon samples. A similar pattern of results was found for the samples manipulated in smokiness. That is, the manipulated samples significantly differed from the control sample, and from each other.

Table 2 shows that the use of the target sensory terms (‘cinnamon’ and ‘smoky’) fitted expectations, such that for the pairwise comparisons between the control sample and the samples with the most flavour addition (CN++ and SM++), the majority of participants selected the target sensory terms for describing the manipulated sample only (76% and 66%, respectively), and only a very small minority (2%) selected the target terms for describing only the control sample. For the comparisons between the control sample and the samples with the least flavour addition (CN+ and SM+), the majority of participants (63% and 56%, respectively) did not select the target sensory terms for describing any of the samples, whereas approximately a quarter of the participants selected the target terms only for describing the manipulated sample (28% and 26%, respectively).

3.3.2. Open-ended descriptions of sample pairs in Study 4

The frequency of use of all the semantic categories identified in the open-ended question (Part 2 of data collection) is shown in Appendix 11. Results with the most relevance for the aim of the study are presented below.
For the sample pair C-IT vs. CN+, 45% of the participants who did not select the term ‘cinnamon’ for describing the samples stated that none of the samples had cinnamon flavour (SC5) (Figure 4a). However, 45% stated that only the manipulated sample had cinnamon flavour (SC1). The responses of participants who selected the term ‘cinnamon’ for describing the manipulated sample only for the pair C-IT vs. CN++ provided the same insights (Figure 4b). In the case of the pair C-IT vs. SM+, 30% of the participants who did not select the term ‘smoky’ for describing the samples indicated that they did not perceive this flavour (SC2), whereas 41% indicated that they perceived smoky flavour in the manipulated sample only (Figure 4c). This last percentage rose to 80% for participants who selected the term ‘smoky’ for describing the manipulated sample only. Similarly, when the pair C-IT vs. SM++ was considered, the majority of the participants indicated that they perceived differences between the two samples in smoky flavour or in smoky-like flavour (Figure 4d: SC1, SC2, SC4).

Take in Figure 4.
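The percentages reported in Section 3 are conditional on how a participant used the target term for the pair of samples (neither, both, control only, manipulated only), and a participant's comment may be coded into more than one semantic category. A minimal sketch of that tabulation is given below; the function names, record layout, and coded responses are hypothetical and are not the study data.

```python
from collections import Counter, defaultdict


def response_pattern(control_ticked, manipulated_ticked):
    """Classify CATA term use for a control/manipulated sample pair."""
    if control_ticked and manipulated_ticked:
        return "both"
    if control_ticked:
        return "control only"
    if manipulated_ticked:
        return "manipulated only"
    return "neither"


def category_percentages(records):
    """Percentage of participants mentioning each category, per response pattern.

    records: iterable of (control_ticked, manipulated_ticked, categories),
    where categories is the set of semantic categories coded for that
    participant's open-ended comment.
    """
    group_sizes = Counter()
    mentions = defaultdict(Counter)
    for control, manipulated, categories in records:
        pattern = response_pattern(control, manipulated)
        group_sizes[pattern] += 1
        for category in set(categories):  # count each participant once per category
            mentions[pattern][category] += 1
    return {
        pattern: {
            category: 100.0 * count / group_sizes[pattern]
            for category, count in mentions[pattern].items()
        }
        for pattern in group_sizes
    }


# Hypothetical coded responses for one sample pair and the term 'sweet'.
records = [
    (False, True, {"difference in sweetness"}),
    (False, True, {"difference in sweetness", "both sweet"}),
    (False, True, {"both sweet"}),
    (False, True, {"difference in sweetness"}),
    (True, True, {"both sweet", "difference in sweetness"}),
    (True, True, {"both sweet"}),
    (False, False, {"neither sweet"}),
]

table = category_percentages(records)
for pattern, row in table.items():
    print(pattern, row)
```

Because the categories are not mutually exclusive, the percentages within a response pattern can sum to more than 100%, as they do in the figures.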
4. Discussion

The present research primarily sought to better understand how consumers use CATA questions in a sensory characterisation task and, in particular, to qualitatively uncover when and why terms relating to perception of a specific sensory attribute are or are not selected. Secondly, it provided further evidence of consumers’ ability to accurately perform sensory product characterisation tasks using CATA questions. In Studies 1–4, the identified differences among samples were in agreement with the differences in the sensory characteristics introduced by manipulation (Table 2). These results are in agreement with past conclusions regarding the validity of CATA questions based on the comparison with results from descriptive analysis with trained assessors (Ares, Antúnez, et al., 2015; Ares, Barreiro, Deliza, Giménez, & Gámbaro, 2010; Bruzzone, Ares, & Giménez, 2012; Dooley, Lee, & Meullenet, 2010).
The responses to the open-ended question (Part 2b of data collection in Studies 1–4) supported this conclusion by showing (Figures 1–4) that: i) the great majority of participants accurately used the CATA terms for describing the sensory characteristics they perceived in a sample, ii) when a term was not selected for describing a pair of samples, the majority of participants indicated that the corresponding sensory attribute was not noticeable, and iii) when a term was selected for describing only one of the samples, participants reported having perceived a difference in attribute intensity. Only in very few instances did participants use the CATA question in the opposite direction to what they reported perceiving when comparing two specific samples. Overall, this result is consistent with satisficing response strategies not being prevalent when consumers perform sensory characterisation tasks using CATA questions. In this sense, Ares, Etchemendy, et al. (2014) reported that consumers tend to perform a detailed inspection of a CATA question before checking the terms that apply for describing a sample.

In direct support of Vidal et al. (2018), who suggested that a subject-specific threshold exists for selecting a CATA term to describe a sample, participants sometimes did not select a CATA term even though it was clear from their open-ended descriptions that they perceived the sensory attribute. Using ‘sour/acidic’ as an exemplar, a respondent in Study 1 who had not selected ‘sour/acidic’ for the C-AJ and A+ samples when completing the CATA task provided the following response for the C-AJ vs. A+ pair of samples: “When comparing 476 [C-AJ] and 259 [A+], 259 [A+] is the strongest with having a sour element. I can't taste any sour in 476 [C-AJ]. 259 [A+] tastes like a sweet fresh apple. 
476 [C-AJ] - not a fan, not sour.” Also in Study 1, for the same pairwise comparison, another respondent who did not use ‘sour/acidic’ to describe these samples wrote: “I found none of the samples overly sour or acidic. 259 [A+] I marked as 'green apple' as it was slightly more acidic. 476 [C-AJ] I almost marked as green but not so acidic, neither were strongly acidic.” In Study 3, a respondent who did not select ‘sour/acidic’ for the C-AJ and D samples in the CATA task later wrote: “Neither tasted sour/acidic - both fairly bland in taste and aftertaste. 476 [C-AJ] reminded me more of green apples, was slightly more sour, however I wouldn't describe it as 'sour'. 924 [D] tasted a bit like a watered down 476 [C-AJ], it was not sour at all, reminding me of red apples.” The fact that consumers do not select all
the terms they perceive, but only those that are more intense than an internal threshold is why CATA questions are able to discriminate among samples that differ in attribute intensity. However, the existence of such an internal threshold also indicates that when attribute intensities are low, intensity differences between samples are unlikely to be uncovered with CATA questions even if perceived by consumers. In such situations, methodologies that allow for intensity ratings are needed, which with consumers could include intensity scales or rate-all-that-apply (RATA) questions (Ares, Bruzzone, et al., 2014). Further research to understand these internal thresholds and how they vary across people, sensory attributes and product categories would be of interest. Factors that influence the proposed internal thresholds include ability to detect the sensory attributes, and traditional detection and recognition threshold testing could be performed. Are internal thresholds for selecting a sensory attribute of interest similar to either of these thresholds, or dependent on additional influences? In this sense, future research should look at contextual effects given by the characteristics of the sample set, as well as order effects, in the selection of sensory terms in a CATA question. The comparison of sample pairs also confirmed the limitation of CATA questions with regard to discrimination of samples that differ in the intensity of attributes that are easily perceived. The open-ended responses illustrated how respondents would select the CATA term for describing both samples despite perceiving an intensity difference, for example, two respondents in Study 1 who selected ‘sweet’ for both C-AJ and S+ or S++ in the CATA clearly perceived different levels of sweetness and wrote, respectively: “Both were varying sweetness, 792 [S+] was sweeter” and “175 [S++] was definitely sweeter but sickly in sweetness. 
476 [C-AJ] is sweet but more mild.” Another example from Study 1 was for ‘sour/acidic’ which was selected in the CATA task for both C-AJ and A+ samples, but described by a participant in the open-ended task as different: “259 [A+] was overly sour with no apple flavour. 476 [C-AJ] had right amount of sour but the overall flavour could have been stronger.” This type of responding also hinders the ability of CATA questions to discriminate among samples and fits with Vidal et al. (2018) who reported that RATA tended to be more discriminative than CATA for terms that described sensory attributes which were applicable to most of the samples in a given study. In other words, if samples of juice are all sweet, but differently so, then RATA questions will be more likely to identify these differences than CATA questions. However, if some samples are cinnamon-flavoured and others are smoky, then CATA is expected to discriminate just as well as RATA. To achieve greater discrimination without the use of RATA, traditional discrimination testing could be used, for example, two-sample directed difference tests (2AFC; Lawless & Heymann, 2010), which would require focus on a single attribute; the task for consumers would be to identify the sample within a pair that was, say, sweeter. While this is entirely feasible, and 2AFC tests have previously been used with consumers (e.g., Kremer, Shimojo, Holthuysen, Köster, & Mojet, 2013; Perry, Byrnes, Heymann, & Hayes, 2019), it is cumbersome due to the repeated pairwise tests required to cover studies with more than two samples and many sensory attributes. A further suggestion for future research would be to use the semantic codes developed in this research as the basis for a series of closed questions that respondents answer when performing the pairwise sample comparisons. This would reduce ambiguity and,
perhaps, in conjunction with the threshold testing suggested earlier, lead to more conclusive insights regarding what it means to CATA.
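To make the burden of the 2AFC alternative discussed above concrete, the following minimal Python sketch counts the directed difference tests needed to cover a study and computes the one-sided exact binomial p-value for a single test. The function names, the choice to compare every pair of samples on every attribute, and the example outcome are illustrative assumptions and were not part of the present studies.

```python
from math import comb


def n_2afc_tests(n_samples: int, n_attributes: int) -> int:
    """Number of 2AFC tests if every pair of samples is compared on every attribute."""
    return comb(n_samples, 2) * n_attributes


def two_afc_p_value(n_correct: int, n_trials: int) -> float:
    """One-sided exact binomial p-value against chance performance (p = 0.5)."""
    tail = sum(comb(n_trials, k) for k in range(n_correct, n_trials + 1))
    return tail / 2 ** n_trials


if __name__ == "__main__":
    # Five samples and ten CATA terms, as in each of the four studies.
    print(n_2afc_tests(5, 10))
    # Hypothetical outcome: 45 of 70 consumers pick the manipulated sample.
    print(two_afc_p_value(45, 70))
```

Under these assumptions, a study with five samples and ten terms would call for 100 separate 2AFC tests, which illustrates why the approach becomes cumbersome beyond a small number of samples and attributes.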
5. Conclusions

The present research has contributed toward answering the question “what does it mean to check-all-that-apply?” Using beverages as case studies and working with samples with smaller and larger differences, as well as variations in taste attributes (‘sweet’ and ‘sour/acidic’) and flavour attributes (‘cinnamon’ and ‘smoky’), results from 636 consumers confirmed their ability to appropriately perform sensory characterisation tasks. Through comparisons of pairs of samples that differed in intensity of attributes, the majority of consumers were found to accurately use the CATA terms for describing the sensory characteristics they perceived in a sample. When a term was not selected for describing samples, the majority of the consumers indicated that the corresponding sensory attribute was not perceived, and, finally, when a term was selected for describing one of the samples in a pair only, consumers reported having perceived a difference in attribute intensity between the samples. The notion of a person-specific threshold for selecting, or not, a CATA term was supported, and contributed to understanding why CATA questions are able to discriminate between samples on attributes of interest. However, the binary response format also hinders discrimination between samples when intensities are too low and/or too high. In such instances, RATA questions offer advantages.
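The role of a person-specific threshold can be illustrated with a small deterministic sketch in Python. It assumes, purely for illustration, that thresholds are normally distributed across consumers on an arbitrary 0 to 10 intensity scale (mean 5, standard deviation 1.5) and that a term is checked whenever perceived intensity exceeds the individual threshold. None of these values were estimated in the present research.

```python
from math import erf, sqrt


def p_select(intensity: float, thr_mean: float = 5.0, thr_sd: float = 1.5) -> float:
    """Share of consumers whose personal threshold lies below the perceived intensity.

    Thresholds are assumed to be normally distributed (hypothetical parameters).
    """
    z = (intensity - thr_mean) / thr_sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def cata_gap(low: float, high: float) -> float:
    """Difference in term-selection frequency between two samples."""
    return p_select(high) - p_select(low)


if __name__ == "__main__":
    # The same one-unit intensity difference at three positions on the scale.
    for label, low in [("weak", 0.0), ("moderate", 4.5), ("strong", 9.0)]:
        gap = cata_gap(low, low + 1.0)
        print(f"{label:9s} gap in selection frequency: {gap:.3f}")
```

In this sketch the same intensity difference produces a sizeable gap in frequency of term use only when intensities lie near typical thresholds; for very weak or very strong attributes nearly everyone gives the same binary response to both samples.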
Acknowledgements

Financial support was received from two sources: i) The New Zealand Ministry for Business, Innovation & Employment, and ii) the New Zealand Institute for Plant and Food Research Ltd. Staff at the Sensory & Consumer Science Team at Plant and Food Research are thanked for help in pilot work and data collection.

Author contributions

GA and SRJ planned the research and wrote the paper. Statistical analysis was by GA. All other authors contributed to data collection, and MKB and KL also contributed to semantic coding.

Declaration of conflicts

All authors declare no conflicts of interest.
Appendix (Parts 1 to 11)

1. Summary of participant characteristics in each of Studies 1 to 4.
2. Sample details for Studies 1 to 4.
3. Exemplar ballot used in Part 2 of data collection for pairwise sample comparisons.
4. Frequency table for samples and sensory CATA terms in Part 1 of data collection for Study 1, with sign test results.
5. Frequency table for samples and sensory CATA terms in Part 1 of data collection for Study 2, with sign test results.
6. Frequency of use for semantic categories in Part 2 of data collection for Study 1 pertaining to four pairwise sample comparisons.
7. Frequency of use for semantic categories in Part 2 of data collection for Study 2 pertaining to four pairwise sample comparisons.
8. Frequency table for samples and sensory CATA terms in Part 1 of data collection for Study 3, with sign test results.
9. Frequency of use for semantic categories in Part 2 of data collection for Study 3 pertaining to four pairwise sample comparisons.
10. Frequency table for samples and sensory CATA terms in Part 1 of data collection for Study 4, with sign test results.
11. Frequency of use for semantic categories in Part 2 of data collection for Study 4 pertaining to four pairwise sample comparisons.
References

Adams, J., Williams, A., Lancaster, B., & Foley, M. (2007). Advantages and uses of check-all-that-apply responses compared to traditional scaling of attributes for salty snacks. Paper presented at the 7th Pangborn Sensory Science Symposium, 12-16 August, 2007, Minneapolis, USA.

Ares, G., Antúnez, L., Bruzzone, F., Vidal, L., Giménez, A., Pineau, B., . . . Jaeger, S. R. (2015). Comparison of sensory product profiles generated by trained assessors and consumers using CATA questions: Four case studies with complex and/or similar samples. Food Quality and Preference, 45, 75-86. doi: 10.1016/j.foodqual.2015.05.007

Ares, G., Barreiro, C., Deliza, R., Giménez, A., & Gámbaro, A. (2010). Application of a check-all-that-apply question to the development of chocolate milk desserts. Journal of Sensory Studies, 25, 67-86. doi: 10.1111/j.1745-459X.2010.00290.x

Ares, G., Bruzzone, F., Vidal, L., Cadena, R. S., Gimenez, A., Pineau, B., . . . Jaeger, S. R. (2014). Evaluation of a rating-based variant of check-all-that-apply questions: Rate-all-that-apply (RATA). Food Quality and Preference, 36, 87-95. doi: 10.1016/j.foodqual.2014.03.006

Ares, G., Etchemendy, E., Antunez, L., Vidal, L., Gimenez, A., & Jaeger, S. R. (2014). Visual attention by consumers to check-all-that-apply questions: Insights to support methodological development. Food Quality and Preference, 32, 210-220. doi: 10.1016/j.foodqual.2013.10.006

Ares, G., & Jaeger, S. R. (2015). Check-all-that-apply (CATA) questions with consumers in practice: experimental considerations and impact on outcome. In J. Delarue, J. B. Lawlor & M. Rogeaux (Eds.), Rapid Sensory Profiling Techniques (pp. 227-245). Cambridge, UK: Woodhead Publishing.

Ares, G., Reis, F., Oliveira, D., Antúnez, L., Vidal, L., Giménez, A., . . . Jaeger, S. R. (2015). Recommendations for use of balanced presentation order of terms in CATA questions. Food Quality and Preference, 46, 137-141. doi: 10.1016/j.foodqual.2015.07.012

Ares, G., & Varela, P. (2017a). Authors’ reply to commentaries on Ares and Varela. Food Quality and Preference, 61, 100-102.

Ares, G., & Varela, P. (2017b). Trained vs. consumer panels for analytical testing: Fueling a long lasting debate in the field. Food Quality and Preference, 61, 79-86. doi: 10.1016/j.foodqual.2016.10.006

Ares, G., & Varela, P. (2018). Consumer-Based Methodologies for Sensory Characterization. In G. Ares & P. Varela (Eds.), Methods in Consumer Research. New approaches to classical methods (Vol. 1, pp. 187-209). Cambridge, UK: Woodhead Publishing.

Bruzzone, F., Ares, G., & Giménez, A. N. A. (2012). Consumers' texture perception of milk desserts. II – comparison with trained assessors' data. Journal of Texture Studies, 43(3), 214-226. doi: 10.1111/j.1745-4603.2011.00332.x

Dooley, L., Lee, Y.-s., & Meullenet, J.-F. (2010). The application of check-all-that-apply (CATA) consumer profiling to preference mapping of vanilla ice cream and its comparison to classical external preference mapping. Food Quality and Preference, 21(4), 394-401. doi: 10.1016/j.foodqual.2009.10.002

Jaeger, S. R., Hunter, D. C., Vidal, L., Chheang, S. L., Ares, G., & Harker, F. R. (2019). Sensory product characterization by consumers using check-all-that-apply questions: Investigations linked to term development using kiwifruit as a case study. Journal of Sensory Studies, e12490.

Kremer, S., Shimojo, R., Holthuysen, N., Köster, E., & Mojet, J. (2013). Consumer acceptance of salt-reduced “soy sauce” foods over rapidly repeated exposure. Food Quality and Preference, 27(2), 179-190.

Krippendorff, K. (2004). Content Analysis: An Introduction to its Methodology. Thousand Oaks, CA: Sage Publications.

Lawless, H. T., & Heymann, H. (2010). Sensory Evaluation of Food: Principles and Practices (2nd ed.). New York: Springer.

Perry, D. M., Byrnes, N. K., Heymann, H., & Hayes, J. E. (2019). Rejection of labrusca-type aromas in wine differs by wine expertise and geographic region. Food Quality and Preference, 74, 147-154.

R Core Team. (2017). R: A language and environment for statistical computing (R language version 3.4.3). Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Rasinski, K. A., Mingay, D., & Bradburn, N. M. (1994). Do respondents really ‘Mark All That Apply’ on self-administered questions? Public Opinion Quarterly, 58, 400-408.

Thomas, D. R. (2006). A general inductive approach for analyzing qualitative evaluation data. American Journal of Evaluation, 27(2), 237-246.

Vidal, L., Ares, G., Hedderley, D. I., Meyners, M., & Jaeger, S. R. (2018). Comparison of rate-all-that-apply (RATA) and check-all-that-apply (CATA) questions across seven consumer studies. Food Quality and Preference, 67, 49-58. doi: 10.1016/j.foodqual.2016.12.013
Figure Legends
Figure 1. Results for Part 2 of data collection for Studies 1 and 2 (apple juice with smaller differences) showing semantic category (SC) frequency of use as percentages. (A) Sample pair C-AJ vs. S+ with ‘sweet’ as the target sensory term (Study 1), (B) Sample pair C-AJ vs. S++ with ‘sweet’ as the target sensory term (Study 1), and (C) Sample pairs C-AJ vs. S+ and C-AJ vs. S++ when the target sensory term ‘sour/acidic’ was selected for none of the samples (Study 2).
Figure 2. Results for Part 2 of data collection for Studies 1 and 2 (apple juice with smaller differences) showing semantic category (SC) frequency of use as percentages. (A) Sample pair C-AJ vs. D with target sensory term ‘sour/acidic’ selected for none of the samples (Study 1), (B) Sample pair C-AJ vs. D with ‘sweet’ as the target sensory term (Study 2), (C) Sample pair C-AJ vs. A+ when the target sensory term ‘sour/acidic’ was selected for both samples, the manipulated sample only or none of the samples (Study 1).
Figure 3. Results for Part 2 of data collection for Study 3 (apple juice with larger differences) showing semantic category (SC) frequency of use as percentages. (A) Sample pair C-AJ vs. SS+ with ‘sweet’ as the target sensory term, (B) Sample pair C-AJ vs. SS++ with ‘sweet’ as the target sensory term, and (C) Sample pair C-AJ vs. D when the target sensory term ‘sour/acidic’ was selected for neither sample and sample pair C-AJ vs. AA+ when the target sensory term ‘sour/acidic’ was selected for the manipulated sample only.
Figure 4. Results for Part 2 of data collection for Study 4 (ice tea) showing semantic category (SC) frequency of use as percentages. (A) Sample pair C-IT vs. CN+ with ‘cinnamon’ as the target sensory term, (B) Sample pair C-IT vs. CN++ with ‘cinnamon’ as the target sensory term, (C) Sample pair C-IT vs. SM+ with ‘smoky’ as the target sensory term, and (D) Sample pair C-IT vs. SM++ with ‘smoky’ as the target sensory term.
Table 1. Overview of the four consumer studies (n=636 participants in total) with details about samples and between-subjects conditions linked to Part 2 of data collection#. Refer to table notes and Table 2 for full details of sample manipulations. The number of consumers (n) who took part is indicated between brackets next to the experimental condition in the third column. Each pairwise comparison was focused on a single sensory term (selected among those included in the CATA question in Part 1 of data collection) as indicated in the fourth column.

| Study | Samples* | Between-subjects condition, showing pairwise sample comparisons** | Sensory term for pairwise comparison*** |
|---|---|---|---|
| 1 | Apple juice (smaller differences): C-AJ, S+, S++, A+, D | Sample pairs (n=85): C-AJ vs. S+ and C-AJ vs. S++ | ‘sweet’ |
| 1 | Apple juice (smaller differences): C-AJ, S+, S++, A+, D | Sample pairs (n=64): C-AJ vs. A+ and C-AJ vs. D | ‘sour/acidic’ |
| 2 | Apple juice (smaller differences): C-AJ, S+, S++, A+, D | Sample pairs (n=71): C-AJ vs. S+ and C-AJ vs. S++ | ‘sour/acidic’ |
| 2 | Apple juice (smaller differences): C-AJ, S+, S++, A+, D | Sample pairs (n=68): C-AJ vs. A+ and C-AJ vs. D | ‘sweet’ |
| 3 | Apple juice (larger differences): C-AJ, SS+, SS++, AA+, D | Sample pairs (n=105): C-AJ vs. SS+ and C-AJ vs. SS++ | ‘sweet’ |
| 3 | Apple juice (larger differences): C-AJ, SS+, SS++, AA+, D | Sample pairs (n=104): C-AJ vs. AA+ and C-AJ vs. D | ‘sour/acidic’ |
| 4 | Iced tea: C-IT, CN+, CN++, SM+, SM++ | Sample pairs (n=67): C-IT vs. CN+ and C-IT vs. CN++ | ‘cinnamon’ |
| 4 | Iced tea: C-IT, CN+, CN++, SM+, SM++ | Sample pairs (n=72): C-IT vs. SM+ and C-IT vs. SM++ | ‘smoky’ |

Notes.
#) In all studies, participants completed a sensory characterisation task of each sample using CATA questions as the first step. This was regardless of the experimental manipulation they were assigned to in the second step of data collection.
*) Sample codes as follows: control sample, in apple juice (C-AJ) and iced tea (C-IT). In Studies 1 and 2, samples with manipulated sweetness level (less and more) were S+ and S++, the sample with manipulated acidity level was A+, and the diluted sample was D. In Study 3, the sample manipulations were similar to Studies 1-2, but designed to achieve larger sample differences. This is indicated by the use of double capital letters as opposed to single capital letters (SS vs. S, for example). In Study 4, samples with manipulated cinnamon level (less and more) were CN+ and CN++, while samples with manipulated smokiness (less and more) were SM+ and SM++.
**) Within studies and experimental conditions, the two pairs of samples were presented in counterbalanced order across participants.
***) For each pair of samples there were two parts to the comparison, beginning with a comparison of CATA responses (2a), followed by an open-ended description of perceived differences between the two samples (2b).
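The counterbalanced presentation of the two sample pairs mentioned in the table notes can be sketched as follows in Python. The paper does not state how orders were allocated, so the simple rotation through all possible orders used here, together with the function name, is an assumption made only for illustration.

```python
from collections import Counter
from itertools import cycle, permutations


def counterbalanced_orders(pairs, n_participants):
    """Assign each participant one presentation order, cycling through all orders."""
    orders = cycle(permutations(pairs))
    return [next(orders) for _ in range(n_participants)]


if __name__ == "__main__":
    # Sample pairs from the first condition of Study 1 (n=85).
    pairs = ["C-AJ vs. S+", "C-AJ vs. S++"]
    plan = counterbalanced_orders(pairs, 85)
    # Each of the two orders is used almost equally often.
    print(Counter(plan))
```

With an odd number of participants the two orders cannot be used exactly equally often; the rotation keeps the imbalance to a single participant.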
Table 2. Results for Studies 1–4 showing CATA question term use (Part 1 of data collection), selected for the Part 2 target sensory terms and pairwise sample comparisons. The percentages in the last four columns show frequency of use of target sensory term for none, both or one sample within the sample pairs.

| Study | Target sensory term | Pair of samples | Frequency of use of the target term (%)# | Selected for none of the samples | Selected for both samples | Selected for control sample only | Selected for manipulated sample only |
|---|---|---|---|---|---|---|---|
| 1 | Sweet | C-AJ vs. S+ | 53 vs. 76 * | 12% | 48% | 11% | 29% |
| 1 | Sweet | C-AJ vs. S++ | 53 vs. 89 * | 5% | 55% | 5% | 35% |
| 1 | Sour/acidic | C-AJ vs. D | 23 vs. 15 * | 72% | 8% | 16% | 5% |
| 1 | Sour/acidic | C-AJ vs. A+ | 23 vs. 75 * | 28% | 23% | 0% | 48% |
| 2 | Sour/acidic | C-AJ vs. S+ | 11 vs. 9 ns | 83% | 4% | 9% | 4% |
| 2 | Sour/acidic | C-AJ vs. S++ | 11 vs. 0 * | 91% | 0% | 9% | 0% |
| 2 | Sweet | C-AJ vs. D | 53 vs. 31 * | 41% | 11% | 44% | 4% |
| 2 | Sweet | C-AJ vs. A+ | 53 vs. 12 * | 26% | 21% | 37% | 16% |
| 3 | Sweet | C-AJ vs. SS+ | 46 vs. 87 * | 12% | 40% | 5% | 44% |
| 3 | Sweet | C-AJ vs. SS++ | 46 vs. 93 * | 4% | 42% | 1% | 53% |
| 3 | Sour/acidic | C-AJ vs. D | 15 vs. 11 ns | 79% | 5% | 9% | 8% |
| 3 | Sour/acidic | C-AJ vs. AA+ | 15 vs. 88 * | 14% | 12% | 0% | 75% |
| 4 | Cinnamon | C-IT vs. CN+ | 8 vs. 32 * | 63% | 5% | 4% | 28% |
| 4 | Cinnamon | C-IT vs. CN++ | 8 vs. 81 * | 15% | 7% | 2% | 76% |
| 4 | Smoky | C-IT vs. SM+ | 17 vs. 31 * | 56% | 6% | 12% | 26% |
| 4 | Smoky | C-IT vs. SM++ | 17 vs. 79 * | 16% | 16% | 2% | 66% |

Note. #) * is used to indicate that frequency of term use is significantly different at 5% based on the sign test, while ns indicates a non-significant difference.
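As a rough illustration of the sign test referred to in the note, the Python sketch below computes a two-sided exact p-value from the two discordant counts for one pair of samples (term selected for the control sample only, or for the manipulated sample only). The exact procedure used in the paper is not described in this section, so the implementation and the example counts are assumptions.

```python
from math import comb


def sign_test(control_only: int, manipulated_only: int) -> float:
    """Two-sided exact sign test on discordant pairs.

    Participants who selected the term for both samples, or for neither,
    carry no information about a difference and are ignored.
    """
    n = control_only + manipulated_only
    if n == 0:
        return 1.0
    k = min(control_only, manipulated_only)
    # Probability of a split at least this uneven in one direction under p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the studies.
    print(sign_test(control_only=9, manipulated_only=25))
```

Only the last two columns of the table enter such a test, which is why a pair can differ significantly even when most participants select the term for both samples or for neither.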
APPENDIX

To accompany paper titled: “What does it mean to check-all-that-apply? Four case studies with beverages” by Jaeger et al. published in Food Quality and Preference.
Part 1. Summary of participant characteristics for Studies 1 to 4, with responses shown as percentages. The number of consumers in each study (N) is shown between brackets.

| Characteristic | Study 1 (N=149) | Study 2 (N=139) | Study 3 (N=209) | Study 4 (N=139) |
|---|---|---|---|---|
| Gender: Male | 50 | 67 | 46 | 49 |
| Gender: Female | 50 | 33 | 54 | 51 |
| Age: 19-30 years old | 31 | 26 | 29 | 24 |
| Age: 31-45 years old | 23 | 29 | 27 | 30 |
| Age: 46-60 years old | 33 | 32 | 32 | 35 |
| Age: >60 years old | 13 | 13 | 12 | 11 |
| Ethnicity*,**: New Zealand European | 74 | 70 | 69 | 73 |
| Ethnicity*,**: Maori and Pacific Islander | 14 | 13 | 12 | 9 |
| Ethnicity*,**: Caucasians¹ | 3 | 11 | 4 | 10 |
| Ethnicity*,**: Others² | 24 | 22 | 26 | 22 |
| Number of people in household: 1-2 | 35 | 32 | 32 | 34 |
| Number of people in household: 3-4 | 44 | 47 | 51 | 50 |
| Number of people in household: 5+ | 21 | 21 | 17 | 46 |
| Household composition*,***: No-one, I live alone | 12 | 7 | 9 | 10 |
| Household composition*,***: Spouse/partner | 59 | 61 | 54 | 67 |
| Household composition*,***: Children³ | 52 | 56 | 47 | 58 |
| Household composition*,***: Others⁴ | 36 | 42 | 39 | 30 |
| Yearly household income (NZ$): <50,000 | 8 | 10 | 16 | 7 |
| Yearly household income (NZ$): 50,000 - 119,999 | 44 | 42 | 41 | 46 |
| Yearly household income (NZ$): >120,000 | 34 | 35 | 33 | 40 |
| Yearly household income (NZ$): Prefer not to say | 14 | 13 | 10 | 7 |
| Consumption frequency#: Once a month and more | 67 | 76 | 74 | 50 |
| Consumption frequency#: Less than once a month | 33 | 24 | 26 | 50 |
| Liking for product category## | 7.4 ± 1.3 | 7.3 ± 1.2 | 7.4 ± 1.3 | 6.2 ± 2.1 |

*) Responses from these demographic questions do not add up to 100% because consumers may tick more than one option.
**) ¹Caucasians included Australian, European and North American (Canada and USA). ²Others included Chinese, Indian, Japanese and other.
***) ³Children included children of household members aged above or below 18. ⁴Others included parents, flatmates and others.
#) Stated consumption frequency, for apple juice (Studies 1-3) or iced tea (Study 4).
##) Average stated liking (1= “dislike extremely”, 9= “like extremely”) and standard deviation, for apple juice (Studies 1-3) or iced tea (Study 4).
Part 2. Overview of samples used in each of the four consumer studies. The same set of samples was used in Studies 1 and 2, but with different experimental condition as per Table 1 (main paper). See below for specific details pertaining to each sample.
| Study | Sample | Sample details |
|---|---|---|
| 1 and 2 | C-AJ | Control |
| 1 and 2 | S+ | Control + sucrose (20 g/L) |
| 1 and 2 | S++ | Control + sucrose (40 g/L) |
| 1 and 2 | A+ | Control + citric acid (3 g/L) |
| 1 and 2 | D | Control + water (2:1) |
| 3 | C-AJ | Control |
| 3 | SS+ | Control + sucrose (30 g/L) |
| 3 | SS++ | Control + sucrose (60 g/L) |
| 3 | AA+ | Control + citric acid (5 g/L) |
| 3 | D | Control + water (2:1) |
| 4 | C-IT | Base tea + water (500 g/L) |
| 4 | CN+ | Base tea + water (478 g/L) + cinnamon stock solution (22 g/L) |
| 4 | CN++ | Base tea + water (10 g/L) + cinnamon stock solution (490 g/L) |
| 4 | SM+ | Base tea + water (424 g/L) + smoky stock solution (76 g/L) |
| 4 | SM++ | Base tea + water (10 g/L) + smoky stock solution (490 g/L) |
Apple juice samples used in Studies 1-3
Studies 1 and 2: Control sample (C-AJ) was made from 3:1 parts apple juice (Kerri Juice Co, Apple Juice) to filtered water + 0.25 g/L citric acid + 0.02 g/L yellow food colour. Citric acid was already an added ingredient in the commercial apple juice and was thus chosen for further acidity manipulations. Colour was added to counter the effect of dilution, so that samples would not look “thin.” In Study 3, a slight change was made to the control sample, in that less yellow food colour was added (reduced to 3×10⁻⁵ g/L), with the exception of the AA+ sample, where a combination of red and yellow food colours was used to achieve a similar colour to the other samples in Study 3 (yellow: 3×10⁻⁶ g/L and red: 6×10⁻⁵ g/L). Food grade citric acid and food colours used in sample manipulations were purchased from Queen New Zealand Pty Ltd. Sucrose was granulated sugar from Chelsea, New Zealand.
Iced tea samples used in Study 4
Base tea was made from black and green tea, mixed in a 1:1 ratio, to which sucrose was added (10 g/L; granulated sugar from Chelsea, New Zealand). The black tea (Dilmah Premium Black Tea) was made by steeping 4 black tea bags in 1L boiling water for 3 minutes, while stirring. The green tea
(Healtheries Green Tea) was similarly made, but used 8 tea bags per litre of water. The cinnamon stock solution was made by steeping pieces of cinnamon quills (11.7 g/L) in boiling water for 3 minutes while stirring. Then the steeped liquid was passed through a household sieve. The smoky stock solution was made by adding 113 μL liquid smoke concentrate (NZ Manuka Egg Company; www.nzmanukaeggs.co.nz) to 1L of water.
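In the Study 4 sample table, each manipulated iced tea replaces part of the water with stock solution, so that water plus stock adds up to the same amount as the water in the control. The short Python sketch below checks this and scales the amounts to a batch. It assumes the listed g/L values refer to grams per litre of finished sample and scale linearly with volume; the dictionary and function names are illustrative.

```python
# Grams of each diluent per litre, as listed for the Study 4 samples in Part 2.
ICED_TEA_DILUENTS = {
    "C-IT": {"water": 500},
    "CN+": {"water": 478, "cinnamon stock": 22},
    "CN++": {"water": 10, "cinnamon stock": 490},
    "SM+": {"water": 424, "smoky stock": 76},
    "SM++": {"water": 10, "smoky stock": 490},
}


def diluent_total(sample: str) -> int:
    """Combined amount of water and stock solution (g/L) for one sample."""
    return sum(ICED_TEA_DILUENTS[sample].values())


def batch_amounts(sample: str, litres: float) -> dict:
    """Scale per-litre amounts to a batch (assumes linear scaling with volume)."""
    return {name: grams * litres for name, grams in ICED_TEA_DILUENTS[sample].items()}


if __name__ == "__main__":
    for sample in ICED_TEA_DILUENTS:
        print(sample, diluent_total(sample), batch_amounts(sample, 2.0))
```

Keeping the combined amount of water and stock constant means that the samples differ in flavour addition while the dilution of the base tea stays the same.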
Part 3. Exemplar ballot used in Part 2 of data collection for pairwise sample comparisons.
Please look at your two completed questionnaires. 1. For which sample/s did you select the word “Sweet”
□ 476
□ 792
□ Neither
2. In your own words, please describe your answers for 476 and 792 with respect to the word “Sweet”.
□
If identical, why?
If different, why?
If you wish to taste the samples again, please push your empty tray back through the hatch
Part 4. Frequency table for Part 1 (CATA question) of Study 1 (apple juice with smaller differences), showing frequency of use (%) of the 10 sensory terms for each sample (C-AJ, S+, S++, D, A+). The control sample is C-AJ. The results of sign tests are shown (within rows, values sharing a letter are not significantly different at 5%). Refer to Table 1 and Part 2 of the Appendix for full sample details.

| Sensory term | C-AJ | S+ | S++ | D | A+ |
|---|---|---|---|---|---|
| Bland | 35b | 14a | 7a | 88c | 11a |
| Cooked apple | 25b | 29b | 32b | 27b | 13a |
| Flavoursome | 40b | 64c | 65c | 7a | 43b |
| Fresh apple | 51c | 46c | 47c | 21a | 32b |
| Fruity | 53c | 74d | 71d | 15a | 38b |
| Green apple | 32b | 18a | 17a | 24a,b | 64c |
| Red apple | 38b | 50c | 52c | 22a | 15a |
| Sour/acidic | 23c | 11b | 4a | 15b | 75d |
| Strong | 9b | 25c | 32c | 3a | 55d |
| Sweet | 53c | 76d | 89e | 16a | 28b |
Part 5. Frequency table for Part 1 (CATA question) of Study 2 (apple juice with smaller differences), showing frequency of use (%) of the 10 sensory terms for each sample (C-AJ, S+, S++, D, A+). The control sample is C-AJ. The results of sign tests are shown (within rows, values sharing a letter are not significantly different at 5%). Refer to Table 1 and Part 2 of the Appendix for full sample details.

| Sensory term | C-AJ | S+ | S++ | D | A+ |
|---|---|---|---|---|---|
| Bland | 44b | 14a | 9a | 88c | 8a |
| Cooked apple | 25b | 29b | 41c | 23b | 13a |
| Flavoursome | 38b | 60c | 61c | 7a | 53c |
| Fresh apple | 32b | 41b | 35b | 17a | 35b |
| Fruity | 50b,c | 63c | 60c | 14a | 40b |
| Green apple | 27b | 22b | 9a | 22b | 67c |
| Red apple | 31b | 46c | 45c | 16a | 13a |
| Sour/acidic | 11b | 9b | 0a | 12b | 63c |
| Strong | 11b | 16b | 29c | 0a | 45d |
| Sweet | 52c | 75d | 88e | 12a | 31b |
Part 6. Frequency of use of semantic categories for Part 2 of Study 1 (apple juice with smaller differences) for four pairs of samples. The control sample is C-AJ. Refer to Table 1 or Part 2 of Appendix 2 for full sample details.
| Semantic Category (SC) | C-AJ vs. S+ | C-AJ vs. S++ | C-AJ vs. D | C-AJ vs. A+ |
|---|---|---|---|---|
| Difference in sweetness perceived | 17 | 14 | 33 | 33 |
| Difference in sourness perceived | 15 | 17 | 6 | 14 |
| Difference in intensity (sweet/acid) perceived | 14 | 17 | 21 | 17 |
| Differences perceived, not specific to sweet/acid | 21 | 20 | 38 | 34 |
| Difference in sweetness not perceived | 4 | 2 | 4 | 3 |
| Difference in sourness not perceived | 4 | 1 | 0 | 0 |
| Difference in intensity sweet/acid not perceived | 8 | 3 | 5 | 2 |
| Order effect | 4 | 0 | 0 | 1 |
| Both sour | 4 | 1 | 0 | 3 |
| Both sweet | 11 | 11 | 8 | 13 |
| Both intense (strong, sweet/acid) | 3 | 2 | 0 | 4 |
| Neither sweet | 0 | 0 | 10 | 7 |
| Neither sour | 34 | 35 | 1 | 0 |
| Neither intense (including bland, weak) | 5 | 1 | 13 | 1 |
| Comparison with commercial / normal juice samples | 4 | 4 | 5 | 2 |
| Comparison with fresh apple samples, natural or fruity | 16 | 12 | 18 | 17 |
| Differentiated by flavoursome | 1 | 3 | 3 | 6 |
Part 7. Frequency of use of semantic categories for Part 2 of Study 2 (apple juice with smaller differences) for four pairs of samples. The control sample is C-AJ. Refer to Table 1 or Part 2 of Appendix 2 for full sample details.
| Semantic Category (SC) | C-AJ vs. S+ | C-AJ vs. S++ | C-AJ vs. D | C-AJ vs. A+ |
|---|---|---|---|---|
| Difference in sweetness perceived | 46 | 47 | 11 | 16 |
| Difference in sourness perceived | 14 | 18 | 22 | 44 |
| Difference in intensity (sweet/acid) perceived | 43 | 37 | 18 | 32 |
| Differences perceived, not specific to sweet/acid | 41 | 44 | 42 | 43 |
| Difference in sweetness not perceived | 14 | 22 | 3 | 2 |
| Difference in sourness not perceived | 4 | 6 | 2 | 1 |
| Difference in intensity sweet/acid not perceived | 1 | 3 | 0 | 0 |
| Order effect | 6 | 3 | 2 | 0 |
| Both sour | 1 | 3 | 3 | 12 |
| Both sweet | 39 | 46 | 5 | 4 |
| Both intense (strong, sweet/acid) | 2 | 3 | 0 | 0 |
| Neither sweet | 4 | 3 | 0 | 1 |
| Neither sour | 4 | 3 | 27 | 9 |
| Neither intense (including bland, weak) | 2 | 1 | 9 | 0 |
| Comparison with commercial / normal juice samples | 12 | 8 | 8 | 5 |
| Comparison with fresh apple samples, natural or fruity | 24 | 21 | 13 | 21 |
| Differentiated by flavoursome | 5 | 3 | 1 | 1 |
Part 8. Frequency table for Part 1 (CATA question) of Study 3 (apple juice with larger differences), showing frequency of use (%) of the 10 sensory terms for each sample (C-AJ, SS+, SS++, D, AA+). The control sample is C-AJ. The results of sign tests are shown (within rows, values sharing a letter are not significantly different at 5%). Refer to Table 1 and Part 2 of the Appendix for full sample details.

| Sensory term | C-AJ | SS+ | SS++ | D | AA+ |
|---|---|---|---|---|---|
| Bland | 33c | 1b | 6a | 89d | 7a,b |
| Cooked apple | 19b | 28c,d | 35d | 24b,c | 7a |
| Flavoursome | 35b | 61c | 58c | 4a | 40b |
| Fresh apple | 54c | 39b | 34b | 23a | 32b |
| Fruity | 53c | 68d | 63d | 15a | 30b |
| Green apple | 32c | 19b | 10a | 22b | 68d |
| Red apple | 32c | 50d | 48d | 19b | 7a |
| Sour/acidic | 15b | 2a | 1a | 11b | 88c |
| Strong | 11b | 19c | 35d | 0a | 69e |
| Sweet | 46b | 87c | 93c | 12a | 14a |
Part 9. Frequency of use of semantic categories for Part 2 of Study 3 (apple juice with larger differences) for four pairs of samples. The control sample is C-AJ. Refer to Table 1 or Appendix 2 for full sample details.
| Semantic Category (SC) | C-AJ vs. SS+ | C-AJ vs. SS++ | C-AJ vs. D | C-AJ vs. AA+ |
|---|---|---|---|---|
| Neither sweet | 6 | 1 | 5 | 1 |
| Neither sour | 2 | 2 | 63 | 8 |
| Neither intense (incl. bland, weak) | 4 | 1 | 21 | 1 |
| Both sour | 2 | 0 | 5 | 12 |
| Both sweet | 33 | 44 | 11 | 4 |
| Both intense (flavoursome, strong, sweet/acid) | 6 | 5 | 4 | 3 |
| Difference in sweetness perceived | 65 | 75 | 22 | 36 |
| Difference in sourness perceived | 10 | 14 | 18 | 76 |
| Difference in intensity (sweet/acid) perceived | 64 | 73 | 30 | 74 |
| Flavour perceived, not specific to sweet/acid | 62 | 66 | 57 | 60 |
| Difference in sweetness not perceived | 31 | 21 | 71 | 52 |
| Difference in sourness not perceived | 87 | 81 | 78 | 13 |
| Difference in intensity sweet/acid not perceived | 30 | 21 | 60 | 9 |
| Order effect | 3 | 1 | 0 | 0 |
| Comparison to commercial juice samples | 2 | 6 | 2 | 4 |
| Comparison to fresh apple, natural or fruity | 30 | 33 | 21 | 31 |
| Difference in flavour or flavoursome (incl. bland, watery) | 49 | 47 | 53 | 38 |
Part 10. Frequency table for Part 1 (CATA question) of Study 4 (iced tea), showing frequency of use (%) of the 10 sensory terms for each sample (C-IT, CN+, CN++, SM+, SM++). The control sample is C-IT. The results of sign tests are shown (within rows, values sharing a letter are not significantly different at 5%). Refer to Table 1 and Part 2 of the Appendix for full sample details.

| Sensory term | C-IT | CN+ | CN++ | SM+ | SM++ |
|---|---|---|---|---|---|
| Chai/spices | 16a | 42b | 74c | 17a | 13a |
| Cinnamon | 8b | 32c | 81d | 6a,b | 3a |
| Fruity | 10 | 15 | 12 | 10 | 6 |
| Mouth drying | 22 | 19 | 19 | 24 | 24 |
| Smoky | 17a | 17a | 16a | 31b | 79c |
| Strong | 17a | 22a | 42b | 24a | 46b |
| Sweet | 15b | 14b | 17b | 9a,b | 6a |
| Tea flavour | 85b | 83b | 51a | 82b | 50a |
| Weak | 57b | 49b | 25a | 50b | 27a |
| Woody | 40a,b | 34a | 30a | 49b | 72c |
Part 11. Frequency of use of semantic categories for Part 2 of Study 4 (ice tea) for two sets of two pairs of samples. The control sample is C-IT. Refer to Table 1 or Appendix 2 for full sample details.

Semantic categories (SC)                                C-IT vs. CN+   C-IT vs. CN++
None of the samples has cinnamon flavour                26             29
Both samples have cinnamon flavour                      1              4
Cinnamon flavour only in control sample                 3              1
Cinnamon flavour only in manipulated sample             49             44
Differences in cinnamon intensity                       3              3
Cinnamon-like flavour only in manipulated sample        7              13
Both samples have cinnamon-like flavour                 1              10
Differences between samples in cinnamon-like flavour    14             17
Differences in intensity                                10             15
No flavour differences                                  1              4
Flavour differences                                     1              4
Control sample is bland/weak                            14             11

Semantic categories (SC)                                C-IT vs. SM+   C-IT vs. SM++
None of the samples has smoky flavour                   15             21
Both samples have smoky flavour                         3              8
Smoky flavour only in control sample                    6              6
Smoky flavour only in manipulated sample                50             43
Differences in smoky flavour intensity                  6              4
Both samples have smoky-like flavour                    4              7
None of the samples have smoky-like flavour             0              0
Smoky-like flavour in control sample only               10             11
Smoky-like flavour only in manipulated sample           4              17
Flavour differences                                     15             22
Intensity differences                                   6              4
Control sample is bland/weak                            8              6
No flavour differences                                  3              6

Research highlights
CATA questions are popular in product-focused consumer research on foods and beverages
This research investigated CATA questions for sensory product characterisation by consumers
Open-ended questions provided insights into consumers' decisions to select, or not, a CATA term
Evidence was obtained of consumers' ability to accurately perform sensory characterisation tasks
Increases in stimulus intensity were appropriately captured through higher frequency of term use

Figure 1.
1A. Sample pair: C-AJ vs. S+. Series: term 'sweet' selected for manipulated sample only; term 'sweet' selected for both control and manipulated samples. Axis scale: 0 to 100. Semantic categories: SC1. Both sweet; SC2. Difference in sweetness perceived; SC3. Difference in intensity perceived; SC4. Differences perceived, not specific to sweet; SC5. Difference in sweetness NOT perceived.
1B. Sample pair: C-AJ vs. S++. Series: term 'sweet' selected for manipulated sample only; term 'sweet' selected for both control and manipulated samples. Axis scale: 0 to 100. Semantic categories: SC1 to SC5 as in 1A.
1C. Term 'sour/acidic' selected for none of the samples. Series: sample pair C-AJ vs. S+; sample pair C-AJ vs. S++. Axis scale: 0 to 70. Semantic categories: SC1. Neither sour; SC2. Differences perceived, not specific to sourness; SC3. Difference in sweetness perceived; SC4. Difference in intensity perceived; SC5. Difference in sourness perceived; SC6. Difference in intensity NOT perceived.
Note: Categories were developed based on the open-ended responses and not all were mutually exclusive as would be the case with many closed question formats. Percentages were calculated separately for each response option of the CATA question.
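The note above implies that, within each CATA response option (for example, term selected for the manipulated sample only), the percentage for a semantic category is the share of consumers in that option whose open-ended response was coded into that category. Because categories are not mutually exclusive, the percentages within an option can sum to more than 100. A minimal Python sketch of that calculation follows. It is an illustration under stated assumptions: the function name, the record layout and the six example consumers are hypothetical and do not come from the study.

```python
from collections import Counter, defaultdict


def category_percentages(records):
    """records: iterable of (response_option, categories), one entry per consumer.

    Returns {response_option: {category: percentage of consumers in that option}}.
    """
    group_sizes = Counter()
    mentions = defaultdict(Counter)
    for option, categories in records:
        group_sizes[option] += 1
        # A consumer counts at most once per category.
        for category in set(categories):
            mentions[option][category] += 1
    return {
        option: {
            category: 100.0 * count / group_sizes[option]
            for category, count in mentions[option].items()
        }
        for option in group_sizes
    }


# Hypothetical coded responses, for illustration only.
records = [
    ("manipulated only", {"SC2"}),
    ("manipulated only", {"SC2", "SC3"}),
    ("manipulated only", {"SC4"}),
    ("manipulated only", {"SC2"}),
    ("both", {"SC1"}),
    ("both", {"SC1", "SC2"}),
]

result = category_percentages(records)
print(result)
```

Using each response option's own group size as the denominator keeps the percentages comparable between options that were chosen by different numbers of consumers.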
Figure 2.
2A. Sample pair: C-AJ vs. D. Term 'sour/acidic' selected for none of the samples. Axis scale: 0 to 80. Semantic categories: SC1. Differences perceived, not specific to sourness; SC2. Neither sour; SC3. Difference in intensity perceived; SC4. Difference in sourness perceived; SC5. Difference in sweetness perceived; SC6. Neither intense.
2B. Sample pair: C-AJ vs. D. Series: term 'sweet' selected for control sample only; term 'sweet' selected for neither sample. Axis scale: 0 to 100. Semantic categories: SC1. Differences perceived, not specific to sweetness; SC2. Neither intense; SC3. Neither sweet; SC4. Difference in sweetness perceived; SC5. Difference in intensity perceived; SC6. Differences in sweetness not perceived.
2C. Sample pair: C-AJ vs. A+. Series: term 'sour/acidic' selected for both samples; term 'sour/acidic' selected for neither sample; term 'sour/acidic' selected for manipulated sample only. Axis scale: 0 to 100. Semantic categories: SC1. Differences perceived, not specific to sourness; SC2. Neither sour; SC3. Difference in sourness perceived; SC4. Difference in intensity perceived; SC5. Both sour.
Note: Categories were developed based on the open-ended responses and not all were mutually exclusive as would be the case with many closed question formats. Percentages were calculated separately for each response option of the CATA question.
Figure 3.
3A. Sample pair: C-AJ vs. SS+. Series: term 'sweet' selected for manipulated sample only; term 'sweet' selected for both control and manipulated samples. Axis scale: 0 to 100. Semantic categories: SC1. Both sweet; SC2. Differences in sweetness perceived; SC3. Difference in intensity perceived; SC4. Difference in sweetness NOT perceived; SC5. Difference in intensity NOT perceived; SC6. Neither sweet; SC7. Differences in flavour/flavoursome.
3B. Sample pair: C-AJ vs. SS++. Series: term 'sweet' selected for manipulated sample only; term 'sweet' selected for both control and manipulated samples. Axis scale: 0 to 100. Semantic categories: SC1 to SC7 as in 3A.
3C. Series: sample pair C-AJ vs. D (term 'sour/acidic' selected for neither sample); sample pair C-AJ vs. AA+ (term 'sour/acidic' selected for manipulated sample only). Axis scale: 0 to 100. Semantic categories: SC1. Difference in intensity perceived; SC2. Difference in intensity NOT perceived; SC3. Differences perceived, not specific to sour/acidic; SC4. Differences in flavour/flavoursome; SC5. Neither sour; SC6. Neither intense; SC7. Both sour; SC8. Differences in sourness perceived; SC9. Difference in sourness NOT perceived.
Note: Categories were developed based on the open-ended responses and not all were mutually exclusive as would be the case with many closed question formats. Percentages were calculated separately for each response option of the CATA question.
Figure 4.
4A. Sample pair: C-IT vs. CN+. Series: term 'cinnamon' not selected for both samples; term 'cinnamon' selected for manipulated sample only. Axis scale: 0 to 100. Semantic categories: SC1. Cinnamon flavour only in manipulated sample; SC2. Differences between samples in cinnamon; SC3. Differences in cinnamon intensity; SC4. Control less intense; SC5. None of the samples had cinnamon flavour.
4B. Sample pair: C-IT vs. CN++. Term 'cinnamon' selected for manipulated sample only. Axis scale: 0 to 80. Semantic categories: SC1. Cinnamon flavour only in manipulated sample; SC2. Differences between samples in cinnamon; SC3. Differences in cinnamon intensity; SC4. Differences between samples in cinnamon; SC5. Control sample is bland/weak.
4C. Sample pair: C-IT vs. SM+. Series: term 'smoky' not selected for both samples; term 'smoky' selected for manipulated sample only. Axis scale: 0 to 100. Semantic categories: SC1. Smoky flavour only in manipulated sample; SC2. None of the samples had smoky flavour; SC3. Control sample is weak/bland.
4D. Sample pair: C-IT vs. SM++. Term 'smoky' selected for manipulated sample only. Axis scale: 0 to 80. Semantic categories: SC1. Smoky flavour only in manipulated sample; SC2. None of the samples had smoky flavour; SC3. Flavour differences; SC4. Smoky-like flavour in manipulated sample only.
Note: Categories were developed based on the open-ended responses and not all were mutually exclusive as would be the case with many closed question formats. Percentages were calculated separately for each response option of the CATA question.