International Journal of Forecasting 33 (2017) 652–661
The influence of product involvement and emotion on short-term product demand forecasting

Valeria Belvedere (a,*), Paul Goodwin (b)

(a) Catholic University of the Sacred Heart, Via Necchi 5, 20123 Milan, Italy
(b) School of Management, University of Bath, Bath BA2 7AY, United Kingdom

* Corresponding author. E-mail addresses: [email protected] (V. Belvedere), [email protected] (P. Goodwin).
Keywords: Sales forecasting; Judgmental forecasting; New products; Emotions; Product involvement; False consensus
Abstract

Sales forecasters in industries like fast fashion face challenges posed by short and highly volatile sales time series. Computers can produce statistical forecasts, but these are often adjusted judgmentally to take into account factors such as market intelligence. We explore the effects of two potential influences on these adjustments: the forecaster's involvement with the product category and their emotional reactions to particular products. Two forecasting experiments were conducted using data from a major Italian leather fashion goods producer. The participants' judgmental adjustments tended to lower the forecast accuracy, particularly when the participants had strong preferences for particular products. This appeared to result from a false consensus effect. The most accurate forecasts were made when the participants had no knowledge of the product and only received time series information, though a high level of involvement with the product category also led to greater accuracy.

© 2017 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
1. Introduction

Accurate demand forecasting for products is particularly critical in a number of industries that are characterized by strong time-based competition (Blackburn, 1991; Blackburn, 2012; Wallace & Choi, 2011; Stalk & Hou, 1990). However, managers who operate in sectors such as consumer electronics or fashion face the challenge of predicting future sales under conditions of extremely high product variety, significant demand variability and short demand histories (De Toni & Meneghetti, 2000; Fang, Zhang, Robb, & Blackburn, 2013; Forza & Vinelli, 2000; Thomassey, 2014). Accuracy requires forecasters to be able to combine the available data, and any available statistical forecasts, with their knowledge of the product and the market (Seifert, Siemsen, Hadida, & Eisingerich, 2015). This is particularly necessary in volatile markets, where the predictability of future sales is low and analytical forecasting models based on the extrapolation of time series can be highly inaccurate, especially given the short demand histories (Thomassey, 2014).

Even when forecasters have expertise relating to their market, their forecasts are still likely to be subject to a range of cognitive and motivational biases (Berg, 2016; Lawrence, Goodwin, O'Connor, & Onkal, 2006). For example, in new product forecasting it is known that being involved in the development of the product can lead to wishful thinking and the selective processing of information to confirm that the product will be a success. The resulting forecasts suffer from an optimism bias (Tyebjee, 1987). Demand forecasts in sectors like the fashion industry may suffer from similar biases, as the products are particularly likely to trigger emotional reactions.

This paper presents the results of two experiments that were designed to determine whether the accuracy of short-term demand forecasts is influenced by either the level of involvement of the forecasters with the product category, or their personal emotion-based preferences for products
within that category. We begin by reviewing the literature in order to outline the characteristics that are peculiar to the forecasting process in the fashion industry, and to identify the relevant psychological factors which can influence the accuracy of judgmental adjustments to forecasts. We then describe the experiments, before presenting the results and conclusions. 2. Review of the relevant literature 2.1. Forecasting in the fashion industry The forecasting process in many fashion industries – and the textile-apparel sector in particular – is characterized by a number of features that can make it especially challenging. As was highlighted by Thomassey (2014), these features include strong seasonal patterns and very short product life-cycles, coupled with a huge variety of products and short planning horizons (generally a few weeks) for managing stock replenishment at retail outlets. In addition, there are several exogenous variables, some of which cannot be controlled directly by manufacturing companies, that can have strong influences on sales. These include macroeconomic conditions, marketing strategies, retailing strategies and fashion trends. These factors, coupled with the lack of long sales time series for most products, mean that the applicability of traditional forecasting techniques, such as exponential smoothing or regression models, is limited (Nenni, Giustiniano, & Pirolo, 2013; Sichel, 2008; Thomassey, 2014). This is in contrast to forecasting for ‘‘basic’’ fashion products such as men’s shirts, which are characterized by long life-cycles, a lack of seasonality and small fluctuations. Approaches such as neural networks, extreme learning machine algorithms and fuzzy inference systems have proved effective for the latter case (Au, Choi, & Yu, 2008; Choi, Hui, Liu, Ng, & Yu, 2014; Sun, Choi, Au, & Yu, 2008; Thomassey, 2014). For ‘‘non-basic’’ products, it is rather uncommon to adopt standard software systems for supporting the forecasting process because of their low accuracy. Instead, it is quite common for fashion companies to develop their own processes, centred largely on their practitioners’ experience, in the hope that these will lead to low forecast errors, with the associated benefits of reduced inventories and infrequent store-level stock-outs (Sichel, 2008). One approach involves providing practitioners with a statistical forecast of the demand for their product, where such forecasts are often based simply on extrapolations of past sales patterns. Thus, it is common practice, in both the fashion and other industries, for practitioners to apply judgmental adjustments to these forecasts, ostensibly in order to take into account market intelligence and other contextual information that has not been allowed for in the statistical algorithm (Davydenko & Fildes, 2014; Fildes, Goodwin, Lawrence, & Nikolopoulos, 2009). When this contextual information is reliable and relates to events that will have significant effects on sales, such adjustments can improve the forecast accuracy (Sanders & Ritzman, 2001). For example, in a laboratory study, Lim and O’Connor (1996) found that adjustments based on reliable contextual information led to more accurate forecasts, while Edmundson,
Lawrence and O’Connor (1988) highlighted the positive impact of specific product information on the accuracy of judgmental sales forecasts. In the case of fashion products with short time series, the experimental evidence suggests that the provision of contextual information can have a positive influence on the forecast accuracy when a non-linear relationship is observed between the predictor variables and sales (Seifert et al., 2015). It has also been demonstrated that, while the use of point-of-sale (POS) data from local stores is crucial to the prediction of future sales, the value of this type of data can be enhanced by coupling it with available qualitative information, such as customers’ opinions collected by shop assistants or through virtual communities and social networks (Belvedere & Stabilini, 2014; Sull, 2010; Sull & Turconi, 2008). These kinds of approaches can enhance a company’s ‘‘market sensitivity’’ – that is, the ability to interpret market trends effectively and efficiently – thus allowing the company to identify, produce and deliver those items that enjoy better market potentials (Christopher, 2000). However, although the use of practitioners’ experience and intuition can result in favourable outcomes in highly unpredictable environments, it can also involve some risk for companies, since judgmental interventions can be influenced strongly by cognitive biases. For example, various studies have found that the benefits obtained from judgmental adjustments of statistical forecasts are often reduced by managers falsely seeing systematic patterns in the noise in time series and making gratuitous adjustments to reliable forecasts (O’Connor, Remus, & Griggs, 1993). People also tend to use contextual information inconsistently or inefficiently. In particular, they may overweight recent or more salient information (Tversky & Kahneman, 1974). For example, when determining the market potential of a new fashion product, managers may base their estimate on an analogy; that is, they may take a similar product, launched in the past, as a reference and then adjust its sales on the basis of contextual information that is relevant to the new product (Abernathy, Dunlop, Hammond, & Weil, 1999). In doing this, they can be overinfluenced by the cases that are most easily retrieved from their memory (Goodwin, Meeran, & Dyussekeneva, 2014; Lovallo, Clarke, & Camerer, 2012; Tversky & Kahneman, 1974). Even expertise is no guarantee of accurate forecasts. Experts may have closed mindsets, so that they are impervious to information that disconfirms their beliefs, and may exhibit overconfidence (Tetlock, 2005). 2.2. Product involvement, emotion and moods in forecasting Surprisingly, few studies have addressed the effects of product involvement and emotion on judgmental adjustments to forecasts, and these factors may be particularly important when considering the forecasting of sales in the fashion industry. This omission is in spite of the fact that Nobel Laureate Daniel Kahneman pointed out that ‘‘emotion now looms much larger in our understanding of intuitive judgments and choices than it did in the past’’ (Kahneman, 2011 p. 12). For product involvement, we adopt Zaichkowsky’s (1985) definition of involvement as: ‘‘A person’s perceived relevance of the object based on
inherent needs, values, and interests’’. In forecasting it seems reasonable to expect that an involvement in a given product category will be associated with interest in that category, and hence will increase the amount of attention that people pay to the forecasting task. For example, Celsi & Olson (1988) demonstrated that ‘‘felt’’ involvement in a specific situation is a major driver of the attention process. This increased attention might lead in turn to a deeper understanding of the sales dynamics that are peculiar to that particular product category. While consumers may have different perspectives on involvement compared to the managers who are producing product forecasts, it seems reasonable to expect that managers’ involvement in different product categories will vary, with implications for the amount of attention that they pay to the forecasting task. By ‘emotional reaction’ we refer to the positive or negative feelings that a person may have for a product. However, one problem that is common in the psychology literature is the fact that terms like ‘emotion’, ‘mood’ and ‘affect’ have often been used interchangeably or without any clear definition of their meanings (Batson, Shaw, & Oleson, 1992). Recently, though, Ekkekakis (2012) provided a useful classification. A core affect is defined as a ‘‘neurophysiological state consciously accessible as a feeling’’. Pleasure and displeasure are examples of core affects. Ekkekakis adopts Russell and Barrett’s definition of an emotion as ‘‘a complex set of interrelated sub-events concerned with a specific object such as a person, object or thing. . . ’’ (Russell & Barrett, 1999). The components of emotions include core affects (e.g., an object may arouse feelings of pleasure), the direction of attention to the stimulus that is eliciting the emotion, and an assessment of what the stimulus means and what its implications might be. Thus, emotions are stimulated by specific phenomena and may be evoked instantaneously. In contrast, moods have causes that are less easy to pinpoint, tend to last longer than emotions and are not focused on particular objects. In many practical situations, forecasters may produce a whole sequence of forecasts for a range of products in a single session (Fildes et al., 2009). While the forecaster’s mood during the session may have a global influence on the quality of all of their forecasts (Estrada, Isen, & Young, 1997; Isen, 2001; Jain, Bearden, & Filipowicz, 2013; Kahneman, 2011; Lim, Whang, Park, & Lee, 1998), we are interested here in the effects of their emotional reactions to specific products. Hence, even though a particular mood may prevail throughout the forecasting session, differences in their emotional reactions may lead to variation in the quality of forecasts for different products. There is evidence to suggest that an emotional reaction which takes the form of liking or disliking an object may reduce the quality of judgments. Kahneman (2011, p. 12) refers to the affect heuristic, ‘‘where judgments and decisions are guided directly by feelings of liking or disliking, with little deliberation or reasoning’’. This heuristic simplifies the world by allowing one to focus on either the benefits or the drawbacks of a course of action, thereby avoiding the need to make difficult trade-offs between the two. For example, positive and negative feelings about different technologies can influence people’s assessments
of the risks and benefits associated with them (Slovic, Finucane, Peters, & Macgregor, 2002).

When assessing emotional reactions to products, it is known that ‘liked' products tend to be perceived as having few, if any, attributes that are disliked (Fischer, Luce, & Jia, 2000). This can lead to a false consensus bias, where people wrongly assume that others share their own preferences because it is difficult to imagine why others would dislike a product (Ross, Greene, & House, 1977). Clearly, such a bias would tend to lead to the overestimation of sales for products that a forecaster likes. In contrast, disliked alternatives tend to be perceived as having a mixture of negatively and positively regarded attributes (Gershoff, Mukherjee, & Mukhopadhyay, 2007a). Thus, even when you dislike a product, it is relatively easy to think of positive aspects that may cause others to like it or have a neutral view of it, so that any false consensus bias will be less powerful (Gershoff, Mukherjee, & Mukhopadhyay, 2007b). Hence, your forecasts of a product's sales are likely to be more realistic if you dislike it.

In summary, the literature suggests that an involvement with a given product category is likely to increase a person's engagement in the task of forecasting sales for a product in that category. However, a positive emotional reaction to the product may lead to an optimism bias, because only the positive factors associated with the product are perceived, which may lead to a false consensus bias. On this basis, we arrive at the following hypotheses.

H1: A greater involvement in a product category is associated with more accurate judgmentally adjusted sales forecasts for products in that category.

H2: A forecaster's judgmentally adjusted forecasts will be less accurate when the product being forecasted is liked than when it is disliked.

2.3. Sequence of information provision

One factor that may moderate the effects of emotions when managers are applying adjustments to statistical forecasts is the order in which they receive information about the product. It seems reasonable to expect that time series data and statistical forecasts will have less influence on the emotions than contextual information. One mechanism that people may use to combine the two forms of information is called anchoring-and-adjustment (Tversky & Kahneman, 1974). This involves making estimates by starting with an initial value (the ‘‘anchor''), then ‘‘adjusting'' it to obtain the final estimate. Typical anchors might be the most recent sales figure (Bolger & Harvey, 1993), the statistical forecast (Goodwin, 2005) or, in new product forecasting – as in the fashion industry – the sales of a similar product launched previously (Abernathy et al., 1999). A key aspect of anchoring and adjustment is that the anchor tends to carry undue weight in the assessment process, so that the adjustments from it are often too small. One implication of this is that providing people with a quantitative value in the initial phases of the forecasting process can have a strong influence on their subsequent judgment. Thus, judgments are likely to be sensitive to the order in which information becomes available. This raises the possibility that asking managers to adjust a statistical forecast based only on time series information, before
revealing contextual information to them, might lead to anchoring on this initial forecast, and hence a reduction in the effect of the emotion-arousing contextual information. There is some evidence to support this possibility. For example, Bergenstrom and Sherr (2003) showed that the order in which information is provided can affect final judgments. They analysed a number of medical decisions and found that the order of presentation of verbal and numerical information influenced the final estimates of probabilities in some cases. In addition, Epley and Gilovich (2004) showed that adjustments tend to be smaller when they are made from a self-generated anchor, as would be the case with managers' initial forecasts.

However, there are also factors that may limit the moderating effect of using ‘emotionally-neutral' data to inform the initial forecast. Potential anchors tend to carry less weight when numeric information is competing with non-quantitative information, particularly when this is in an anecdotal or narrative form. A large body of literature has shown that people tend to underweight statistical base rates in favour of more ‘‘colourful'' stories, explanations or rumours, even when this information is unreliable (Tversky & Kahneman, 1974). This tendency to discount statistical forecasts can be reinforced by the ‘‘narrative fallacy'' (Taleb, 2007), where people convince themselves that they have explanations for past movements in sales graphs when such movements are actually merely manifestations of noise. Even if the availability of qualitative and contextual information can give rise to phenomena such as the ‘‘narrative fallacy'', the effects of emotions and involvement can be reduced if the forecaster anchors on an initial statistical forecast. Thus, if there is a negative emotional reaction, for example, the forecasts may be more realistic because a false consensus bias is likely to be less prevalent. On this basis, we arrive at the following hypothesis.

H3: For products that invoke emotional reactions, the accuracy of judgmentally adjusted forecasts will be higher when an initial forecast is produced based only on time series information (i.e., without knowledge of the nature of the product).

To test these hypotheses, two experiments were carried out that attempted to replicate, as far as possible, a typical demand forecasting situation in which the forecaster is provided with a statistical forecast of the sales of a product, together with a graph of the product's sales history to date. The forecasters can then use their judgment to adjust the statistical forecast if they think that this will improve its accuracy. We obtained the actual sales data for ten different handbag lines from a leading Italian producer of leather fashion goods, for the first 10 weeks after the launch of these products. For each handbag line, the participants were supplied with the first nine weeks of sales figures and asked to forecast sales for week 10. The ten handbag lines were selected by the company as being among the best-selling items of the latest collections. They exhibited a variety of demand patterns, characterized by a decreasing trend in some cases and an increasing one in others.
Fig. 1. Example chart with sales data, a trend line and a forecast for week 10.
3. Experiment 1

3.1. Implementation

The first experiment involved 166 participants who were taking an undergraduate course in technology and operations management. Although the participants were not practising managers, several studies have shown that the judgments of business students in laboratory experiments, such as that described here, are good proxies for those of managers (Bolton, Ockenfels, & Thonemann, 2012; Liyanarachchi & Milne, 2005; Remus, 1986). The laboratory environment also enables the research questions to be examined in a controlled way, free from extraneous factors such as company politics or pressure from senior management to provide forecasts that they consider acceptable.

Each participant was allocated randomly to one of two groups. Those in the ‘informed before' group (n = 78) were shown the handbags before they made their forecasts. Those in the ‘informed after' group (n = 88) were shown the handbags only after they had made their initial forecasts, but were then able to adjust these forecasts. For each product, all participants were shown both a table indicating the numbers of handbags sold for the first nine weeks after the launch and a graph displaying these sales and a trend line, fitted using least squares. This line was extrapolated to produce a statistical forecast of the sales in week 10 (see Fig. 1 for a sample series). For each product, the data were normalized by setting the sales in week 1 to 100. The normalized actual sales in week 10 were used to measure the accuracy of the forecasts, and these were not displayed to the participants at any stage. The sequence of the 10 handbags was the same for the two groups. The participants were told that the top 5% of performers would be rewarded with a 50 euro voucher for the university book shop.

We carried out this experiment using a web-based questionnaire, of which the first section was aimed at obtaining general information about the participants (gender and age), including their degree of knowledge of and professional experience in forecasting, measured on a scale of 1–7 (with 1 being ‘‘no knowledge'' or ‘‘no professional experience''). After each handbag had been shown to the participants, they were asked to indicate on a scale of 1 to 7 the extent to which they liked or disliked it (1 = don't like, 4 = neutral, 7 = like), and to identify the name of the brand.
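The construction of these statistical forecasts can be made concrete with a few lines of code. The following is a minimal sketch in Python; the weekly sales figures are invented placeholders (the company's actual series are not reproduced here), and the variable names are purely illustrative.

```python
import numpy as np

# Hypothetical weekly unit sales for one handbag line (weeks 1-9);
# the real company data are not reproduced here.
raw_sales = np.array([37, 41, 52, 49, 58, 63, 61, 70, 74], dtype=float)

# Normalize the series so that week-1 sales equal 100, as in the experiments.
sales = raw_sales / raw_sales[0] * 100.0
weeks = np.arange(1, 10)

# Fit a least-squares trend line to weeks 1-9 and extrapolate it to week 10
# to obtain the statistical forecast shown to participants.
slope, intercept = np.polyfit(weeks, sales, deg=1)
statistical_forecast = intercept + slope * 10

print(f"Statistical forecast for week 10 (normalized units): {statistical_forecast:.1f}")
```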
The final section of the questionnaire was designed to measure the participant’s level of involvement with the product category. They were first asked how many handbags they had bought over the last 12 months and how many they would buy per year if they had no financial constraints. Involvement was also assessed using the scale proposed by Zaichkowsky (1985), which consists of twenty bipolar items, each measured on a 1-to-7 Likert scale (e.g., ‘‘the product category is: unexciting (1), exciting (7)’’). The sum of the responses, which can range from 20 to 140, is intended to capture the extent of a person’s involvement. This measurement instrument was considered to be appropriate, since it has been found to be related positively to customers’ interest in gathering information about the product category. In our study, this interest was expected to drive a greater engagement with the forecasting task.
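For completeness, a small sketch of the involvement scoring, following the simple summation described above; the twenty item responses are invented for illustration (in practice some administrations of the scale reverse-score certain items).

```python
# Hypothetical responses to the 20 bipolar items of Zaichkowsky's (1985)
# involvement scale, each scored from 1 (e.g., unexciting) to 7 (e.g., exciting).
item_scores = [5, 6, 4, 7, 5, 6, 5, 4, 6, 5, 7, 6, 5, 4, 5, 6, 5, 6, 4, 5]

assert len(item_scores) == 20 and all(1 <= s <= 7 for s in item_scores)

# The involvement score is the sum of the item responses (possible range 20-140).
involvement_score = sum(item_scores)
print(involvement_score)  # 106 for these illustrative responses
```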
Fig. 2. Box plots of the RAEs in Experiment 1. Note: One forecast by a participant in the ‘‘Informed after’’ group that was adjusted by 561% was excluded from this and other related analyses because it was assumed to result from an error.
3.2. Results

Table 1 gives details of the participants in each group, together with their self-reported mean levels of knowledge and professional experience related to sales forecasting. The table also shows that there is little difference between the groups' mean involvement scores. The product involvement score was not correlated significantly with either the number of products bought in the last year (r = 0.023) or the number of purchases the participants said they would make in the coming year if they had no financial constraints (r = 0.106). There was a significant, albeit moderate, correlation between gender and the product involvement score, with females tending to have higher scores (r = 0.497, p < 0.001).

A number of measures exist for assessing forecast accuracy. Recently, Davydenko and Fildes (2013) suggested that the relative absolute error (RAE) is appropriate for assessing the accuracy of judgmental adjustments to statistical forecasts. The RAE is given by:

RAE = |Actual − Adjusted forecast| / |Actual − Statistical forecast|.

By the end of the experiment, all participants had seen the nine weeks of sales data, the statistical forecast and the handbags. To assess the factors that determined the accuracy of their forecasts when they were in possession of all of this information, separate models were obtained for the ‘‘informed before'' and ‘‘informed after'' groups, with the RAE as the dependent variable. In the case of the ‘‘informed after'' group, this was the RAE based on their final forecasts. The models had the following four explanatory variables: (i) whether the product was identified correctly by the participant (where 1 = Yes and 0 = No); (ii) the extent to which they indicated that they liked the product (this was treated as a categorical variable in order to permit the detection of any non-linear association between accuracy and the extent to which the product was liked); (iii) their product category involvement score; and (iv) the product (this was a categorical variable). The last variable was included in the model so as to allow the effects of the other variables to be estimated once the variation in accuracy that was explained by the different products had been taken into account. Because the first two independent
variables involved repeated measures for each participant and the distribution of RAEs was highly skewed (see the box plots in Fig. 2), the model was fitted using the method of generalized estimating equations (GEEs), with a gamma distribution being assumed for the dependent variable, and a log link function. The RAE was zero in a very small number of cases because a participant made a perfectly accurate forecast. As this would not be compatible with a gamma distribution, 0.00001 was added to the RAE in these cases. Fitting the model to data for which 5% trimming had been applied to the RAEs led to essentially the same results. The pairwise correlations between the independent variables were low, between −0.25 and 0.21, suggesting that collinearity was not a problem (in these correlations, the ‘like' score was treated as a continuous variable and the product variables were excluded).

Table 2 shows the results of the analysis. Significant differences between RAEs were found for the ten products, but these results are not shown for the sake of brevity. The model assumed an independent working correlation matrix structure, but very similar results were obtained when other structures (exchangeable and unstructured) were assumed. Because a log link function was used, the exponential of each coefficient gives the estimated multiplicative effect on the RAE if all other variables are held constant. Thus, a positive coefficient indicates that an increase in the variable will be associated with an increase in the RAE, and vice versa. In the ‘‘informed before'' group, a one-point increase in the category involvement score reduced the RAE to 99.4% of its previous value on average (i.e., e^−0.006 ≈ 0.994), an effect which is statistically significant. Similarly, in the same group, a like score of 2 for a product was associated with an RAE that was on average about 24% lower (i.e., 1 − e^−0.276 ≈ 0.24) than that associated with a like score of 7. The ability to identify the brand of the product was not associated significantly with the accuracy. Like scores of between 2 and 5 were associated with significantly smaller average RAEs than a score of 7; this was not the case when a product was strongly disliked (i.e., when the score was 1), where the reduction in the RAE was not statistically significant. Hence, there was partial support for H1 and strong support for H2.
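The accuracy measure and the model described above can be reproduced with standard statistical software. The sketch below uses Python with pandas and statsmodels; it is illustrative only: the file and column names are hypothetical, and it mirrors the gamma, log-link GEE with an independence working correlation described in the text rather than reproducing the authors' exact analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data layout: one row per participant-product forecast, with columns
# actual, adjusted_fc, statistical_fc, participant, product, like_score (1-7),
# correct_id (0/1) and involvement.
df = pd.read_csv("experiment1_informed_before.csv")  # illustrative file name

# Relative absolute error of the judgmentally adjusted forecast.
df["rae"] = (np.abs(df["actual"] - df["adjusted_fc"])
             / np.abs(df["actual"] - df["statistical_fc"]))

# A gamma-distributed response cannot be exactly zero, so perfectly accurate
# forecasts are nudged upwards slightly, as described in the text.
df.loc[df["rae"] == 0, "rae"] = 1e-5

# GEE with repeated measures per participant, a gamma response, a log link and
# an independence working correlation structure; like scores are treated as
# categorical with a score of 7 as the reference level.
model = smf.gee(
    "rae ~ C(like_score, Treatment(reference=7)) + correct_id"
    " + involvement + C(product)",
    groups="participant",
    data=df,
    family=sm.families.Gamma(link=sm.families.links.Log()),
    cov_struct=sm.cov_struct.Independence(),
)
result = model.fit()

# With a log link, exp(coefficient) is the multiplicative effect on the RAE.
print(np.exp(result.params))
```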
Table 1
Background information on the participants in the two groups in Experiment 1.

                                                                                  Informed before   Informed after
Mean age                                                                          21.1              21.4
Mean knowledge of sales forecasting (1 = no knowledge, 7 = expert knowledge)      2.83              2.50
Mean experience of sales forecasting (1 = no experience, 7 = expert experience)   1.69              1.39
Mean involvement score                                                            85.8              86.7
Females (%)                                                                       52.6              48.9
n                                                                                 78                88
Table 2
Determinants of forecast inaccuracy in Experiment 1 for the ‘‘Informed before'' and ‘‘Informed after'' groups, as measured by the RAE.

                           Informed before              Informed after
Variable                   Coefficient   p-value        Coefficient   p-value
Intercept                  3.987         0.000          3.449         0.000
Correct identification     0.005         0.941          −0.041        0.594
Like product score (a)
  1                        −0.090        0.462          −0.059        0.698
  2                        −0.276        0.043          0.186         0.195
  3                        −0.208        0.056          −0.041        0.747
  4                        −0.238        0.055          −0.073        0.532
  5                        −0.219        0.044          −0.055        0.708
  6                        −0.122        0.281          −0.026        0.858
Involvement score          −0.006        0.015          −0.004        0.161

n = 780 forecasts for the ‘‘Informed before'' group and 779 for the ‘‘Informed after'' group.
(a) For the ‘Like product score', the results are relative to a score of 7.
Fig. 3. Mean percentage adjustments for the ‘‘Informed before’’ group in Experiment 1.
In general, the participants’ adjustments damaged the forecast accuracy. The median RAE was 1.24, showing that the error of their adjusted forecasts was typically 24% higher than that of the statistical forecasts provided. Fig. 3 shows the relationship between the mean percentage adjustments to the statistical forecasts and the ‘like score’ for the ‘‘informed before’’ group. There was a general propensity to reduce the statistical forecasts. Eight of the ten products had an upward trend in sales over the ten weeks following their launch, and a tendency for judgmental forecasters to predict damped upward trends has been reported widely in the literature. This may be due to anchoring on, and under-adjusting from, (a) the last observation (Bolger & Harvey, 1993), (b) the implied constraint suggested by the upper boundary of the sales graph, or (c) a belief, based on experience, that sales cannot continue growing forever at a constant rate, and that early growth must gradually slow down (Lawrence & Makridakis, 1989). In general, the more a product was disliked, the greater the
downward adjustment. In this case, the statistical forecasts were typically too high, needing a downward adjustment of 24.5% on average (though the adjustments required ranged from –200.8% to +34.7%). However, the general worsening of the RAEs as a result of adjustment reflected a failure to match the adjustments that were made to those that were required for each series. Recall that H3 assumed that any biases resulting from forecasters liking a product would be reduced if they first produced forecasts based only on time series information. The idea was that they would anchor on their initial forecasts, thus mitigating any biases resulting from their subsequent positive emotional response to the product. Table 2 indicates that the extent to which a product was liked had no association with the accuracy of the final forecasts for participants in the ‘‘informed after’’ group. Only 27.5% of this group’s forecasts were adjusted further when the handbags were revealed. On average, participants who strongly liked or disliked a product made larger adjustments than those with other preference levels (see Fig. 4), but these adjustments were not associated with higher or lower accuracy than those made for other preference levels. In most cases, the participants preferred to stay with the forecasts that they had made based only on time series information, and it appears that they were right to do so. On average, the forecasts made before the bags were revealed were significantly more accurate than those made after the forecasters were informed about the products. A Wilcoxon signed ranks test comparing each participant’s median RAE indicated that this was higher for the later forecasts (p = 0.021, two tail), although the median increase in the RAE was only 0.03, due to the relative infrequency of adjustments. Moreover, the initial forecasts of the ‘‘informed after’’ group, which were based only on time series information, were significantly more
accurate than those of the ‘‘informed before'' group, who knew the details of the products when making their forecasts. A Mann–Whitney test comparing the median RAEs of the two groups had a p-value of 0.034, two-tail (the median difference was 0.112). All of this indicates that providing information about the product and asking for an assessment of how much the product was liked served only to damage the forecast accuracy. For the ‘‘informed after'' group, the damage was sufficient to nullify the relative accuracy of their initial, time series based forecasts. A Mann–Whitney test comparing the median RAEs of the ‘‘informed before'' group with the median RAEs of the final forecasts of the ‘‘informed after'' group had a p-value of 0.10, two-tail. This provides only weak support for H3. Thus, although the strategy of providing product information after an initial forecast had been made discouraged adjustments, the adjustments that were made were still damaging enough to largely cancel out the strategy's effectiveness.

Fig. 4. Further mean percentage adjustments for the ‘‘Informed after'' group when the bags were revealed.

4. Experiment 2

4.1. Implementation

Experiment 1 raised a number of issues. First, there was a danger that the results could be distorted by a confounding of the extent to which a product was liked and the demand patterns of particular products. For example, if products with harder-to-forecast series tended to be liked more than those with easier-to-forecast series, the association between ‘liking' and accuracy could be spurious. Second, the results did not control for the mood of the participant at the time when the forecasts were made. Mood could add noise to the measurements, thus masking the effects of medium levels of emotion on the forecast accuracy. Third, the participants were not asked why they liked or disliked the bags. Getting them to reflect on the features that they liked and disliked might lead to more deliberative reasoning, enabling them to imagine why a product that they liked or disliked could be seen differently by other potential consumers, thus counteracting the false-consensus bias.

To take these issues into account, the task in Experiment 2 was almost identical to that carried out by the ‘‘informed before'' group in Experiment 1. However, this experiment also asked the participants to rate their current mood (based on three 1-to-5 scales: happy to sad, pleased to annoyed and satisfied to dissatisfied, with the mood score being obtained by summing the three ratings), following the procedure used by Raghunathan and Irwin (2001). They were also asked to assess their satisfaction with each bag on a scale of 1–7 for each of ten attributes, including the bag's size, trendiness, manufacturing quality and versatility. The selection of these attributes was based on studies by Griffin and Hauser (1993) and Matzler and Hinterhuber (1998).

4.2. Results
Forty-eight M.Sc. students who were taking a supply chain management course participated in the experiment. Table 3 reports the details of the sample, together with their self-reported mean levels of knowledge and professional experience in relation to sales forecasting. There was a significant difference between genders as far as the product involvement score is concerned, with females tending to have higher mean scores (109.2 for females and 85.5 for males, p < 0.001).

The preferences of the participants for the different bags differed considerably from those of the participants in Experiment 1. For example, nearly 23% of people in the first experiment gave bag 9 the maximum ‘like score' of 7, whereas no one in the second experiment gave it this score. Similarly, nearly 29% of people gave bag 2 a score of 1 in the first experiment, but only 6.25% gave it this score in the second experiment. Nearly 46% of people in Experiment 2 gave bag 10 the minimum score of 1, indicating that they very strongly disliked it, whereas only 13% gave it this score in Experiment 1. This allays the concern that demand time series patterns might be confounded with product preferences.

In general, the participants' adjustments reduced the accuracy of the statistical forecasts: the median RAE was 1.19. As in Experiment 1, the method of generalized estimating equations was applied to the data, with a gamma distribution assumed for the dependent variable (RAE) and a log link function. The explanatory variables were the same as those in the models for Experiment 1, with the addition of the aggregate mood score. The correlations between these variables had a maximum absolute value of only 0.23, suggesting that their effects could be assessed independently. As before, fitting the model to data with a 5% trim applied to the RAEs did not change the key findings.

Table 4 shows the results. It can be seen that the coefficient for the mood score was not significant. Unlike in Experiment 1, the ability to identify the brand correctly was associated significantly with the accuracy: those who could identify the brand correctly tended to produce less accurate forecasts. However, as was the case for the ‘‘informed before'' group in Experiment 1, those who had either a strong or extreme liking for a product (i.e., a like score of 6 or 7) tended to produce significantly less accurate forecasts than those expressing other preference levels. The sizes of the coefficients for the different levels of liking suggest that the most accurate forecasts were typically produced by those who disliked a product. In addition, unlike in Experiment 1, an extreme dislike of the product (a like score of 1) was associated with significantly more accurate forecasts than an extreme liking for a product.
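A minimal, illustrative sketch of the Experiment 2 model, under the same hypothetical file and column conventions as the earlier sketch, with the aggregate mood score added as a covariate:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical Experiment 2 data: one row per participant-product forecast.
df2 = pd.read_csv("experiment2.csv")  # illustrative file name
df2["rae"] = (np.abs(df2["actual"] - df2["adjusted_fc"])
              / np.abs(df2["actual"] - df2["statistical_fc"])).replace(0, 1e-5)

# The same gamma, log-link GEE as in Experiment 1, with the mood score added.
model2 = smf.gee(
    "rae ~ C(like_score, Treatment(reference=7)) + correct_id"
    " + involvement + mood_score + C(product)",
    groups="participant",
    data=df2,
    family=sm.families.Gamma(link=sm.families.links.Log()),
    cov_struct=sm.cov_struct.Independence(),
)
print(np.exp(model2.fit().params))  # multiplicative effects on the RAE
```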
Table 3
Background information on the participants in Experiment 2.

Mean age                                                                          24.0
Mean knowledge of sales forecasting (1 = no knowledge, 7 = expert knowledge)      3.23
Mean experience of sales forecasting (1 = no experience, 7 = expert experience)   1.85
Mean involvement score                                                            94.9
Females (%)                                                                       39.6
n                                                                                 48
Table 4
Determinants of forecast inaccuracy in Experiment 2, as measured by the RAE.

Variable                   Coefficient   p-value
Intercept                  4.736         0.000
Correct identification     −0.254        0.000
Like product score (a)
  1                        −0.768        0.462
  2                        −0.809        0.043
  3                        −0.776        0.056
  4                        −0.395        0.055
  5                        −0.355        0.044
  6                        −0.279        0.281
Involvement score          −0.009        0.000
Mood score                 −0.020        0.260

n = 480 forecasts.
(a) For the ‘Like product score', the results are relative to a score of 7.
Fig. 5. Mean percentage adjustments in Experiment 2.
The absolute values of the coefficients for the like scores are much larger in Experiment 2 than in Experiment 1, suggesting that the differences in accuracy associated with liking or disliking a product are greater. Possibly the requirement to rate one's satisfaction with different attributes of the products intensified the influence of product preferences on the forecasts. This is reflected in the mean percentage adjustments shown in Fig. 5, where the underlying slope of the graph is greater than that seen in Fig. 3. In line with Experiment 1, higher product involvement scores were associated with more accurate forecasts. Overall, the results provide strong support for H1 and H2.

5. Discussion and conclusions

The results of our studies suggest that a strong positive emotional reaction to products is likely to lead people to make damaging judgmental adjustments to statistical forecasts. While the participants' adjustments in general led to a deterioration in forecast accuracy, the forecasts made by participants who only had access to time series
information and were unaware of the product were significantly more accurate. This is in contrast to earlier findings in the literature that the provision of product-specific information enhances the accuracy of judgmental forecasts (Edmundson et al., 1988). It suggests that the possession of product-specific information is counterproductive when the products generate strong emotional reactions, as is the case with fashion goods. In such cases, it seems likely that the increased forecast errors are due to a false-consensus bias, because people think that others will share their own strong preferences, leading to high sales. This bias can be exacerbated by asking people to focus on their own position (Marks & Miller, 1987), and the larger effect sizes found in Experiment 2 may have resulted from asking them to deliberate on their levels of satisfaction with specific attributes of the products. Rather than encouraging them to adopt a broader perspective by considering both positive and negative aspects of the handbags, this process appears to have intensified their belief that their own preferences will be shared widely (Kahneman, 2011). The general tendency of those who disliked a product to forecast more accurately than those who liked it is consistent with findings in the psychology literature that the false-consensus bias is weaker for products that are disliked. As was hypothesised, a high involvement with the product category was associated with a greater accuracy in both experiments. It is likely that an interest in a product category will lead to an increased attention and commitment to the task, but it is not clear exactly how this translated directly into an improved accuracy. It may be that the participants were motivated to put more effort into considering the available information; indeed, those with a high involvement with a product category may have been genuinely interested in patterns of past sales. The willingness to put more effort into a task is likely to be associated with analytic (system 2) rather than intuitive (system 1) thinking, thereby reducing the influence of biases associated with intuition (Kahneman, 2011; Moritz, Siemsen & Kremer, 2014). This may have accounted for the greater accuracy, but further research, tracing the process of arriving at forecasts, would be needed to confirm this. The consistency of the main results across the two experiments suggests that our findings are robust. However, there were some differences. The ability to identify the product brand was associated significantly with lower levels of accuracy in Experiment 2, but not in Experiment 1. Similarly, an extreme dislike of the product was associated with a greater accuracy than a strong or extreme liking in Experiment 2, but not for the ‘‘informed before’’ group in Experiment 1. It is not clear why these differences occurred, but it is notable that the median product involvement score of participants in Experiment 2 was
significantly higher than that for Experiment 1 (Mann– Whitney test, p = 0.034, two tail). The strategy of delaying the provision of information about the product in the hope that an anchoring bias would counteract the subsequent enthusiastic upward forecast adjustments by those who liked the product had only moderate success. It did restrict the frequency with which further damaging adjustments were made to the forecasts once the new information was presented, but not sufficiently to improve the accuracy significantly relative to the ‘‘informed before’’ group in Experiment 1. This shows that the anchoring and adjustment heuristic is by no means ubiquitous, and that ‘‘dull’’ numbers (in this case, the information in the chart) cannot be guaranteed to act as an anchor when colourful and attention-grabbing qualitative information invokes strong feelings about a product. Future research could establish whether other mitigation strategies are more successful. For example, FentonO’Creevy et al. (2011) examined the effects of the emotion regulation strategies employed by financial traders. They found that traders who employed antecedent-focused emotional regulation strategies –which seek to change emotions before emotional responses are triggered – achieved higher levels of performance than those who employed strategies which aimed to modify behaviour and the expression of emotion after the emotional response had been stimulated. There are a number of limitations to our study. Further research is needed to establish the extent to which managers’ judgments replicate those of the participants in this experiment. Managers might be more experienced in the forecasting task, and hence may have learned to perform the task better and with less emotional involvement, though there is currently no evidence from the field that experience improves accuracy. Also, we did not consider personality factors, such as conscientiousness, which could mediate some of the associations that we have reported. On the plus side, our use of real short time series enhanced the ecological validity of the experiment. As we focused on products with short life-cycles, our use of trend-based forecasting methods was also probably more appropriate than the exponential smoothing based methods which are used widely by supply chain companies for producing forecasts for long-established product lines. Furthermore, within the category of products with short life-cycles, we focused on one specific product type (i.e., handbags). It would be useful to replicate the study with other items (e.g., from the consumer electronics industry) that pose similar challenges in the forecasting process. If replicated in the field, our results have important implications for those with the responsibility of forecasting sales of products that invoke emotional responses in potential buyers. The results suggest that good forecasters will have a sense of involvement in the relevant product category, but that they will need to avoid the optimism bias that can arise when they themselves have a strong emotional liking for the product, as they are likely to overestimate the extent to which their preferences are shared by the market.
References Abernathy, F., Dunlop, J., Hammond, J., & Weil, D. (1999). A stitch in time. Lean retailing and the transformation of manufacturing –lessons from the apparel and textile industries. New York: Oxford University Press. Au, K. F., Choi, T. M., & Yu, Y. (2008). Fashion retail forecasting by evolutionary neural networks. International Journal of Production Economics, 114(2), 615–630. Batson, C., Shaw, L. L., & Oleson, K. (1992). Differentiating affect, mood, and emotion: toward functionally based conceptual distinctions. In Clark, M. (Ed.), Emotion review of personality and social psychology, No 13 (pp. 294–326). Thousand Oaks, CA.: Sage Publications. Belvedere, V., & Stabilini, G. (2014). Alternative models of demand-driven supply chains: size-specific solutions in the Italian fashion companies. Fashion Practice, 2, 221–242. Berg, J. M. (2016). Balancing on the creative highwire forecasting the success of novel ideas in organizations. Administrative Science Quarterly, 61(3), 433–468. Bergenstrom, A., & Sherr, L. (2003). The effect of order of presentation of verbal probability expressions on numerical estimates in a medical context. Psychology, Health and Medicine, 8, 391–398. Blackburn, J. (1991). Time-based competition: The next battleground in American manufacturing. Homewood, IL: Business One Irwin. Blackburn, J. (2012). Valuing time in supply chains: establishing limits of time-based competition. Journal of Operations Management, 30, 396–405. Bolger, F., & Harvey, N. (1993). Context-sensitive heuristics in statistical reasoning. Quarterly Journal of Experimental Psychology, 46A, 779–811. Bolton, G. E., Ockenfels, A., & Thonemann, U. W. (2012). Managers and students as newsvendors. Management Science, 58(12), 2225–2233. Celsi, R. L., & Olson, J. C. (1988). The role of involvement in attention and comprehension processes. Journal of Consumer Research, 15, 210–224. Choi, T. M., Hui, C. L., Liu, N., Ng, S. F., & Yu, Y. (2014). Fast fashion sales forecasting with limited data and time. Decision Support Systems, 59, 84–92. Christopher, M. (2000). The agile supply chain: competing in volatile markets. Industrial Marketing Management, 29(1), 37–44. Davydenko, A., & Fildes, R. (2013). Measuring forecasting accuracy: the case of judgmental adjustments to SKU-level demand forecasts. International Journal of Forecasting, 29, 510–522. Davydenko, A., & Fildes, R. (2014). Measuring forecasting accuracy: problems and recommendations (by the example of SKU-level judgmental adjustments). In Choi, T. M., & Hui, C. L. (Eds.), Intelligent fashion forecasting systems: models and applications (pp. 43–70). Berlin, Heidelberg: Springer. De Toni, A., & Meneghetti, A. (2000). The production planning process for a network of firms in the textile-apparel industry. International Journal of Production Economics, 65, 17–32. Edmundson, R., Lawrence, M., & O’Connor, M. (1988). The use of nontime series information in sales forecasting: a case study. Journal of Forecasting, 7, 201–211. Ekkekakis, P. (2012). The measurement of affect, mood, and emotion in exercise psychology. In Tenenbaum, G., Eklund, R. C., & Kamata, A. (Eds.), Measurement in sport and exercise psychology (pp. 321–332). Champaign, IL: Human Kinetics. Epley, N., & Gilovich, T. (2004). Are adjustments insufficient? Personality and Social Psychology Bulletin, 30, 447–460. Estrada, C. A., Isen, A. M., & Young, M. J. (1997). Positive affect facilitates integration of information and decreases anchoring in reasoning among physicians. 
Organizational Behavior and Human Decision Processes, 72, 117–135. Fang, X., Zhang, C., Robb, D. J., & Blackburn, J. D. (2013). Decision support for lead time and demand variability reduction. Omega – International Journal of Management Science, 41, 390–396. Fenton-O’creevy, M., Soane, E., Nicholson, N., & Willman, P. (2011). Thinking, feeling and deciding: the influence of emotions on the decision making and performance of traders. Journal of Organizational Behavior, 32, 1044–1061. Fildes, R., Goodwin, P., Lawrence, M., & Nikolopoulos, K. (2009). Effective forecasting and judgmental adjustments: an empirical evaluation and strategies for improvement in supply-chain planning. International Journal of Forecasting, 25, 3–23.
V. Belvedere, P. Goodwin / International Journal of Forecasting 33 (2017) 652–661 Fischer, G. W., Luce, M. F., & Jia, J. (2000). Attribute conflict and preference uncertainty: effects on judgment time and error. Management Science, 46, 88–103. Forza, C., & Vinelli, A. (2000). Time compression in production and distribution within the textile-apparel chain. Integrated Manufacturing Systems, 11, 138–146. Gershoff, A., Mukherjee, A., & Mukhopadhyay, A. (2007a). Few ways to love, but many ways to hate: attribute ambiguity and the positivity effect in agent evaluation. Journal of Consumer Research, 33, 499–505. Gershoff, A., Mukherjee, A., & Mukhopadhyay, A. (2007b). What’s not to like? Preference asymmetry in the false consensus effect. Journal of Consumer Research, 35, 119–125. Goodwin, P. (2005). Providing support for decisions based on time series information under conditions of asymmetric loss. European Journal of Operational Research, 163, 388–402. Goodwin, P., Meeran, S., & Dyussekeneva, K. (2014). The challenges of prelaunch forecasting of adoption time series for new durable products. International Journal of Forecasting, 30, 1082–1097. Griffin, A., & Hauser, J. R. (1993). The voice of the customer. Marketing Science, 12(1), 1–27. Isen, A. M. (2001). An influence of positive affect on decision making in complex situations: theoretical issues with practical implications. Journal of Consumer Psychology, 11, 75–85. Jain, K., Bearden, J., & Filipowicz, A. (2013). Depression and forecast accuracy: evidence from the 2010 FIFA World Cup. International Journal of Forecasting, 29, 69–79. Kahneman, D. (2011). Thinking Fast and Slow. London: Penguin. Lawrence, M., Goodwin, P., O’Connor, M., & Onkal, D. (2006). Judgmental forecasting: a review of progress over the last 25 years. International Journal of Forecasting, 22, 493–518. Lawrence, M., & Makridakis, S. (1989). Factors affecting judgmental forecasts and confidence intervals. Organizational Behavior and Human Decision Processes, 43, 172–187. Lim, J., & O’Connor, M. (1996). Judgmental forecasting with time series and causal information. International Journal of Forecasting, 12, 139–153. Lim, J., Whang, M., Park, H., & Lee, H. (1998). A physiological apporach to the effect of emotion on time series judgmental forecasting EEG and GSR. Korean Journal of the Science of Emotion and Sensibility, 1, 123–133. Liyanarachchi, G., & Milne, M. (2005). Comparing the investment decisions of accounting practitioners and students: an empirical study on the adequacy of student surrogates. Accounting Forum, 29, 121–135. Lovallo, D., Clarke, C., & Camerer, C. (2012). Robust analogizing and the outside view: two empirical tests of case-based decision making. Strategic Management Journal, 33(5), 496–512. Marks, G., & Miller, N. (1987). 10 years of research on the false-consensus effect – an empirical and theoretical review. Psychological Bulletin, 102, 72–90. Matzler, K., & Hinterhuber, H. H. (1998). How to make product development projects more successful by integrating Kano’s model of customer satisfaction into quality function deployment. Technovation, 18(1), 25–38. Moritz, B., Siemsen, E., & Kremer, M. (2014). Judgmental forecasting: cognitive reflection and decision speed. Production and Operations Management, 23, 1146–1160. Nenni, M. E., Giustiniano, L., & Pirolo, L. (2013). Demand forecasting in the fashion industry: a review. International Journal of Engineering Business Management, 5, 1–6. O’Connor, M., Remus, W., & Griggs, K. (1993). 
Judgemental forecasting in times of change. International Journal of Forecasting, 9, 163–172.
Raghunathan, R., & Irwin, J. R. (2001). Walking the hedonic product treadmill: default contrast and mood-based assimilation in judgments of predicted happiness with a target product. Journal of Consumer Research, 28(3), 355–368. Remus, W. (1986). Graduate-students as surrogates for managers in experiments on business decision-making. Journal of Business Research, 14, 19–25. Ross, L., Greene, D., & House, P. (1977). False consensus effect – egocentric bias in social-perception and attribution processes. Journal of Experimental Social Psychology, 13, 279–301. Russell, J. A., & Barrett, L. F. (1999). Core affect, prototypical emotional episodes, and other things called emotion: dissecting the elephant. Journal of Personality and Social Psychology, 76, 805–819. Sanders, N., & Ritzman, L. (2001). Judgmental adjustments of statistical forecasts. In Armstrong, J. S. (Ed.), Principles of forecasting: a handbook for researchers and practitioners (pp. 405–416). Norwell, MA: Kluwer Academic Publishers. Seifert, M., Siemsen, E., Hadida, A. L., & Eisingerich, A. B. (2015). Effective judgmental forecasting in the context of fashion products. Journal of Operations Management, 36, 33–45. Sichel, B. (2008). Forecasting demand with point of sales data – a case study of fashion products. The Journal of Business Forecasting, 27(4), 15–16. Slovic, P., Finucane, M., Peters, E., & MacGregor, D. (2002). The affect heuristic. In Gilovich, T., Griffin, D., & Kahneman, D. (Eds.), Heuristics and Biases (pp. 397–420). New York: Cambridge University Press. Stalk, G. Jr., & Hou, T. M. (1990). Competing against time: how time-based competition is reshaping global markets. New York, NY: The Free Press. Sull, D. (2010). Competing through organizational agility (cover story). McKinsey Quarterly, 1, 48–56. Sull, D., & Turconi, S. (2008). Fast fashion lessons. Business Strategy Review, 19(2), 5–11. Sun, Z. L., Choi, T. M., Au, K. F., & Yu, Y. (2008). Sales forecasting using extreme learning machine with applications in fashion retailing. Decision Support Systems, 46(1), 411–419. Taleb, N. (2007). The Black Swan. New York: Random House. Tetlock, P. (2005). Expert political judgment: how good is it, how can we know. Princeton, NJ: Princeton University Press. Thomassey, S. (2014). Sales forecasting in apparel and fashion industry: a review. In Choi, T. M., & Hui, C. L. (Eds.), Intelligent fashion forecasting systems: models and applications (pp. 9–27). Berlin, Heidelberg: Springer. Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science, 185, 1124–1131. Tyebjee, T. T. (1987). Behavioral biases in new product forecasting. International Journal of Forecasting, 3, 393–404. Wallace, S. W., & Choi, T. M. (2011). Challenges in apparel production planning and control. Production Planning and Control, 22, 209. Zaichkowsky, J. L. (1985). Measuring the involvement construct. Journal of Consumer Research, 12, 341–352. Valeria Belvedere is Assistant Professor of Operations and Supply Chain Management at the Catholic University of the Sacred Heart, Milan. Her main fields of research and teaching are performance measurement and management, manufacturing strategy, and supply chain management in fashion companies. Paul Goodwin is Emeritus Professor of Management Science at the University of Bath. His research interests are concerned with the role of management judgment in forecasting and decision making. He is an Honorary Fellow of the International Institute of Forecasters.