Analysis and forecasting of demand during promotions for perishable items



Int. J. Production Economics 172 (2016) 65–75

Contents lists available at ScienceDirect

Int. J. Production Economics journal homepage: www.elsevier.com/locate/ijpe

K.H. van Donselaar *, J. Peters, A. de Jong, R.A.C.M. Broekmeulen

Eindhoven University of Technology, Department of Technology Management, PO Box 513, 5600 MB Eindhoven, The Netherlands

Article info

Article history: Received 28 September 2012; Accepted 25 October 2015; Available online 9 November 2015

Keywords: Perishable; Promotion; Price discount; Retail; Demand forecasting; Threshold; Saturation

Abstract

This study examines promotions for perishable products in a retail environment. We analyze the impact of relative price discounts on product sales during a promotion and shed light on how to build models to forecast promotional demand for perishable products. Preliminary analyses, based on regression models and a large dataset from a retailer, do not reveal conclusive evidence for the presence of threshold and/or saturation levels for price discounts for perishable products. A potential explanation comes from the observation that, although products like desserts on average allow 1.5 weeks time-to-consume, their sales during promotions on average are equal to 14 weeks of regular sales. This suggests that the success of a promotion is not so much determined by the restriction to stockpile (due to the short time-to-consume) but by the emergence of substitution effects (consumers switching between different products of the same category). We develop and test different models to forecast the demand during a promotion, including a moving average forecast and several regression models. Within the class of regression models we find that modeling threshold and saturation effects leads to worse forecasting performance than modeling price reductions linearly or quadratically. The largest improvements in forecast accuracy are gained by distinguishing between routine and non-routine product categories. Routine categories with routine demand processes and a large number of observations perform best when applying a regression based on direct observations of the product category, whilst non-routine categories benefit from a regression which also uses observations from other product categories. © 2015 Elsevier B.V. All rights reserved.

* Corresponding author. Tel.: +31 40 2472869. E-mail addresses: [email protected] (K.H. van Donselaar), [email protected] (J. Peters), [email protected] (A. de Jong), [email protected] (R.A.C.M. Broekmeulen).
http://dx.doi.org/10.1016/j.ijpe.2015.10.022
0925-5273/© 2015 Elsevier B.V. All rights reserved.

1. Research objectives and relevance

There are two research objectives in this paper. The first objective is to gain a more in-depth understanding of the impact of temporary price discounts on the sales of perishable items. The second is to use this knowledge and additional ideas to develop a model to assist retailers in making a forecast for the demand during a promotion for a perishable item.

The first objective directly responds to a key issue in sales promotion research addressed by Blattberg et al. (1995). In their paper, they call for more empirical results on the shape of the so-called 'Deal Effect Curve'. This curve shows how the promotional sales depend on the relative price discount. Since then, only a few empirical studies on the Deal Effect Curve have paid attention to the presence of threshold and saturation effects for price discounts. The threshold level is the minimum value of a temporary price discount needed to change the consumer's purchases (Gupta and Cooper, 1992). The saturation level is the level of a temporary price discount beyond which consumers no longer increase their purchases if the discount increases further. Despite the fact that perishable products are core items for grocery retailers, little is known about the effect of price discounts on promotional sales for perishable products. According to Blattberg et al. (1995), the limit in stockpiling is one of the reasons for saturation effects. This makes perishable items, with their short time-to-consume from the moment they are sold, especially interesting to investigate. This is the first study which analyzes threshold and saturation effects for perishable items specifically, based on a large empirical dataset with a variety of perishable items.

The second research objective responds to the increased awareness, worldwide, that food waste should be reduced. For example, in early 2012 the European Parliament called for urgent measures to halve food waste by 2025 and to improve access to food for needy EU citizens. Moreover, it called for 2014 to be designated as "European year against food waste" (European Parliament, 2012). Retailers who can improve their demand forecast accuracy will be better able to match supply with demand. By doing so, they will not only reduce food waste but also benefit from increased product availability for their customers and an improved customer perception of their food quality. Heller (2002) indicated that the quality of perishable items is a core reason for customers to visit a specific store. Furthermore, according to Thron et al. (2007), the sale of perishable items is of vastly increasing importance to grocery retailers; around 50% of the total turnover can be attributed to perishable items. On the other hand, Thron et al. (2007) state that, due to their limited life span, perishable items also account for the majority of product losses; around 15% is lost due to damage, spoilage, and expiry. Hence, how to better control inventories for perishable items constitutes a major challenge for retailers.

van Donselaar et al. (2006) and Broekmeulen and van Donselaar (2009) provide several solutions to improve the control of inventories for perishable items if those items are not on promotion. Promotions, however, have become more and more important. Fierce competition among supermarket chains has led to a significant increase (already since the 1970s) in the number, frequency, and depth of promotions (Srinivasan et al., 2004). Promotions are also responsible for a large part of the waste and the out-of-stock situations for perishable items since, as Gruen et al. (2002), Corsten and Gruen (2003) and Ettouzani et al. (2012) emphasize, the demand for promotional products is notoriously difficult to forecast and manage. The latter authors end their paper with a call for research on new forecasting models for demand during promotions.
In conclusion, four observations make it worthwhile to focus on the analysis and forecasting of demand during promotions for perishable items: the sale of perishable items is of vastly increasing importance for grocery retailers; there is a growing awareness worldwide that food waste should be reduced substantially in the next decade; price promotions are an important marketing tool for retailers; and promotions for perishable items are responsible for a large part of the waste and the out-of-stock situations in a store.

The paper is organized as follows. In Section 2 we describe the research environment and the empirical data. In Section 3 we review the literature related to threshold and saturation levels as well as the literature dealing with forecasting models for promotional demand. The focal model will be described in Section 4. In Section 5 the results on the threshold and saturation levels for perishable items are presented. The different forecasting models for promotional demand and the associated forecasting accuracies are discussed in Section 6. In Section 7 we present the conclusions and discuss several options for future research.

2. Research environment and data description

The empirical data needed for the analyses and tests in this paper were provided by a Dutch grocery retail chain. The chain operates more than 100 supermarkets in The Netherlands and has a strategic focus on providing good quality fresh products to its customers. With this strategy it wants to differentiate itself from its competitors. In this retail chain each store is supplied by one DC. The data provided by this retailer include: data on sales quantities, prices, product information, weight or volume per consumer unit, timing of promotions, additional promotion actions (like a Saving Action where consumers can save credits/points for buying special products at reduced prices), and information on whether the product was on display in the store or included in the store flyer. The promotions typically run one week (simultaneously in all stores) and the sales data are also weekly data (from week 2 in 2010 till week 35 in 2011), with each sales week covering exactly one promotion week (Sunday to Saturday). The data were weekly SKU data aggregated on national level (i.e. summed over all stores).

We use sales data rather than demand data, assuming that out-of-stocks make little difference. This assumption is motivated by the flexibility in the supply chain. For example, both the retailer's DC and the manufacturer carry safety stock to prevent out-of-stocks as much as possible. At the end of the week preceding the promotion week the stores receive the majority of the products for the entire promotion week. Thanks to the safety stocks at the DC and at the manufacturer, the stores are able to order additional products on a daily basis during the week if the promotion is very successful. Due to the flexibility in this replenishment process, the forecasting (in)accuracy at the national and weekly aggregation level is a good first indicator of the potential to reduce total waste due to poor forecasting. In future research, the forecasting (in)accuracy can also be measured at the daily store aggregation level, which will give an even better indication.

The products from the following four product categories were included in this dataset: Desserts (like vanilla custard and yoghurt), Dairy Drinks (like drink yoghurt or milk with a flavor), Cold Cuts (like ham and liver pate) and Salads (e.g. Russian salad and Egg salad). All products are prepackaged. Fig. 1 shows some examples of the four product categories. Table 1 reports a summary of the descriptive statistics for this dataset. The dataset contains 407 different perishable items. The Desserts category contains the largest number of items (182 SKUs), the Cold Cuts category the lowest (48 SKUs). In total, there were 1447 promotions in 86 weeks for 407 perishable SKUs. So on average there were 16.8 SKUs on promotion in a week and an individual SKU is promoted on average once every 24.2 weeks. The average regular price per item was close to €1.50. When an item was on promotion, the average price discount was close to 30%, but could vary between 0% (no discount) and 60%. The price discount is calculated by comparing the price during the promotion with the regular sales price in the week preceding the promotion week.

The success of a promotion was measured by the Lift Factor, which was defined as the sales during a promotion divided by the baseline sales. Baseline sales were measured by taking the average weekly sales of the non-promotional weeks during the five weeks preceding a promotion. The average Lift Factor was highest for Desserts (13.8) and approximately equal for Dairy Drinks, Cold Cuts and Salads (5.6, 6.2 and 7.5). The range for the Lift Factor was very large: it varied from 1.1 to 59.5. The average Shelf Life was close to 10 days for Desserts and Dairy Drinks, while it was 25.2 days for Cold Cuts and 16.1 days for Salads. The average baseline sales, i.e. the average total number of consumer units (per SKU) sold in all stores together, ranged from 718 (Desserts) to 1010 (Cold Cuts) consumer units per week.
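The Lift Factor and baseline sales defined in this section can be illustrated with a minimal Python sketch. The function names, the list-based data layout and the example numbers are assumptions made purely for illustration; they are not taken from the retailer's data or from the authors' implementation.

```python
import math


def relative_discount(regular_price, promo_price):
    """Relative price discount versus the regular price in the preceding week."""
    return 1.0 - promo_price / regular_price


def baseline_sales(weekly_sales, on_promotion, promo_week, window=5):
    """Average sales of the non-promotional weeks among the `window` weeks
    immediately preceding `promo_week` (indices into the weekly series)."""
    start = max(0, promo_week - window)
    regular = [
        sales
        for sales, promo in zip(weekly_sales[start:promo_week],
                                on_promotion[start:promo_week])
        if not promo
    ]
    if not regular:
        raise ValueError("no non-promotional weeks available in the window")
    return sum(regular) / len(regular)


def lift_factor(weekly_sales, on_promotion, promo_week, window=5):
    """Sales in the promotion week divided by the baseline sales."""
    base = baseline_sales(weekly_sales, on_promotion, promo_week, window)
    return weekly_sales[promo_week] / base


# Hypothetical weekly national sales of one SKU (consumer units).
weekly_sales = [700, 720, 5000, 710, 730, 740, 9800]
on_promotion = [False, False, True, False, False, False, True]

base = baseline_sales(weekly_sales, on_promotion, promo_week=6)  # 725.0
lf = lift_factor(weekly_sales, on_promotion, promo_week=6)
log_lf = math.log(lf)  # Ln(LF), the dependent variable used later in the paper
```

In this made-up example the earlier promotion week (index 2) falls inside the five-week window and is excluded, so the baseline is the average of the four remaining regular weeks.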

Fig. 1. Examples of products in the categories Desserts, Dairy Drinks, Cold Cuts and Salads.

Table 1. Summary of descriptive statistics for the promotions used.

Statistic | Desserts | Dairy Drinks | Cold Cuts | Salads
# of SKUs | 182 | 70 | 48 | 107
# of promotions | 627 | 314 | 119 | 387
Average price [€/CU] | 1.35 | 1.36 | 1.71 | 1.56
Average price discount [%] | 30.3 | 26.4 | 33.5 | 32.9
Range price discount [%] | 0–60 | 10–56 | 0–56 | 0–58
Average Lift Factor | 13.8 | 5.6 | 6.2 | 7.5
Range Lift Factor | 1.1–59.5 | 1.3–20.4 | 1.4–39.4 | 1.3–34.9
Average Shelf Life [days] | 9.8 | 9.4 | 25.2 | 16.1
Range Shelf Life [days] | 5–40 | 4–14 | 10–42 | 11–27
Average baseline sales [CU/week] | 718 | 949 | 1010 | 854
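The aggregate figures quoted in the data description (SKUs on promotion per week, and weeks between promotions per SKU) follow from the per-category counts in Table 1. The short check below is only a worked illustration of that arithmetic; the variable names are ours.

```python
# Per-category counts from Table 1 (Desserts, Dairy Drinks, Cold Cuts, Salads).
skus = [182, 70, 48, 107]
promotions = [627, 314, 119, 387]
n_weeks = 86  # length of the observation period in weeks

n_skus = sum(skus)            # total number of perishable SKUs
n_promos = sum(promotions)    # total number of promotions

# Average number of SKUs on promotion in a given week.
promos_per_week = n_promos / n_weeks

# Average number of weeks between two promotions of the same SKU.
weeks_between_promos = n_skus * n_weeks / n_promos

print(n_skus, n_promos)
print(round(promos_per_week, 1), round(weeks_between_promos, 1))
```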

The Dutch retailer expressed the need for a solution which was relatively easy to understand, to implement and to maintain. As a result, we considered only parametric models in this study.

3. Literature review

On the first research question addressed in this paper (how large are the threshold and saturation levels for perishable items?) hardly any literature is available. Since the call from Blattberg et al. (1995) for more empirical research on the Deal Effect Curve, only four papers have been published. A comparison of these papers is given in Table 2. Two of these papers, Gupta and Cooper (1992) and Marshall and Leng (2002), were based on surveys among students. In these surveys the Consumer Purchase Intention was measured when different price discounts were offered. In both studies the number of products was very limited (4 products, respectively 2 products and 2 services) and no perishable items were investigated. The other two papers (van Heerde et al., 2001; Martínez-Ruiz et al., 2006) used scanner data from retailers to measure the impact of price discounts on unit sales. van Heerde et al. (2001) studied a very large number of promotions (>6500) for 11 non-perishable products from 3 retail chains. Martínez-Ruiz et al. (2006) studied 11 products using data from 52 promotion events in one supermarket. They were the first to empirically investigate the threshold and saturation effects for perishable items. The perishable items in their research project were five different brands of the same product: a 125-g yoghurt. Our study extends the existing literature by being the first study which empirically investigates the threshold and saturation levels for a wide set of 407 perishable items, which vary not only in brand but also, for example, in product category, flavor, packaging size, product type, price and shelf life. By studying these effects for such a wide variety of items, our results are far more generic than the results reported in the literature so far.
As mentioned, the only other paper on threshold and saturation levels for perishable items studied five varieties (brands) of one specific perishable product: 125-g yoghurt.

To address our second research question (how to forecast sales during a promotion) we used results from different streams in the Marketing and Operations Management literature. According to the review paper by Fildes et al. (2008), only a few studies in the OR and OM journals were devoted to this topic, while it has received more attention in the Marketing literature. The first two studies on forecasting promotional sales from the retailer perspective were based on econometric models (Cooper et al., 1999; van Heerde et al., 2002). The linear regression model developed in our paper is similar to these models as it aims to statistically estimate the effects of multiple independent variables on the sales quantity during a promotion. Before discussing in more detail which variables to include in these models, we would like to note that when implementing these types of models in practice, their performance might be further improved by also taking into account expert amendments. In the literature on judgmental forecasting several papers found evidence via experiments (Goodwin and Fildes, 1999) or field observations (Lee et al., 2007; Fildes and Goodwin, 2007; McIntyre et al., 1993) that forecasters look for past promotions which have similar circumstances as the promotion they have to forecast and then use the actual sales of those past promotions as a basis for their forecast. Recent research (Trapero et al., 2013) has analyzed the behavior and the added value of judgmental adjustments in the presence of promotions.

The Marketing literature offers insight into which particular independent variables should be included in the forecasting model. Clearly the relative price discount is an important variable (see e.g. Blattberg et al. (1995), van Heerde et al. (2002) and Cooper et al. (1999)). van Heerde et al. (2002) also include the default price level (i.e. the regular price if there is no promotion). Apart from price effects, the effects of advertisement have been studied extensively in the literature. Advertisement can be divided into in-store advertisement and other advertisement, like using a newspaper or a store flyer. In-store advertisement focuses on attracting customers to the promoted articles in several ways; the most commonly used methods make use of ads and displays (e.g. Cooper et al., 1999). The effects of displays were intensively investigated in the literature. A general conclusion is that sales can increase several fold in the presence of displays (e.g. Ailawadi et al., 2009). Making use of a store flyer is another commonly accepted determinant of promotional sales. This variable is also included in the model of Cooper et al. (1999). Gijsbrechts et al. (2003) specifically examined the effects of store flyers on store traffic. Their research shows that the use of a store flyer increases store traffic and that the presence of food and wine promotions on the front page of the flyer is most effective. In their paper Cooper et al. (1999) stressed the impact of the baseline sales: slow movers typically have higher Lift Factors than fast movers. Cooper et al. also included dummy variables for the main holidays in their regression model. This is in line with the results of Divakar et al. (2005), who argue that during holidays the demand for beverages increases strongly, while for other product groups holidays negatively affect demand. Other variables that are known to have an impact on the success of a promotion are the number of items in a category that are on promotion (e.g. Ailawadi et al., 2006) and the package size or bulkiness as measured by the weight or volume of a product (Blattberg et al., 1981; Litvack et al., 1985; Raju, 1992).

Narasimhan et al. (1996) introduced the ability-to-stockpile scale, which was later on (in a modified form) used by Bijmolt et al. (2005) and Pauwels et al. (2007). According to the definition of Narasimhan et al. (1996), the set of items that cannot be easily stockpiled includes two types of items, namely infrequently used products (such as rug cleaner and furniture polish) as well as perishable items. We prefer to separate these two types of items by using Baseline sales as an indicator for the frequency with which items are being sold and Shelf Life (the average number of days left to sell the item when it arrives in the store) as an indicator for the perishability. The only paper known to investigate the impact of remaining shelf life on willingness to pay is by Tsiros and Heilman (2005). They consider for a given product how the willingness to pay changes if the shelf life as perceived by the customer decreases.
Since promotion forecasts were made before the goods were available in the store, no detailed information on the age distribution of the inventories on consecutive days during the promotion was available. As a result no data on shelf life as perceived by the customer were available and we had to use the shelf life as defined above as an indicator for perishability.

There is one single study that has explicitly investigated the impact of multiple merchandising and temporary promotional activities on the sale of perishable products at the retailer (Curhan, 1974). After that, there was radio silence on this topic for more than three and a half decades. Curhan (1974) focused on four fruit and vegetable categories and estimated the impact of price reduction, newspaper advertising, display space and display location quality. He found that only the display has a significant positive impact on sales for all four categories. High levels of advertising and quality of the display location only had a significant positive impact on two categories: Hard Fruit and Cooking Vegetables, while price reduction only had a positive impact on one of the four categories: Soft Fruit. An important note here is that Curhan used a 10% price reduction in his experiments. In our paper we will see that retailers nowadays use much bigger price discounts (up to 60%), and the effects of price discounts may be easier to identify when the analysis is not restricted to small price discounts only. Apart from the effects mentioned so far, Curhan investigated whether the absolute price level and the sales volume made a difference in the impact of the merchandising and promotional activities. Although Curhan investigated the significance of the impact of the merchandising and promotional activities on sales, he did not aim to use this knowledge to develop an integral model which can be used to forecast the demand for perishable items. Moreover, it is interesting to see whether 40 years later his observations, for example on the limited impact of small price discounts, also hold for other perishable product categories.

While the list of independent variables above is long, we do not pretend it is exhaustive. For a number of variables mentioned in the literature, data are difficult or impossible to get (e.g. promotions from competing retailers) and for other data there were no records available at the retailer who provided the empirical dataset (e.g. data on the quality of the display location).

Table 2. An overview of research on threshold and saturation effects since 1995.

Characteristic | Gupta and Cooper | Marshall and Leng | van Heerde et al. | Martínez-Ruiz et al. | van Donselaar et al.
Type and source of data | Survey among 290 students | Survey among 183 students | Weekly store-level scanner data from 3 retail chains | Daily scanner data from 1 supermarket; 1 year | Weekly national level scanner data from 1 retail chain; 86 weeks
Nbr of promotions | 208 | 732 | >6500 | 52 | 1447
Nbr and type of products with promotions | 4 products: [= store-image (2) * brand (2)] aerobic shoes | 2 products and 2 services: e.g. handphone and haircut | 11 products: 3 6.5-ounce canned tuna; 6 beverages; 2 packaged food | 11 products: 6 brands 250 g coffee; 5 brands 125 g yoghurt | 407 products: 182 Desserts; 70 Dairy Drinks; 48 Cold Cuts; 107 Salads
Nbr of perishable products | 0 | 0 | 0 | 5 | 407
Dependent variable | Consumer Purchase Intention | Consumer Purchase Intention | Unit Sales | Unit Sales | Ln(Unit Sales/Baseline sales)
Discount size | 0–70% | 0–70% | 0–50% | 1–32% | 0–60%

4. Modeling the impact of temporary price discounts

According to Cooper et al. (1999) the most widespread industry approach to promotional analysis is to compare the baseline sales to the lift, which is the promotional sales during previous promotions divided by the baseline sales during those promotions. In industry this lift is often determined at a high aggregation level (e.g. per category) due to the limited number of promotions for a specific item. In our analyses we used the natural logarithm of the Lift Factor (Ln(LF)) as the dependent variable. A major advantage of using relative sales rather than absolute sales as the basis for the dependent variable is that this results in standardized values for all promotions, making comparisons between promotions for different products more meaningful and easier. This is probably also one of the reasons why this approach is relatively popular in industry. In contrast to the most widespread industry approach, we measured the Lift Factor per item rather than per category, which was possible since we combined the promotional effects from a large number of items in a single regression model to determine the coefficients of the independent variables. This enabled us to forecast the Lift Factor relatively accurately despite the fact that only a few observations are available per item.

If there is a positive threshold and saturation level, this implies a non-linear relationship between the relative price discount and Ln(LF). To acquire insight into the non-linear nature of this effect, we added five dummies as independent variables in the model, reflecting the following price discount intervals: 0–10%, 10–20%, 30–40%, 40–50% and 50–60%. The default value is a price discount between 20% and 30%. In this way we used the same incremental steps in the relative price discounts as Gupta and Cooper (1992) and Marshall and Leng (2002). Discretizing continuous data into bins has its limitations. For example, through this discretization some information on the actual price discount is lost and the number of parameters to be estimated in the regression increases, which may result in overfitting. Therefore we recommend considering our results on the threshold and saturation level in Section 5 as explorative. If the threshold and saturation effects indeed are very large, this should have a strong impact when we include these effects in a promotion forecasting model. The relevance of the potential threshold and saturation levels for the perishables in our study can be tested when we validate the results from different forecasting models by including or excluding variables which capture these threshold and saturation levels. This will be discussed in detail in Section 6.

Apart from the dummy price discount variables, many variables that were reported in the previous section were also included as independent variables in the regression model.
To avoid multicollinearity some potential independent variables which are strongly related to the variables mentioned above have been excluded from the model, e.g. the absolute price discount or the default price per unit of weight (or volume). For some of the variables included in the model we used the natural logarithm of the variable. This results in the following model:

$$
\begin{aligned}
\mathrm{Ln}(LF_{it}) = {} & b_0 + \sum_{k=1}^{5} b_{1k}\, D_{itk} + b_2\, \mathit{WeightVol}_i + b_3\, \mathrm{Ln}(\mathit{ShelfLife}_i) + b_4\, \mathit{\#items}_{it} \\
& + b_5\, \mathrm{Ln}(\mathit{Defaultprice}_{it}) + b_6\, \mathit{Display}_{it} + b_7\, \mathit{Flyer}_{it} + b_8\, \mathit{FrontPage}_{it} \\
& + b_9\, \mathit{BackPage}_{it} + b_{10}\, \mathit{Saving}_{it} + \sum_{j=1}^{4} b_{11,j}\, \mathit{Holiday}_{jt} + b_{12}\, \mathrm{Ln}(\mathit{BaselineSales}_{it}) + u_{it}
\end{aligned}
$$

where

$LF_{it}$ is the Lift Factor (= sales during a promotion / baseline sales) for item i in week t
$D_{itk}$ is an indicator variable, capturing whether the relative price discount level for item i in week t is in the interval 0–10% ($D_{it1} = 1$), 10–20% ($D_{it2} = 1$), 30–40% ($D_{it3} = 1$), 40–50% ($D_{it4} = 1$) or 50–60% ($D_{it5} = 1$)
$\mathit{WeightVol}_i$ is the Weight (in kilogram) or the Volume (in liters) of one consumer selling unit for item i
$\mathit{ShelfLife}_i$ is the average number of days item i can be sold to consumers, measured at the moment the item arrives in the store
$\mathit{\#items}_{it}$ is the number of items in the category of item i which are on promotion in week t
$\mathit{Defaultprice}_{it}$ is the regular sales price for item i (i.e. when item i is not on promotion) in the week preceding week t
$\mathit{Display}_{it}$ is an indicator variable, capturing whether item i was put on a display in the stores in week t (1) or not (0)
$\mathit{Flyer}_{it}$ is an indicator variable, capturing whether item i was present in the store flyer in week t (1) or not (0)
$\mathit{FrontPage}_{it}$ is an indicator variable, capturing whether item i was present on the front page of the store flyer in week t (1) or not (0)
$\mathit{BackPage}_{it}$ is an indicator variable, capturing whether item i was present on the back page of the store flyer in week t (1) or not (0)
$\mathit{Saving}_{it}$ is an indicator variable, capturing whether for item i in week t consumers received savings points as part of a Savings Action Programme (1) or not (0)
$\mathit{Holiday}_{jt}$ is an indicator variable capturing whether week t is a holiday (1) or not (0) for Spring Break (j = 1), Easter (j = 2), Summer (j = 3) or Christmas (j = 4)
$\mathit{BaselineSales}_{it}$ is the Baseline Sales for item i, measured as the average non-promotional sales during the five weeks immediately preceding week t
$u_{it}$ is a disturbance term for item i in week t
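To make the role of the price discount dummies concrete, the sketch below encodes a relative discount into the five indicator variables and converts a predicted Ln(LF) back into a sales forecast. Two points are assumptions on our side, since the text does not specify them: class boundaries are treated as half-open intervals [lower, upper), with a discount of exactly 60% assigned to the last class, and the back-transformation is the naive exponential without any retransformation bias correction. The function names are hypothetical.

```python
import math

# Discount classes that receive a dummy; the 20-30% class is the omitted default.
DUMMY_CLASSES = [(0.00, 0.10), (0.10, 0.20), (0.30, 0.40), (0.40, 0.50), (0.50, 0.60)]


def discount_dummies(discount):
    """Return [D1, ..., D5] for a relative price discount given as a fraction.

    Assumption: intervals are half-open [lower, upper); a discount of
    exactly 0.60 is placed in the last class.
    """
    if not 0.0 <= discount <= 0.60:
        raise ValueError("discount outside the 0-60% range covered by the model")
    dummies = []
    for index, (lower, upper) in enumerate(DUMMY_CLASSES):
        is_last = index == len(DUMMY_CLASSES) - 1
        inside = lower <= discount < upper or (is_last and discount == upper)
        dummies.append(1 if inside else 0)
    return dummies


def forecast_sales(baseline, predicted_log_lift):
    """Naive back-transformation: promotional sales = baseline * exp(Ln(LF))."""
    return baseline * math.exp(predicted_log_lift)


print(discount_dummies(0.05))  # lowest class
print(discount_dummies(0.25))  # default class: all dummies are zero
print(discount_dummies(0.35))  # third dummy switched on
```

Because the 20–30% class is the default, each estimated dummy coefficient measures the difference in Ln(LF) relative to a promotion with a 20–30% discount, all other variables being equal.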

5. Results on threshold and saturation levels for perishable items

To examine the threshold and saturation effects, we follow a two-step approach. First, in this section we estimate the coefficients of the regression model described in the previous section to check if this model supports the presence of threshold and saturation effects. The second step, to be performed in the next section, is the true litmus test for these effects. We will include the potential presence of threshold and saturation effects in a forecasting model and evaluate whether these models outperform models which do not include these effects.

In this section we will check for each product category whether there is a potential threshold level by analyzing whether there is a set of sequential discount classes, starting with the lowest discount class for which we have observations, for which the regression coefficients do not differ significantly. Likewise, for the potential saturation level, we will analyze whether there is a set of sequential discount classes, starting with the highest discount class for which we have observations, for which the regression coefficients do not differ significantly. To study these potential threshold and saturation effects we will evaluate multiple variations of the regression model. These models differ only in the selection of the default price discount class. In our base regression model we select the 20–30% price discount class as the default discount class. Below we describe the detailed results for this base model.

The coefficients of the base regression model were estimated separately for each of the four product categories using the data from 1062 promotions in the calibration period (the first 62 weeks of observations in the dataset). The adjusted R2 values are 68.2% (Desserts), 71.4% (Drinks), 87.4% (Cold Cuts) and 62.5% (Salads). So the average adjusted R2 value is equal to 72.4%, showing that a large part of the variation in the effect of promotional activity is explained by these models. This compares favorably with the average R2 value equal to 52.0% (respectively 43.4%) reported in Martínez-Ruiz et al. (2006) for the parametric additive (respectively multiplicative) models for 5 brands of 125-g yoghurts.

To check for the presence of multicollinearity, we calculated the Variance Inflation Factor (VIF) statistic. A maximum VIF value in excess of 10 is frequently taken as an indication that multicollinearity may be unduly influencing the least squares estimates (Neter et al., 1996). Here multicollinearity is no cause for concern since the maximum VIF values are equal to 2.12 (for Desserts), 2.32 (Drinks), 2.66 (Cold Cuts) and 2.56 (Salads). The assumptions of normally distributed disturbances and homoscedasticity have also been checked and these assumptions are met. The Durbin Watson statistics, frequently used to check whether the assumption of uncorrelated residuals is met, are equal to 1.18 (Desserts), 1.24 (Drinks), 1.48 (Cold Cuts) and 1.46 (Salads). These values differ substantially from 2, indicating that the residuals are autocorrelated. This problem does not bias the coefficient estimates, but does have an impact on the standard errors. As a result there may be results that seem to be statistically significant but are not. A standard solution to this is the Prais–Winsten method, which is also applied by e.g. Kleijnen and Van Schaik (2011) and Obermaier (2012). This method is a generalized least squares (GLS) method for estimating a regression equation whose errors follow a first order autoregressive process. The results of these GLS regressions are summarized in Table 3. The adjusted R2 values now vary between 63.1% and 86.1%. When using the Prais–Winsten method, the Durbin Watson statistics are all acceptable: 2.13 (Desserts), 2.08 (Drinks), 2.00 (Cold Cuts) and 2.02 (Salads).
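As an illustration of the diagnostics mentioned above, the following sketch computes the Durbin Watson statistic for a residual series, estimates a first order autocorrelation coefficient, and applies the Prais–Winsten transformation to a single series. It is a simplified sketch under stated assumptions: residuals are treated as one ordered series (the text does not say how observations across items and weeks were ordered), the autocorrelation estimate is a simple no-intercept regression of each residual on its predecessor, and the re-estimation of the regression on the transformed data (typically iterated) is omitted. All function names are ours.

```python
import math


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares.
    Values near 2 suggest no first order autocorrelation."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def ar1_coefficient(residuals):
    """Estimate rho in e_t = rho * e_{t-1} + noise by least squares
    without an intercept (an assumed, simple estimator)."""
    numerator = sum(residuals[t] * residuals[t - 1]
                    for t in range(1, len(residuals)))
    denominator = sum(residuals[t - 1] ** 2 for t in range(1, len(residuals)))
    return numerator / denominator


def prais_winsten_transform(series, rho):
    """Transform one variable: the first observation is scaled by
    sqrt(1 - rho^2), later ones are quasi-differenced with rho."""
    first = math.sqrt(1.0 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


# Hypothetical, positively autocorrelated residuals.
residuals = [1.0, 0.5, 0.25, 0.125]
dw = durbin_watson(residuals)                      # well below 2
rho = ar1_coefficient(residuals)
transformed = prais_winsten_transform(residuals, rho)
```

In a full application the same transformation would be applied to the dependent variable and to every regressor (including the constant) before ordinary least squares is run on the transformed data.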
In the remainder of this section all discussions are based on the regression results obtained with the Prais–Winsten method. When we take a closer look at the results, we note that Ln(Base) and Flyer are among the most influential independent variables (having high standardized coefficients) for all four product categories. At the same time there are independent variables like ‘# items on promotion in category’, Ln(ShelfLife) and Saving Action which are very influential within one particular product category but not statistically significant in other product categories. The latter observation implies that to some extent consumers react differently to promotions for different product categories. Finally, note that a smaller Shelf Life results in a larger Lift Factor for Desserts, Dairy Drinks and Salads (see the negative regression coefficients for Ln(ShelfLife), which are statistically significant at p < .05). This is a first indication that a smaller Shelf Life, while implying a lower ability to stockpile, does not necessarily imply lower sales during promotions.

To investigate threshold and saturation effects we take a closer look at the coefficients for the price discount dummies in the base regression model. If there were no threshold and saturation levels, the coefficients for the price discount dummies should increase when the relative price discount increases. In Fig. 2 we have visualized the regression coefficients for the price discount dummies. The coefficients for the 20–30% price discount class are equal to zero since this is the default discount class in our base regression model. The observations for Desserts, Cold Cuts and Salads all start with the 0–10% discount class, while the observations for Dairy Drinks start with the 10–20% discount class since there are no promotions in the 0–10% discount class for Dairy Drinks in our database. Fig. 2 shows for all four product categories that the regression coefficients in the first two discount classes with observations are proportionally increasing, suggesting no threshold effects. In addition, Table 4 shows the results (p-values) for testing the equality of regression coefficients of dummies that reflect relative price discount classes. These tests confirm that for Desserts and Salads the difference between the regression coefficients for the 0–10% dummy and the 10–20% dummy


K.H. van Donselaar et al. / Int. J. Production Economics 172 (2016) 65–75

Table 3 Regression results (unstandardized (B) and standardized (Beta) coefficients and adjusted R2 ) for the four product categories. Desserts R2-adj: 0.675

(Constant) WeightVol Ln(Shelf life) #items on promotion in category Ln(Default Price) 0–10% 10–20% 30–40% 40–50% 50–60% Display Flyer Front page Back page Saving action Spring Break Easter Summer Christmas Ln(BASE)

Dairy Drinks R2-adj: 0.656

B

Beta

4.606 -.139 -.346** -.008 -.482** -.646** -.041 .270** .564** 1.105** .388** 1.019** -.177 -.703 .223* -.011 .346 .003 .033 -.428**

-.043 -.108 -.032 -.198 -.164 -.017 .121 .224 .335 .202 .340 -.027 -.054 .062 -.002 .040 .001 -.003 -.417

Cold Cuts R2-adj: 0.861

B

Beta

3.67 .032 -.944** .021* .011

.019 -.409 .111 .006

-.087 .193** .611** .697** -.078 1.181** -.506*

-.069 .158 .209 .354 -.067 .609 -.083

.151* .123 1.032** .128 .326 -.213**

.094 .041 .235 .076 .074 -.411

Salads R2-adj: 0.631

B

Beta

B

Beta

2.976 .832* .140 -.025 -.261 -.114 .075 .366** .508** .945** .248* .546**

.115 .078 -.044 -.085 -.039 .037 .184 .167 .545 .123 .364

4.555 1.061** -.371* -.031** -.424** -.543** -.334** .132** .219** .082 -.003 .655** .486

.233 -.111 -.292 -.257 -.238 -.191 .128 .146 .046 -.002 .291 .074

.570** .247

.196 .059

.027 .247

.017 .053

-.034

-.009

.065

.033

-.418**

-.567

-.328**

-.584

Table 4 Results (p-values) for testing the equality of regression coefficients of dummies that reflect relative price discount classes.

Fig. 2. Regression coefficients for discount dummies in the base regression model, when using the Prais–Winsten method.

is statistically significant at p < .05 (see the first row in Table 4). For Cold Cuts, the difference between the 0–10% and the 10–20% discount class is not statistically significant, but this may be due to the low number of observations in these classes (only 5 and 10 observations). Only for Dairy Drinks does the test show that the difference between the regression coefficients for the 10–20% and the 20–30% discount classes is not statistically significant at p < .05 (p = 0.136), which may imply a threshold effect for Dairy Drinks. However, as mentioned before, the true test of threshold and saturation effects is the evaluation of the forecast accuracy when these effects are included in the forecasting models in the next section.

When studying the saturation effects we compare the last two discount classes. According to the last row in Table 4 and the regression coefficients in Table 3, no saturation effect is visible for Desserts and Cold Cuts: the difference in the regression coefficients is statistically significant and the coefficient for the 50–60% discount class is higher than for the 40–50% discount class. For Dairy Drinks the coefficient for the 50–60% class is also higher than that for the 40–50% class, but the difference is not statistically significant. This is probably due to the low number of observations in these two classes (only 3 and 4 observations). The difference between the coefficients for the 40–50% class and the 30–40% class for Dairy Drinks is positive and statistically significant, suggesting that at least up to 50% there is no saturation level for Dairy Drinks.

| Dummy comparison | Desserts | Dairy Drinks | Cold Cuts | Salads |
|---|---|---|---|---|
| 0–10% vs. 10–20% | ≤0.001 | | 0.260 | 0.040 |
| 10–20% vs. 20–30% | 0.607 | 0.136 | 0.495 | ≤0.001 |
| 20–30% vs. 30–40% | ≤0.001 | 0.001 | ≤0.001 | 0.009 |
| 30–40% vs. 40–50% | 0.001 | 0.002 | 0.354 | 0.156 |
| 40–50% vs. 50–60% | ≤0.001 | 0.564 | 0.005 | 0.142 |
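The screening rule used in this section (a run of adjacent discount classes, counted from the lowest or the highest class with observations, whose coefficients do not differ significantly) can be sketched in Python as follows. The p-values are those of Table 4, with "≤0.001" entered as 0.001. The 0.05 cut-off follows the text, while the function names and the use of `None` for a comparison without observations are assumptions of this sketch.

```python
ALPHA = 0.05

# p-values from Table 4, ordered from the lowest to the highest pair of
# adjacent discount classes. None marks a comparison without observations.
P_VALUES = {
    "Desserts":     [0.001, 0.607, 0.001, 0.001, 0.001],
    "Dairy Drinks": [None,  0.136, 0.001, 0.002, 0.564],
    "Cold Cuts":    [0.260, 0.495, 0.001, 0.354, 0.005],
    "Salads":       [0.040, 0.001, 0.009, 0.156, 0.142],
}

def flat_run(p_values, alpha=ALPHA):
    """Count the leading comparisons that are NOT significant at alpha."""
    run = 0
    for p in p_values:
        if p is None:        # no observations for this comparison: skip it
            continue
        if p < alpha:        # first significant difference ends the run
            break
        run += 1
    return run

def screen(p_by_category):
    """Return {category: (run from lowest class, run from highest class)}."""
    return {
        cat: (flat_run(p), flat_run(list(reversed(p))))
        for cat, p in p_by_category.items()
    }

if __name__ == "__main__":
    for cat, (low, high) in screen(P_VALUES).items():
        print(f"{cat}: threshold candidates={low}, saturation candidates={high}")
```

A non-zero count only flags a candidate effect. As discussed in the text, several of the non-significant differences coincide with classes that contain very few observations, so the out-of-sample comparison in Section 6 remains the decisive test.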

The fact that for Salads the regression coefficient for the 50–60% discount class is even lower than for the 40–50% class (0.082 versus 0.219), although this difference is not statistically significant, suggests that this product category may be a special category. Interviews with retail replenishment specialists and manufacturers of salads revealed that the sales of salads depend strongly on whether the weather conditions are favorable for barbecues. As a result the uncertainty in promotional sales for Salads is relatively high, and supermarket managers may act risk-averse and limit the lift factor they use, especially when the price discount is very high (and therefore their margins are very low), in order to avoid huge amounts of waste when the weather forecast turns out to be wrong. This reasoning is supported by the fact that for both the 40–50% and the 50–60% discount classes the regression coefficient for Salads is considerably lower than those for the other three product categories. For example, the regression coefficient for the 50–60% class for Salads is 0.082 versus 1.105, 0.697 and 0.945 for the other three product categories.

In conclusion, our preliminary analyses above do not provide conclusive evidence for the presence of threshold and saturation levels for the four perishable product categories. According to Della Bitta and Monroe (1980), managers in practice believe that at least a 15% discount is needed to change customers’ purchase intentions. In the four scientific studies that empirically investigated threshold and saturation levels (see Table 2), threshold levels typically varied between 0% (no threshold) and 10%, with a maximum of 14% for store brand aerobic shoes. The fact that Desserts does not show a threshold is in line with the observation of Martínez-Ruiz et al. (2006), who found no or only small threshold values for five different


brands of 125-g yoghurts: they reported no threshold for one brand, a 5% threshold for three brands and a 10% threshold for one of the five brands. Hence, while for some non-perishable products a small threshold has been identified in earlier papers, the paper of Martínez-Ruiz et al. (2006) and the preliminary tests in our paper (for a much larger and more diverse group of perishable products) suggest that threshold levels are indeed often absent or negligible.

The saturation levels found in these studies were 23–31% for aerobic shoes (Gupta and Cooper, 1992), 30–40% for products and 20–30% for services (Marshall and Leng, 2002), 0–40% for dry groceries (van Heerde et al., 2001), and 5–17% for coffee and 8–25% for yoghurt (Martínez-Ruiz et al., 2006). Our preliminary analyses give little support for the presence of saturation levels for perishable products. These results differ from the saturation levels found in the other papers. This may have several causes: the perishability of the items, regional differences in consumer behavior and/or retailers’ promotional strategies, the type of research method (a survey among students with consumer purchase intention as dependent variable versus empirical sales data from retailers with the actual lift factor as dependent variable) or the more specific product being investigated (125-g yoghurt versus the Desserts category). At least one potential explanation comes from another observation we can make based on our empirical data: while items like Desserts on average have a shelf life of 1.5 weeks, their sales during promotions are about 14 weeks of average regular sales. This suggests that the success of a promotion is determined more by the substitution effect (consumers switching between different products, driven by the price discount and other factors) than by the restriction to stockpile imposed by the short time-to-consume.

6. Results on promotion forecasting for perishable items

The true litmus test for the presence of any threshold and saturation level is the improvement in the out-of-sample forecasting accuracy when we allow the forecasting models to take these effects into account. To this end, we construct and compare multiple models and methods to forecast demand during promotions in this section. In doing so, we first add one independent variable to the regression model in Section 4: the natural logarithm of the Lift Factor from the previous promotion for the same product, denoted by Ln(PrevLF). This variable was excluded from the analyses in Section 4 since it may capture part of the effects of the discount level and thereby disguise the true threshold and saturation effects. Cooper et al. (1999) suggested including a similar variable, but added the condition that the most recent promotion for the same product with ‘matching ad and display conditions’ should be used. Unfortunately ad and display conditions often vary, complicating the inclusion of this condition when the length of the calibration period is 62 weeks.

If there are any threshold and saturation effects, it is sensible to incorporate these non-linear effects when constructing a model to forecast demand during a promotion. So far we modeled these non-linear effects using five dummy variables for the relative price discount (see the model in Section 3). We note that the use of five dummy variables to capture these effects gives a higher risk of in-sample overfitting during the calibration phase. For this reason we focus on the out-of-sample forecast accuracy (during the validation phase) to judge the added value of incorporating potential threshold and saturation effects. We will compare the performance of this ‘dummy model’ with two alternative models: the ‘linear model’ and the ‘quadratic model’.
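The construction of the Ln(PrevLF) variable introduced above can be sketched in Python as follows. This is an illustration only. The record keys `sku`, `week`, `promo_sales` and `baseline` are hypothetical, the Lift Factor is assumed here to be the ratio of promotional sales to baseline sales, and returning `None` for a product's first promotion is a choice of this sketch rather than a rule taken from the paper.

```python
import math

def add_ln_prev_lf(promotions):
    """Attach Ln(PrevLF) to each promotion record.

    `promotions` is a list of dicts with the hypothetical keys 'sku',
    'week', 'promo_sales' and 'baseline'. The Lift Factor is assumed to
    be promo_sales / baseline.
    """
    last_lf = {}                       # most recent Lift Factor per SKU
    out = []
    for p in sorted(promotions, key=lambda r: r["week"]):
        prev = last_lf.get(p["sku"])
        row = dict(p)
        # None marks a first promotion, for which no previous Lift Factor exists.
        row["ln_prev_lf"] = math.log(prev) if prev is not None else None
        out.append(row)
        last_lf[p["sku"]] = p["promo_sales"] / p["baseline"]
    return out
```

The sketch uses the most recent promotion regardless of ad and display conditions, in line with the simplification described above.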
The ‘linear model’ ignores the non-linear effects and simply replaces the five discount dummies by the relative price discount. The ‘quadratic model’ replaces the five discount dummies by both a linear and a quadratic function of the mean centered relative price discount. To avoid multicollinearity, the relative discount variable


was first mean centered before squaring it. To calculate the mean centered relative price discount we subtract the average relative discount from the relative discount for each promotion in our dataset, where the average relative discount is calculated per product category based on the data in the calibration period only.

Apart from considering these three alternative ways of modeling the price discount variable, we also consider two alternatives to estimate the regression coefficients in our models. In Section 6.1 we analyze each product category separately: we run the regressions on a dataset containing only the promotions for a single product category. In Section 6.2 we assume the regression coefficients to be equal for all product categories and we run a single regression model on the entire database containing the promotions for all four product categories. Combining the data for different products in a single regression is known as pooled regression. Cooper et al. (1999) and van Heerde et al. (2002) also applied pooled regression when forecasting demand during a promotion. We investigate in this paper which SKUs should be pooled together in order to get the most accurate forecasts.

The goal in this section is to test the forecasting accuracy of different regression models and forecasting methods. As in Section 5, in all regression models reported in this section the maximum VIF statistics are well below 10, showing that multicollinearity again is no cause for concern. The different forecasting models are compared using three performance criteria: the Root Mean Square Error (RMSE), the Mean Absolute Percentage Error (MAPE), and the Average Bias. All three criteria are measured by subtracting the actual promotional sales from the forecasted promotional sales, rather than subtracting the actual from the forecasted (natural logarithm of the) Lift Factor.
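The three criteria named above can be written down compactly. The following Python sketch assumes the usual textbook definitions, with errors taken as forecast minus actual promotional sales in consumer units and the MAPE expressed relative to actual sales. The exact formulas used in the study are not spelled out in the text, so the details and the function names are assumptions.

```python
import math

def forecast_errors(forecast, actual):
    """Errors as forecasted minus actual promotional sales (consumer units)."""
    return [f - a for f, a in zip(forecast, actual)]

def rmse(forecast, actual):
    """Root Mean Square Error, in consumer units."""
    e = forecast_errors(forecast, actual)
    return math.sqrt(sum(x * x for x in e) / len(e))

def mape(forecast, actual):
    """Mean Absolute Percentage Error, in percent of actual sales."""
    ratios = [abs(f - a) / a for f, a in zip(forecast, actual)]
    return 100.0 * sum(ratios) / len(ratios)

def average_bias(forecast, actual):
    """Mean signed error; positive values mean over-forecasting."""
    e = forecast_errors(forecast, actual)
    return sum(e) / len(e)
```

Because a regression on Ln(Lift Factor) produces forecasts on a log scale, the forecasts would first have to be converted back to sales units before these functions are applied.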
Reporting only RMSE has the disadvantage that it is an absolute measure. To put it in perspective (for better comparison with new models developed by other researchers in the future), we also report the Average Promotion Quantity (determined by summing the consumer units sold during all promotions in the validation period and dividing this by the total number of promotions in the validation period). The calibration period for our models is 62 weeks and the validation period is 24 weeks.

6.1. Results on promotion forecasting: regressing per product category

In this subsection we estimate separate regression coefficients for each category, using the data in the calibration period. The regression results for this model using five price discount dummies are summarized in Table 5. The adjusted R2 reported in the second row of Table 5 shows that adding Ln(PrevLF) as an extra independent variable has a large impact on the explanatory power of the model. In comparison with Table 3 (without Ln(PrevLF)) the average adjusted R2 increased from 70.6% to 80.1%. The standardized coefficients (Beta) in Table 5 confirm the relative importance of including Ln(PrevLF) in the model. For Desserts the Beta for Ln(PrevLF) is equal to 0.526, which is by far the highest Beta for this product category. Other variables having a high Beta value for one or more product categories are ‘50–60%’ (which stands for the variable D5 as defined in the model in Section 4), Ln(Base), Flyer and #items on promotion in category. The coefficients for the two alternative models (based on the linear and the quadratic mean centered function of the relative price discount) are not reported here, since these coefficients themselves provide little additional insight. Rather we focus on the difference in forecast accuracy between these three alternative models.
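The three alternative ways of entering the relative price discount into the regression can be sketched as follows. This Python fragment is illustrative only: discounts are assumed to be expressed as fractions (0.35 for 35%), the class boundaries are assumed to include their lower bound, and all function names are hypothetical.

```python
# Discount classes with their own dummy; the 20-30% class is the default
# class and therefore has no dummy.
DUMMY_CLASSES = [(0.0, 0.1), (0.1, 0.2), (0.3, 0.4), (0.4, 0.5), (0.5, 0.6)]

def dummy_features(discount):
    """'Dummy model': five 0/1 indicators, all zero for the 20-30% class."""
    return [1 if lo <= discount < hi else 0 for lo, hi in DUMMY_CLASSES]

def linear_features(discount):
    """'Linear model': the relative price discount itself."""
    return [discount]

def calibration_mean(calibration_discounts):
    """Average relative discount of one product category, computed on the
    calibration period only."""
    return sum(calibration_discounts) / len(calibration_discounts)

def quadratic_features(discount, mean_discount):
    """'Quadratic model': mean centered discount and its square."""
    centered = discount - mean_discount
    return [centered, centered ** 2]
```

Centering with the calibration-period mean keeps the validation data out of the feature construction, and centering before squaring reduces the correlation between the linear and the quadratic term.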
Table 6 compares the forecasting accuracy during the validation period for each of the four product categories and for each of the three alternative ways to model the relative discount. The



Table 5 Results of regression analysis for four product categories. Desserts R2-adj: 0.819 B (Constant) WeightVol Ln(Shelf Life) #items on promotion in category Ln(Default Price) 0–10% 10–20% 30–40% 40–50% 50–60% Display Flyer Front page Back page Saving action Spring Break Easter Summer Christmas Ln(BASE) Ln(PrevLF)

2.169 .036 -.172 -.004 -.335 -.028 -.150 .114 .340 .661 .152 .658 .204 .058 .273 .079 .193 .106 -.296 -.229 .538

Dairy Drinks R2-adj: 0.782 Beta

.013 -.054 -.022 -.165 -.008 -.068 .058 .162 .240 .081 .198 .043 .004 .097 .032 .033 .042 -.025 -.212 .526

B

Cold Cuts R2-adj: 0.891 Beta

2.302 -.036 -.627 .017 .000

-.022 -.285 .100 .000

-.058 .191 .495 .738 -.041 .825 -.280

-.051 .163 .166 .407 -.038 .416 -.039

.170 .125 .850 .173 .369 -.132 .345

.110 .048 .203 .125 .088 -.243 .342

forecasting accuracy is measured via RMSE, MAPE and Average Bias. Cells with the best performance are marked yellow. The linear model performs best in 6 cases, the dummies model in 4 cases and the quadratic model in 2 cases. Although the MAPE is still better in three out of four cases for the non-linear models, the deterioration in Average Bias is substantial when comparing the non-linear models with the linear model for three out of the four product categories. Especially for Cold Cuts, the Average Bias is relatively high for the non-linear models (345 units and 347 units versus 115 units for the linear model). This bias is more than 9% of the average promotion quantity for the non-linear models and 3% for the linear model. An increase in bias of 6 percentage points is too large to be ignored.

Even for the linear model, the MAPE is still very high for Salads in the validation period. A possible explanation is that Salads are notoriously difficult to forecast according to industry experts. Especially during the barbecue season demand fluctuations are substantial. This may result in large out-of-stocks during a promotion, leading to a distortion of historical data (only sales are registered). As a result it may be difficult to accurately estimate the regression coefficients when using the Salads data only. In this way Salads may be different from other categories. We consider Salads to be a non-routine product category, which can benefit from using observations from routine categories. This reasoning is similar to the observations made by Chen and Boylan (2008), who compared methods for identifying seasonal indices at the individual product level or at the product group level. They concluded that noisy series can ‘borrow strength’ from the group. When selecting for each product category the model with the lowest MAPE based on the data in Table 6, the average MAPE is equal to 26.0%. If we use the linear model for all four categories, the average MAPE is equal to 26.6%.

6.2. Results on promotion forecasting: one regression for all categories

In this subsection we present the results when a single regression is run for all four product categories, implying that the regression coefficients are the same for the four categories. Table 7

B

Salads R2-adj: 0.710 Beta

B

Beta

2.616 1.115 -.007 -.051 -.369 -.060 .063 .379 .548 .891 .268 .468

.146 -.004 -.101 -.116 -.018 .028 .169 .173 .508 .142 .297

1.966 1.035 .120 -.036 -.338 -.343 -.242 .088 .185 -.051 .072 .455

.238 .042 -.450 -.221 -.137 -.128 .082 .128 -.036 .064 .225

.544 .166

.207 .048

.451 .126 .219

.075 .099 .051

-.194

-.061

.050

.035

-.320 .237

-.456 .215

-.216 .374

-.381 .377

shows the regression results for the dummies model, the linear model and the quadratic model. At first sight the performance of these models seems slightly worse than with different regression coefficients for different product categories: the average adjusted R2 for the dummies model has decreased from 80.1% (based on Table 5) to 78.2% (based on Table 7). The results show that some independent variables have a different impact for different product categories: in Table 3 we see for example that the variable Display is only significant for the product categories Desserts and Cold Cuts. At the same time we note that the similarities among the categories are larger than the differences, enabling the product categories ‘to learn from each other’: when comparing the forecasting accuracy results of Table 8 (where a single regression model is run for all four product categories together) and Table 6 (where a specific regression model is run for each product category separately), we see indeed that the two product groups Cold Cuts and Salads improve dramatically when a single regression is run: for the non-linear models the average bias reduces from +9% to approximately –1% (of the average promotion quantity) for Cold Cuts and from the +15% to +20% range to slightly more than +4% for Salads. Likewise the MAPE reduces from 21% to 17% for Cold Cuts and from 38% to 23% for Salads. For Cold Cuts and Salads it is best to use a quadratic model in which the regression coefficients are based on the data of all four product categories.

For Dairy Drinks and (to a lesser extent) Desserts the opposite is true. The performance for Dairy Drinks decreases dramatically when using the same regression model for all categories. The Average Bias for the linear model, for example, increases from 0.2% (= 9.6/4246) to 16.5% (= 699.7/4246). For Desserts, the MAPE for the linear model is marginally better in Table 8 than in Table 6 (21.4% vs. 21.7%); however, the Average Bias for the linear model is considerably larger in Table 8 than in Table 6 (–3.6% vs. –0.7%). So for both Dairy Drinks and Desserts it is best to use a linear model in which the regression coefficients are determined separately for each product category. While Cold Cuts and Salads benefit from the observations in the other two categories, the opposite is true for Desserts and Dairy



Table 6 The forecasting accuracy during the validation period for each of the four product categories when using a separate regression model for each product category.

| Validation | Model | Desserts | Drinks | Cold Cuts | Salads |
|---|---|---|---|---|---|
| RMSE | Dummies | 2630.9 | 1719.6 | 798.8 | 2480.5 |
| RMSE | Linear | 2331.3 | 1553.1 | 954.9 | 2758.8 |
| RMSE | Quadratic | 2654.0 | 1613.3 | 774.3 | 2673.2 |
| MAPE | Dummies | 23.6% | 22.5% | 21.1% | 37.9% |
| MAPE | Linear | 21.7% | 21.8% | 22.2% | 39.3% |
| MAPE | Quadratic | 22.8% | 21.0% | 21.6% | 39.3% |
| Average Bias | Dummies | 95.0 | 107.7 | 344.5 | 739.1 |
| Average Bias | Linear | –60.8 | 9.6 | 114.6 | 927.1 |
| Average Bias | Quadratic | –164.7 | 70.7 | 347.1 | 901.0 |
| Average Promotion Quantity | | 8432 | 4246 | 3767 | 4705 |

Drinks. The reason that Salads benefits from the promotional data from the other product groups may be that it is considered a non-routine product category. As explained in the previous subsection, since the historical demand data for Salads are distorted during the high season, Salads may benefit from additional, more accurate data from routine product categories having ample observations. The reason that Cold Cuts benefits from observations in other product categories may be its limited number of observations: only 92 out of the 1064 promotions in the calibration period are promotions from the Cold Cuts category.

Summarizing the results above, we conclude that the best forecasting models are a quadratic model in which the regression coefficients are based on the data of all four product categories for Cold Cuts and Salads, and a linear model in which the regression coefficients are determined separately per product category for Desserts and Dairy Drinks. The fact that the dummies model never turned out to be the best forecasting model suggests that modeling threshold and saturation effects leads to worse forecasting performance than modeling price reductions linearly or quadratically. The resulting performance when the best forecasting model is selected for every product category is summarized in Table 9. The last column reports the average of the absolute values of the performance of the four product categories. The resulting average MAPE is 21.0%. When comparing this with the 26.0% MAPE reported in Section 6.1, where a regression was performed for each product category separately, we conclude that using observations from other product categories can substantially improve the forecasting accuracy.
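The difference between the two estimation strategies compared in Sections 6.1 and 6.2 can be illustrated with a small Python sketch using a single explanatory variable. It is a simplified stand-in for the models in this paper: the function names are hypothetical, and the pooled version assumes shared slopes with one intercept dummy per category, which is suggested by the category terms listed in Table 7 but not spelled out in the text.

```python
import numpy as np

def fit_per_category(x, y, category):
    """Separate least-squares fit (intercept, slope) for every category."""
    fits = {}
    for c in sorted(set(category.tolist())):
        mask = category == c
        design = np.column_stack([np.ones(int(mask.sum())), x[mask]])
        coef, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        fits[c] = coef
    return fits

def fit_pooled(x, y, category):
    """Single fit on all rows: one shared slope plus an intercept dummy for
    every category except the first one in sorted order (the reference)."""
    cats = sorted(set(category.tolist()))
    dummies = [(category == c).astype(float) for c in cats[1:]]
    design = np.column_stack([np.ones(len(y)), x] + dummies)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept of reference, shared slope, dummy effects...]
```

With few or noisy observations in one category, the shared slope in `fit_pooled` is estimated mainly from the other categories, which is the sense in which a category can ‘borrow strength’. With ample clean data, `fit_per_category` can adapt to category-specific effects instead.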
To get an additional perspective on the performance of the best fitting regression models, we compare the results in Table 9 with the performance of the best ‘Moving Average Lift Factor’ (MALF) forecasting method. The forecast in a MALF(N) method for SKU A in week t is defined here as the average of the Lift Factor in the most recent N promotions prior to week t (or the total number of promotions prior to week t if this number is less than N) times the baseline sales of SKU A in week t. Note that in case N is equal to one, this forecasting method is a simple last-like forecasting method which only uses the Lift Factor of the most recent promotion for the same product and updates this by multiplying it with the most recent baseline sales. Since Hoch and Schkade (1996) found that forecasters who use an average of past observations rather than a single observation improved their forecasts in an unpredictable environment, we also consider values of N larger than 1. In this way our benchmark resembles the benchmark used by Ali et al. (2009). The optimal values for N for our four product categories varied from 3 to 5, which is in line with the observation made by Hoch and Schkade (1996) about the added value of using multiple

observations. The results on the size of the forecast errors are presented in Table 10. When comparing these with the results in Table 9, we note that the RMSE of the best regression models is 30% lower than the RMSE of the best MALF method, while the MAPE is reduced from 28.6% to 21.0% when using the best regression models. So although developing and selecting the best regression model per product category is relatively complex and time-consuming, the benefits are very large too. To facilitate performance comparison with future studies we also report here the volume weighted MAPE, defined as the sum of the absolute differences between the forecasted promotional sales and the actual promotional sales divided by the sum of the actual promotional sales. The volume weighted MAPE when considering the promotions for all four product categories during the validation period together is equal to 20.3% for the best regression model and 25.9% for the best Moving Average Lift Factor method.
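The MALF(N) benchmark and the volume weighted MAPE defined above translate directly into code. The following Python sketch follows those definitions. The function and argument names are hypothetical, and raising an error when no earlier promotion exists is a choice of this sketch.

```python
def malf_forecast(past_lift_factors, baseline, n):
    """MALF(N) forecast for one SKU and one promotion week.

    `past_lift_factors` holds the Lift Factors of earlier promotions of the
    same SKU, oldest first. If fewer than n exist, all of them are used.
    `baseline` is the baseline sales of the SKU in the forecast week.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not past_lift_factors:
        raise ValueError("MALF needs at least one earlier promotion")
    recent = past_lift_factors[-n:]
    return baseline * sum(recent) / len(recent)

def volume_weighted_mape(forecast, actual):
    """Sum of absolute forecast errors divided by the sum of actual
    promotional sales, in percent."""
    abs_err = sum(abs(f - a) for f, a in zip(forecast, actual))
    return 100.0 * abs_err / sum(actual)
```

With `n=1` the function reduces to the last-like forecast described above. Larger values of `n` average out the noise in individual promotions.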

7. Conclusions and future research

The first research objective in this paper was to gain a more in-depth understanding of the impact of temporary price discounts on the sales of perishable items. To this end we performed preliminary analyses on the potential threshold and saturation levels for the relative price discount for four important perishable product categories: Desserts, Dairy Drinks, Cold Cuts and Salads, using a large empirical dataset from a Dutch retailer operating more than 100 stores. For these analyses we ran a regression model with five dummy variables for different price discount classes to capture potential threshold and saturation effects. These preliminary analyses do not provide conclusive evidence for the existence of threshold and saturation levels for these product categories. One potential explanation for the lack of support for saturation levels in our study is that earlier research mostly focused on non-perishable products, while perishable products are different in nature. For example, the consumer's willingness to substitute may be considerably higher for perishables than for non-perishables (van Woensel et al., 2007), implying larger saturation levels (if any at all). The fact that in our study Desserts, for example, sell on average 14 times more during a promotion week than in a non-promotion week, while the average Shelf Life is only 1.5 weeks, suggests that the success of a promotion is determined more by the substitution effect (consumers switching between different products, driven by the price discount and other factors) than by the restriction to stockpile imposed by the short time-to-consume. Future research is needed to further examine this.

The second research objective was to construct and evaluate different models to forecast demand during promotions for perishable products. The analyses on demand forecasting models for promotions have resulted in two conclusions:



Table 7 Regression results for four product categories when Ln(PrevLF) is added to the model in Section 3 and a single regression model is run for the four categories. Dummies model R2-adj: 0.782

(Constant) WeightVol Ln(Shelf Life) #items on promotion in category Ln(Default price) 0–10% 10–20% 30–40% 40–50% 50–60% %Discount %Discount cntrd Quadr %Discount cntrd Display Flyer Front page Back page Saving action Spring break Easter Summer Christmas Ln(BASE) Ln(PrevLF) Dairy dessert Cold cuts prepackaged Salads prepackaged

Linear model R2-adj: 0.784

Quadratic model R2-adj: 0.786

B

Beta

B

Beta

B

Beta

1.986 .010 –.162 –.029

.004 –.088 –.197

1.592 .016 –.144 –.029

.007 –.078 –.196

1.973 .023 –.156 –.029

.009 –.085 –.200

–.270 –.171 –.117 .134 .334 .390

–.126 –.041 –.055 .077 .146 .172

–.283

–.132

–.268

–.125

1.454

.236 1.474 1.926 .136 .518 .403 .690 .193 .088 .221 .088 –.105 –.182 .505 .413 .258 .282

.235 .051 .085 .205 .060 .050 .085 .022 .030 .039 –.010 –.210 .490 .279 .099 .169

.138 .511 .399 .654 .184 .086 .248 .084 –.116 –.185 .511 .361 .197 .191

.086 .203 .060 .047 .081 .021 .034 .037 –.011 –.213 .496 .243 .075 .115

.134 .520 .437 .686 .192 .087 .225 .089 –.086 –.183 .501 .366 .189 .187

.084 .206 .066 .050 .085 .021 .031 .039 –.008 –.211 .486 .247 .072 .112

Table 8 The forecasting accuracy during the validation period for each of the four product categories when using the same regression coefficients for all product categories.

| Validation | Model | Desserts | Drinks | Cold Cuts | Salads |
|---|---|---|---|---|---|
| RMSE | Dummies | 2478.6 | 1471.6 | 1217.4 | 1677.8 |
| RMSE | Linear | 2349.3 | 1571.7 | 1238.5 | 1646.2 |
| RMSE | Quadratic | 2409.2 | 1680.3 | 1176.9 | 1658.5 |
| MAPE | Dummies | 22.1% | 25.3% | 17.7% | 23.9% |
| MAPE | Linear | 21.4% | 26.2% | 18.0% | 23.4% |
| MAPE | Quadratic | 21.6% | 26.8% | 17.2% | 23.3% |
| Average Bias | Dummies | –156.0 | 654.4 | –55.0 | 206.3 |
| Average Bias | Linear | –301.3 | 699.7 | –18.8 | 211.0 |
| Average Bias | Quadratic | –338.0 | 737.8 | –13.9 | 196.3 |
| Average Promotion Quantity | | 8432 | 4246 | 3767 | 4705 |

1. Modeling threshold and saturation effects has led to worse forecasting performance in comparison to models based on linear or quadratic functions of the relative price discount.
2. The largest improvement in forecast accuracy comes from the distinction between product categories with routine and non-routine demand processes and between product categories with few and many observations when forecasting the promotional sales of perishable products.

Regarding perishable products with non-routine demand processes or with a limited number of observations, it turned out to be important to let these product categories ‘borrow strength’ from product categories with routine demand processes and a large number of observations. By running a regression for the first type of categories based on the entire dataset, i.e. including other product categories, the forecasting accuracy for the product categories with non-routine demand processes or with limited

number of observations increased substantially. In contrast, for the product categories with routine demand processes and ample number of observations, it proved to be better to run a regression for each product category separately. Further research is needed to determine the generalizability of the results on the threshold and saturation levels; it would be interesting to find out whether the results for Desserts, Dairy Drinks, Cold Cuts and Salads also apply for other perishable product categories (like Vegetables, Fruit, Fresh Meat and Bread) and for other retail chains in the same geographical area or in other parts of the world. As indicated by van Heerde et al. (2001), it is also of interest to find further solid explanations for (the lack of) threshold and saturation effects. The literature on promotion forecasting models also suggests additional ways to improve the forecast accuracy. This may be explored in future research. For example, Cooper et al. (1999) argue that the accuracy may be increased when the dataset is split into two or more subsets based on the Baseline sales, for each of

K.H. van Donselaar et al. / Int. J. Production Economics 172 (2016) 65–75

Table 9 The forecasting accuracy during the validation period for each of the four product categories when using the best regression model for each product category: i.e. using for the product categories Cold Cuts and Salads a quadratic model in which the regression coefficients are based on the data of all four product categories, while using for the product categories Desserts and Dairy Drinks a linear model in which the regression coefficients are determined separately. Validation

Desserts Drinks Cold Cuts Salads Average

RMSE MAPE Average Bias Average Promotion Quantity

2331.3 21.7%  60.8 8432

1553.1 21.8% 9.6 4246

1176.9 1658.5 17.2% 23.3%  13.9 196.3 3767 4705

1679.9 21.0% 70.1 5288

Table 10 The forecasting accuracy during the validation period for each of the four product categories when using the best Moving Average Lift Factor forecasting method for each product category. VALIDATION

Desserts

Drinks

Cold Cuts

Salads

Average

RMSE MAPE Average Bias

3794.8 26.4% 9.8

1671.9 26.4%  59.5

2117.6 32.3%  182.2

2107.7 29.3%  24.3

2423.0 28.6% 69.0

which then a separate regression is performed. This idea is particularly worthwhile to be investigated when performing a single regression analysis for all four product categories (see Section 6.2), because then sufficient data are available also after the dataset is split. Alternative potential criteria to split the database are based on Shelf Life or on Relative Price Discount. A second way to improve the forecast accuracy is the inclusion of some major interaction effects. For example we noted that weight or volume only was significant in the models for Cold Cuts and Salads. Likewise Shelf Life was only significant in the models for Desserts and Dairy Drinks. By adding the interaction between these variables in the single regression model for all product categories, the accuracy may be improved.
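The accuracy measures reported in Tables 9 and 10 (RMSE, MAPE and average bias) can be illustrated with a minimal sketch. The definitions below are the common textbook ones and are an assumption here, as is the sign convention for the bias (forecast minus actual). The function name and the sample numbers are hypothetical and are not taken from the study's data.

```python
import math


def accuracy_measures(actual, forecast):
    """Return (RMSE, MAPE in percent, average bias) for paired observations.

    Assumed definitions (not taken from the paper):
      error = forecast - actual
      RMSE  = sqrt(mean(error^2))
      MAPE  = 100 * mean(|error| / actual)
      bias  = mean(error)
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    errors = [f - a for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e) / a for e, a in zip(errors, actual)) / n
    bias = sum(errors) / n
    return rmse, mape, bias


# Hypothetical promotion quantities and forecasts (illustrative numbers only).
actual = [4000.0, 5000.0, 8000.0]
forecast = [4400.0, 4500.0, 8000.0]
rmse, mape, bias = accuracy_measures(actual, forecast)
print(f"RMSE={rmse:.1f}  MAPE={mape:.1f}%  bias={bias:.1f}")
```

Because RMSE is expressed in sales units, it tends to be larger for categories with larger promotion quantities, whereas MAPE is scale-free. This is one reason to read the two measures side by side when comparing categories.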
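The ‘borrowing strength’ idea and the suggested interaction effects can be sketched on synthetic data. Everything in this sketch is an assumption made for illustration: the variable names, the log-lift response, the coefficient values, and the reading of the interaction as a category indicator multiplied by Shelf Life. It does not reproduce the regression specification used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(n, slope, shelf_effect, noise):
    """Synthetic promotions: relative discount, shelf life (days) and log lift."""
    discount = rng.uniform(0.1, 0.5, n)
    shelf_life = rng.uniform(5, 30, n)
    log_lift = (0.5 + slope * discount + shelf_effect * shelf_life
                + rng.normal(0, noise, n))
    return discount, shelf_life, log_lift


def ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical category A: many observations, shelf life affects the lift.
# Hypothetical category B: few observations, shelf life has no effect.
dA, sA, yA = simulate(400, 3.0, 0.02, 0.2)
dB, sB, yB = simulate(15, 3.0, 0.0, 0.2)

# Separate regression for category B, using only its own 15 observations.
XB = np.column_stack([np.ones_like(dB), dB])
coef_B_only = ols(XB, yB)

# Pooled regression on both categories: shared discount slope, a category
# dummy, and shelf life entering only through its interaction with category A.
d = np.concatenate([dA, dB])
s = np.concatenate([sA, sB])
y = np.concatenate([yA, yB])
is_A = np.concatenate([np.ones_like(dA), np.zeros_like(dB)])
X_pooled = np.column_stack([np.ones_like(d), is_A, d, is_A * s])
coef_pooled = ols(X_pooled, y)

print("discount slope, category B only:", coef_B_only[1])
print("discount slope, pooled:         ", coef_pooled[2])
print("shelf-life effect within A:     ", coef_pooled[3])
```

In this setup the small category shares the discount slope estimated from all observations, while the interaction term lets Shelf Life matter only where it is relevant. Whether pooling helps in practice depends on how similar the categories really are, which is what the validation results are meant to test.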

References

Electronic information

European Parliament, 2012. 〈http://www.europarl.europa.eu/news/en/pressroom/content/20120118IPR35648/html/Parliament-calls-for-urgent-measures-to-halve-food-wastage-in-the-EU〉.

Articles

Ailawadi, K.L., Harlam, B.A., César, J., Trounce, D., 2006. Promotion profitability for a retailer: the role of promotion, brand, category, and store characteristics. J. Mark. Res. 43, 518–535.
Ailawadi, K.L., Beauchamp, J.P., Donthu, N., Gauri, D.K., Shankar, V., 2009. Communication and promotion decisions in retailing: a review and directions for future research. J. Retail. 85 (1), 42–55.
Ali, O.G., Sayin, S., van Woensel, T., Fransoo, J., 2009. SKU demand forecasting in the presence of promotions. Expert Syst. Appl. 36 (10), 12340–12348.
Bijmolt, T.H.A., Heerde, H.J. van, Pieters, R.G.M., 2005. New empirical generalizations on the determinants of price elasticity. J. Mark. Res. 42, 141–156.
Blattberg, R., Eppen, G.D., Lieberman, J., 1981. A theoretical and empirical evaluation of price deals for consumer nondurables. J. Mark. 45, 116–129.
Blattberg, R.C., Briesch, R., Fox, E.J., 1995. How promotions work. Mark. Sci. 14 (3), 122–132.
Broekmeulen, R.A.C.M., van Donselaar, K.H., 2009. A heuristic to manage perishable inventory with batch ordering, positive lead-times, and time-varying demand. Comput. Oper. Res. 36 (11), 3013.

Chen, H., Boylan, J.E., 2008. Empirical evidence on individual, group and shrinkage seasonal indices. Int. J. Forecast. 24, 525–534.
Cooper, L.G., Baron, P., Levy, W., Swisher, M., Gogos, P., 1999. Promocast: a new forecasting method for promotional planning. Mark. Sci. 18 (3), 301–316.
Corsten, D., Gruen, T., 2003. Desperately seeking shelf availability: an examination of the extent, the causes, and the efforts to address retail out-of-stocks. Int. J. Retail Distrib. Manag. 31 (12), 605–617.
Curhan, R.C., 1974. The effects of merchandising and temporary promotional activities on the sales of fresh fruits and vegetables in supermarkets. J. Mark. Res. 11 (3), 286–294.
Della Bitta, A.J., Monroe, K.B., 1980. A multivariate analysis of the perception of value from retail price advertisements. Adv. Consum. Res. 8, 161–165.
Divakar, S., Ratchford, B., Shankar, V., 2005. CHAN4CAST: a multichannel, multiregion sales forecasting model and decision support system for consumer packaged goods. Mark. Sci. 24, 334–350.
Donselaar, K.H. van, Woensel, T. van, Broekmeulen, R.A.C.M., Fransoo, J., 2006. Inventory control of perishables in supermarkets. Int. J. Prod. Econ. 104, 462–472.
Ettouzani, Y., Yates, N., Mena, C., 2012. Examining retail on shelf availability: promotional impact and a call for research. Int. J. Phys. Distrib. Logist. Manag. 42 (3), 213–243.
Fildes, R., Goodwin, P., 2007. Against your better judgment? How organizations can improve their use of management judgment in forecasting. Interfaces 37, 570–576.
Fildes, R., Nikolopoulos, K., Crone, S.F., Syntetos, A.A., 2008. Forecasting and operational research: a review. J. Oper. Res. Soc. 59 (9), 1150–1172.
Goodwin, P., Fildes, R., 1999. Judgmental forecasts of time series affected by special events: does providing a statistical forecast improve accuracy? J. Behav. Decis. Mak. 12, 37–53.
Gijsbrechts, E., Campo, K., Goossens, T., 2003. The impact of store flyers on store traffic and store sales: a geo-marketing approach. J. Retail. 79 (1), 1–16.
Gruen, T.W., Corsten, D.S., Bharadwaj, S., 2002. Retail out-of-stock: a worldwide examination of extent, causes and consumer responses. The Food Business Forum.
Gupta, S., Cooper, L.G., 1992. The discounting of discounts and promotion thresholds. J. Consum. Res. 19 (3), 401–411.
van Heerde, H.J., Leeflang, P.S.H., Wittink, D.R., 2001. Semiparametric analysis to estimate the deal effect curve. J. Mark. Res. 38 (2), 197–215.
van Heerde, H.J., Leeflang, P.S.H., Wittink, D.R., 2002. How promotions work: SCAN*PRO based evolutionary model building. Schmalenbach Bus. Rev. 54 (3), 198–220.
Heller, W., 2002. Sales grow so does competition. Progress. Groc. 81 (14), 56–58.
Hoch, S.J., Schkade, D.A., 1996. A psychological approach to decision support systems. Manag. Sci. 42, 51–64.
Kleijnen, J.P.C., van Schaik, F.D.J., 2011. Sealed-bid auction of Netherlands mussels: statistical analysis. Int. J. Prod. Econ. 132, 154–161.
Lee, W.Y., Goodwin, P., Fildes, R., Nikolopoulos, K., Lawrence, M., 2007. Providing support for the use of analogies in demand forecasting tasks. Int. J. Forecast. 23, 377–390.
Litvack, D.S., Calantone, R.J., Warshaw, P.R., 1985. An examination of short-term retail grocery effects. J. Retail. 61 (3), 9–25.
Marshall, R., Leng, S.B., 2002. Price threshold and discount saturation point in Singapore. J. Prod. Brand Manag. 11 (2), 147.
Martínez-Ruiz, M.P., Mollá-Descals, A., Gómez-Borja, M.A., Rojo-Álvarez, J.L., 2006. Assessing the impact of temporary retail price discounts intervals using SVM semi-parametric regression. Int. Rev. Retail Distrib. Consum. Res. 16 (2), 181–197.
McIntyre, S.H., Achabal, D.D., Miller, C.M., 1993. Applying case-based reasoning to forecasting retail sales. J. Retail. 69, 372–398.
Narasimhan, C., Neslin, S.A., Sen, S.K., 1996. Promotional elasticities and category characteristics. J. Mark. 60, 17–30.
Neter, J., Kutner, M.H., Nachtsheim, C.J., Wasserman, W., 1996. Applied Linear Statistical Models, 4th ed. Irwin, Chicago.
Obermaier, R., 2012. German inventory to sales ratios 1971–2005 – an empirical analysis of business practice. Int. J. Prod. Econ. 135, 964–976.
Pauwels, K., Srinivasan, S., Franses, P.H., 2007. When do price thresholds matter in retail categories? Mark. Sci. 26 (1), 83–100.
Raju, J.S., 1992. The effect of price promotions on variability in product category sales. Mark. Sci. 11 (3), 207–220.
Srinivasan, S., Pauwels, K., Hanssens, D.M., Dekimpe, M., 2004. Do promotions benefit manufacturers, retailers or both? Manag. Sci. 50 (5), 617–629.
Thron, T., Nagy, G., Wassan, N., 2007. Evaluating alternative supply chain structures for perishable products. Int. J. Logist. Manag. 18 (3), 364–384.
Trapero, J.R., Pedregal, D.J., Fildes, R., Kourentzes, R., 2013. Analysis of judgmental adjustments in the presence of promotions. Int. J. Forecast. 29, 234–243.
Tsiros, M., Heilman, C.M., 2005. The effect of expiration dates and perceived risk on purchasing behavior in grocery store. J. Mark. 69 (2), 114–129.
van Woensel, T., Donselaar, K. van, Broekmeulen, R., Fransoo, J., 2007. Consumer responses to shelf out-of-stocks of perishable products. Int. J. Phys. Distrib. Logist. Manag. 37 (9), 704–771.