ARTICLE IN PRESS
Int. J. Production Economics 93–94 (2005) 479–491 www.elsevier.com/locate/dsw
The impact of aggregation level on forecasting performance
Giulio Zotteri (a), Matteo Kalchschmidt (b,*), Federico Caniato (c)
(a) Dipartimento di Sistemi di Produzione ed Economia Aziendale, Politecnico di Torino, C.so Duca degli Abruzzi 24, Turin, Italy
(b) Dipartimento di Ingegneria Gestionale e dell'Informazione, Università degli Studi di Bergamo, Viale Marconi 5, Dalmine (BG), Italy
(c) Dipartimento di Ingegneria Gestionale, Politecnico di Milano, P.za L. da Vinci 32, Milan, Italy
Abstract
Most operations decisions are based on some kind of forecast of future demand. Forecasting is therefore a long-established topic in the operations and inventory management literature. While the forecasting literature explores the adoption of various qualitative and quantitative methods, this paper seeks to improve forecasting accuracy by focusing on the forecasting process that uses such algorithms. In particular, when forecasting demand one should always make clear exactly what one is trying to forecast, in terms of the time bucket (i.e., the period of time over which demand is aggregated), the forecasting horizon, the set of items the demand refers to (e.g., forecasting demand for a single item can be much harder than forecasting demand for a group of items), and the set of locations the demand refers to (e.g., demand at the single store level is much less predictable than the demand for a whole chain of stores). Traditionally, these features of the final output of forecasting also influence the forecasting process. Indeed, when one wants to forecast demand at the single-store, single-item, single-day level, it seems natural to analyse demand and causal factors at the same level of aggregation. In contrast, in this paper we aim to show that, first, aggregating and/or disaggregating data in the forecasting process can often lead to substantial improvements and, second, the choice of the appropriate level of aggregation depends on the underlying demand generation process. In addition, most forecasting algorithms tend to focus on a single demand variable, whereas analogous time series can be analysed together to improve the effectiveness of the forecasting process. Clustering techniques can be used to identify such homologous time series. Clusters of homologous time series can provide, on the one hand, the sample size required to gain good statistical confidence and, on the other hand, relatively homogeneous data. In the paper, we use sales data from a food retailer at a very detailed level to test our hypotheses, which makes the results relevant to both practitioners and researchers.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Demand forecasting; Cluster analysis; Aggregation level
* Corresponding author. Department of Management and Information Technology, Università degli Studi di Bergamo, Viale Marconi 5, Dalmine (BG) 24044, Italy. Tel.: +39-035-205-2360; fax: +39-035-562-779.
E-mail address: [email protected] (M. Kalchschmidt).
0925-5273/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.ijpe.2004.06.044
G. Zotteri et al. / Int. J. Production Economics 93–94 (2005) 479–491
1. Introduction
Demand forecasting has been widely studied by researchers over the past decades. Many authors with rather different backgrounds (from inventory management to service operations, from econometrics to financial systems) have produced a wide array of contributions. However, most papers share a common feature: they either propose a new technique (i.e., algorithm) or evaluate the performance of existing ones. When trying to implement forecasting techniques, practitioners often find that forecasting is much more complex than the simple design or selection of an appropriate algorithm: it involves the choice of the relevant pieces of information, the design of information systems, the control of data quality, and the definition of managerial processes. This has pushed researchers to also study how to properly define the forecasting process in terms of information requirements and principles to avoid implementation problems. We refer to Armstrong (2001) for a general review of these approaches and techniques.
One critical issue concerning the implementation and adoption of forecasting techniques is the choice of the appropriate level at which forecasts should be evaluated. Typically, forecasting involves the prediction of future demand for a given "product" over a given time bucket for a given location. However, the definition of these three dimensions is anything but trivial. For example, when forecasting fashion items (e.g., shoes) one should make clear whether a product is a combination of a particular design, material, colour and size, or rather a group of SKUs that share the same basic model. As to the location, one should make clear whether the forecast is at store level, at distribution centre level or at chain level. The choice of the appropriate level of aggregation depends on the decision-making process the forecast is expected to support.
E.g., for short-term production planning a very detailed demand forecast is probably required, while for plant design or budgeting a rather aggregate forecast will be used. What makes the problem more complex in real contexts is that usually various levels of aggregation are adopted within a specific firm, as various decision-making processes take place at the same time; thus, firms usually have to evaluate forecasts at different levels of aggregation. Indeed, some authors (Small, 1980) have shown that firms adopt different levels of forecasts and that this aggregation level is not directly tied to the specific firm's characteristics (e.g., size, industrial sector and so on). This leads to the need to identify a proper forecasting process so that forecasts at different levels of aggregation are consistent and provide the required information to each single decision-making process (think for example of production planning vs. budgeting). This problem has stimulated research on the level of aggregation of the forecasting process, which is often referred to as Hierarchical Forecasting.
Hierarchical Forecasting is typically made up of two separate forecasting processes: (1) the bottom-up forecasting process and (2) the top-down forecasting process (Muir, 1979). In the bottom-up approach, individual forecasts for each demand segment (e.g., single SKU, single day, or single store) are combined to produce a forecast of aggregate demand (e.g., group of products, week or group of stores). This is referred to as the cumulative forecast, since it is the sum of individual lower-level forecasts. In the top-down process, aggregate demand data are used to forecast aggregate demand; the aggregate forecast is then disaggregated to produce what are known as derived forecasts for each demand segment. Typically, disaggregation is applied by means of historical data regarding the different segments, but some authors also provide other techniques to properly evaluate lower-level forecasts; we refer to Gross and Jeffrey (1990) for a general review of these techniques.
Literature regarding Hierarchical Forecasting can be structured in two main areas.
On the one hand, many authors have focused on identifying whether the top-down approach outperforms the bottom-up one. Some authors (Theil, 1954; Grunfeld and Griliches, 1960; Schwarzkopf et al., 1988; Öller, 1989; Ilmakunnas, 1990; Kahn, 1998; Lapide, 1998) argue that the top-down approach is superior because of its lower cost and greater accuracy during times of reasonably
stable demand. On the contrary, others (Orcutt et al., 1968; Zellner and Tobias, 2000; Weatherford et al., 2001) notice that, although it may be appealing to minimise the number of independent forecasts, individual forecasts are essential when it is important to capture differences among demand patterns (e.g., differences among stores and/or among products). E.g., when different products have different seasonality, aggregation might make the estimate of seasonality rather tricky. Finally, some authors take a contingent approach and analyse the conditions that make one approach more effective than the other. Some authors (Mentzer and Cox, 1984a, b; Weatherford et al., 2001) have shown that the level of data aggregation affects forecasting accuracy and that it is relevant for firms to properly identify the level at which they should evaluate predictions. In summary, these authors provide evidence that a priori there is no one best way. Miller et al. (1976), Barnea and Lakonishok (1980) and Fliedner (1999) show that the choice of top-down vs. bottom-up forecasting process depends on the degree of correlation among the subaggregate forecast variables and upon the magnitude of the correlation between forecast errors of subaggregate variables. Though the literature has devoted considerable attention to hierarchical forecasting, we argue that this issue deserves further attention. In particular we aim at providing the following contributions: (1) When comparing top-down (i.e., aggregate) with bottom-up (i.e., disaggregate) forecasting processes it is often appropriate to measure forecasting accuracy both at aggregate and disaggregate levels. This approach is consistent with the fact that companies often use both aggregate and disaggregate predictions to support different decision-making processes. In addition, the evaluation of forecasting performance both at aggregate and disaggregate levels makes the comparison between the top-down and bottom-up approaches fairer.
Indeed, one might expect that measuring forecast accuracy at disaggregate level might make the bottom-up approach look better, and vice versa. This is not a common approach in past literature, however, we argue that, since it
can provide relevant contribution to understand performance, more attention should be paid to this issue. (2) The choice of the appropriate level of aggregation really depends on the demand generation process. Indeed, to accurately forecast demand one needs to estimate the drivers of demand fluctuations that are likely to depend on the single location and single SKU. E.g., different stores might have different seasonality and have clients with different price sensitivity; on the other hand some products (e.g., shoes) are likely to sell very well early in the selling season while others sell only toward the end of it. However, the need to estimate these factors generates a trade-off between the ability to capture differences among locations (e.g., stores) and SKUs (e.g., colours of shoes) and the ability to accurately estimate these differences. Indeed, as one tries to be very location and SKU specific (e.g., estimate the price sensitivity at store/item level) the number of parameters increases thus reducing the accuracy of the parameter estimates (given a set of information available). The choice of the proper position on this trade off depends on two factors: the availability of information (e.g., number of years of relevant history; number of transactions per period) and the degree of difference among locations/products. The more information is available the more one can choose to capture differences among locations/SKUs; the more the locations/products differ the more one is open to accept some inaccuracy in the estimate of parameters to capture such difference. E.g., in the past authors have designed a forecasting system for a fresh food manufacturer that sells to retail grocery chains. While retailers in a given area share a rather similar seasonality, they differ significantly as to their ability to increase demand during promotions. 
Thus the forecasting technique estimates only one seasonality factor for all retailers in a given area, while estimates at single retailer level the effect of trade promotions (see Caniato et al. (2002) for a detailed description of the problem and of the solution proposed).
(3) Finally, the literature usually aggregates demand according to the product and/or supply chain structure. E.g., when aggregating demand over stores, one is tempted to cluster stores that are served by a given distribution centre. Indeed, this is very consistent with the output managers are expecting. However, in the paper we will discuss a second option: clustering demand according to the degree of similarity of time series. In the above example, one can cluster stores according to whether they have similar demand patterns rather than according to their geographical proximity. We believe that clustering (if appropriately implemented) can, on the one hand, capture differences among stores (e.g., in terms of price sensitivity), as the clustering procedure groups stores with similar demand patterns (e.g., with similar reaction to price changes); on the other hand, clustering provides a relatively large sample for each cluster of stores. In these terms, clustering is capable of moving the previously mentioned trade-off towards more efficient solutions.
The remainder of this paper is structured as follows: Section 2 presents the methodology adopted in this work and describes the data used to design the solution and to evaluate results. Section 3 evaluates the performance that the proposed approach can achieve compared to pure bottom-up or top-down forecasting. Section 4 draws some conclusions and proposes further research directions.
2. Data and methodology
To pursue the objectives previously introduced we considered a real context; in particular, we considered data provided by a European grocery retailer. The analysis refers to a specific context; however, we argue that the results can be of general interest, since the proposed approach is not specific to the context considered but rather typical of complex situations. The retailer manages a network of about 400 stores and owns distribution centres that receive goods from suppliers and distribute them to single stores. In addition, the retailer manages several thousand SKUs, as variety is believed to be one of the major drivers of traffic in grocery stores. In our study we were able to analyse demand for a set of 5 products (ranging from fast moving sodas to slow moving diapers) in a network of 38 stores for 117 weeks. The data were split into a fit sample of 111 weeks and a test sample of 6 weeks, as 6 weeks is the forecasting horizon used by the retailer.
Since our objective is to analyse the effect of aggregation levels, we adopted a specific forecasting method developed according to the particular context considered, and we applied the same technique at different aggregation levels to isolate the contribution of aggregation to forecasting performance. The forecasting technique applied is a logarithmic regression. In particular, the conceptual model behind the forecasting approach is the n, p, q model that is often used in the marketing literature (e.g., see Kotler, 1984), where n is the number of customers visiting a store, p is the probability that a customer buys a product and q is the quantity bought by a customer that actually purchases the product. Since this technique was actually adopted by the company considered, and given our objectives, we decided to adopt this forecasting approach to test our hypotheses. We argue that, even if results may vary with different techniques, this does not harm the generalization of results. One interesting feature of this model is that it follows the demand generation process, as it first captures the number of customers visiting a store and then the choice of the customers in the store. In this work, we will take into consideration only the estimation of the choice of customers since, given our objectives, we do not need to evaluate the demand completely.
In particular, we consider the estimation of $PQ_t$, which represents the number of units of a specific SKU a generic customer buys during a specific week. This enables us to use log regression, as PQ is a positive statistic but is not bound to be smaller than one.¹ The model is as follows:

\[
\begin{aligned}
\ln(PQ_t) ={}& \alpha_0 + \alpha_1 \ln(p_t) + \alpha_2 A_t + \alpha_3 T_t + \alpha_4 HDI_t + \alpha_5 T_{t-1} + \alpha_6 HDI_{t-1} \\
&+ \alpha_7 NT_t + \alpha_8 NT_{t-1} + \alpha_9 NT_{t-2} + \sum_{\tau} \beta_\tau E_{t,\tau} + \sum_{\tau} \gamma_\tau E_{t-1,\tau} + \sum_{r} \delta_r R_{ir}
\end{aligned}
\]

¹ In addition, models where p and q were estimated separately were tested, but they did not prove to be more effective than the model proposed in this paper.
where $p_t$ is the average price of item j at store i in week t, $A_t$ a dummy variable that captures the presence of a promotion for item j in week t, $T_t$ the maximum temperature in week t, $HDI_t$ an index that captures weather conditions including inches of rain in week t (temperature and HDI do not refer to a specific location, as stores are located in a homogeneous area), $NT_t$ the average maximum temperature recorded in past years during week t, and $E_{t,\tau}$ a dummy that captures whether during week t any particular event $\tau$ occurred (events include school holidays, Easter and Christmas); 9 such dummy variables are considered in the model. $R_{ir}$ is a dummy variable that captures the nature of the promotion (e.g., the means used to advertise the presence of a promotion). There are 4 different types of promotion; $R_{ir}$ is set to 1 if promotion r is active in week t.
In our analyses we investigated the aggregation over locations, as the number of SKUs in our sample was rather limited and both the products and the demand patterns were rather different. Thus, out of the three dimensions over which demand can be aggregated, in this paper we will only discuss the aggregation over locations. The basic structure of the forecasting model adopted suggests that the buying behaviour of customers is influenced by the price of the goods being purchased, by weather (temperature during weeks t and t-1), events (e.g., Christmas, Easter, summer holidays, etc.) and seasonality, measured by the average temperature in the period. Clearly the model can be applied at different levels of aggregation. First of all, aggregation can be made directly on the data. Indeed, to estimate the parameters of the equation we can use either detailed data for each single store or aggregate data at chain level. E.g., one can analyse the relationship between the penetration rate PQ at a
given store with the price at that store (detailed data), or can try to analyse the relationship between the average penetration for the whole chain of 38 stores with the average price in the chain of 38 stores. In addition, the parameters of the models can be evaluated at different aggregation level. Indeed, parameters can both refer to single stores or to the whole set of stores. In particular, if parameters are evaluated at store level we are assuming that each store has its own characteristics and in particular that customers buying at a particular store react differently to changes in causal variables. On the contrary, if parameters are estimated at chain level, we are assuming that roughly all stores react the same way. Obviously, in case aggregate data are used, only aggregate parameters can be derived, while with disaggregate data both aggregate and disaggregate parameters can be estimated. The combination of the above degrees of aggregation led us to the definition of 3 basic models, plus a new solution developed to address this problem.
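As a rough illustration of how a log-linear model of this kind can be fitted, the sketch below estimates a heavily reduced version of the regression by ordinary least squares. It is only a sketch under stated assumptions: it keeps just an intercept, the log price and a promotion dummy; the weekly data are synthetic rather than the retailer's; no variable selection is performed; and the function names (`solve_linear_system`, `fit_ols`, `predict_pq`) are hypothetical.

```python
import math
import random


def solve_linear_system(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, targets)) for a in range(k)]
    return solve_linear_system(xtx, xty)


def predict_pq(coeffs, price, promo):
    """Back-transform the fitted log model into a penetration rate PQ."""
    return math.exp(coeffs[0] + coeffs[1] * math.log(price) + coeffs[2] * promo)


# Synthetic data (assumption): intercept, price elasticity, promotion effect.
random.seed(0)
true_coeffs = [-1.0, -1.5, 0.4]
rows, targets = [], []
for week in range(111):  # same length as the fit sample in the paper
    price = random.uniform(0.8, 1.2)
    promo = 1.0 if week % 10 == 0 else 0.0
    features = [1.0, math.log(price), promo]
    noise = random.gauss(0.0, 0.05)
    rows.append(features)
    targets.append(sum(c * f for c, f in zip(true_coeffs, features)) + noise)

estimated = fit_ols(rows, targets)
print("estimated coefficients:", [round(c, 3) for c in estimated])
print("PQ at price 1.0 with promotion:", round(predict_pq(estimated, 1.0, 1.0), 3))
```

Fitting the same function once per store, once on pooled store data, or once on chain-level averages corresponds to the different aggregation choices discussed above; only the rows passed to `fit_ols` change.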
2.1. Disaggregate model
A first option is to apply the above model at a very detailed level, where the parameters are estimated separately for each single SKU-store combination. In other words, the detailed model assumes that different stores have a different sensitivity to price changes, promotions, weather, etc. Obviously, the data used to fit the model are at store level in order to capture differences among stores. This model generates forecasts at store level, while the chain level forecast is obtained by adding the disaggregated forecasts. This option has the advantage of capturing the differences among stores (see Appendix A for a detailed description of the model). On the other hand, it has the disadvantage of a rather small and noisy sample, as the number of parameters to be estimated is rather large (31 parameters for each of the 38 stores; however, in all regressions a forward procedure was adopted so that only significant parameters were actually introduced in the model).
2.2. Mixed model
A second option is to assume that all stores share the same sensitivity to causal factors and thus the parameters of the regression equation are common to all stores. In this model, we introduce a store-dummy variable that captures the differences in average penetration of the SKU in the various stores (see Appendix A for a detailed description of the model). Thus this model assumes that stores differ in terms of average penetration rate but react similarly to changes in price, weather and other causal factors; this model too uses data at store level to estimate parameters. The model captures only a portion of the differences among stores (namely average penetration) but has the advantage of more reliable estimates, as one large sample is used instead of 38 smaller samples. In this case too, forecasts are generated at store level and the chain prediction is the simple sum of the store-level forecasts.
2.3. Aggregate model
A third option is to aggregate data and then estimate the effect of causal variables at chain level. The assumption behind this model is that stores react similarly to causal factors such as Christmas or Easter. This model benefits from more stable inputs, as data at chain level are far less noisy than data at store level. The model clearly generates only a forecast at chain level; thus a process to derive store-level forecasts had to be designed. We adopted a rather trivial rule: the store-level forecast is equal to the chain-level forecast times a factor that captures the percentage of total demand that occurs in a given store (see Appendix A for more details).
2.4. Cluster model
As discussed in the introduction, in addition to classical solutions we introduce a model based on clustering. Usually aggregation occurs over the supply chain structure and thus stores served by a given distribution centre are grouped together.
We argue that clustering stores according to their demand pattern can lead to better grouping of
stores for forecasting and inventory management purposes. Consider for example two different stores, one located in a big city and another located in a small town frequently visited by tourists during summer vacations. They probably show very different demand patterns, since seasonality is structurally different in the two situations, so it appears intuitive that forecasts for these stores should be estimated separately. As already discussed, one extreme solution is to analyse stores one by one, though this solution leads to significant errors in the estimates of parameters, which might lead to inaccurate forecasts. However, if we cluster stores that show a similar demand pattern (e.g., seasonality or reaction to promotions) we can capture differences while enjoying a relatively large sample. This was achieved through the variable $PQ\%^{ij}_t$, which captures how the penetration rate (i.e., the average number of units sold per customer that entered the store) fluctuates over time. This variable is defined as follows:

\[
PQ\%^{ij}_t = \frac{PQ^{ij}_t}{\sum_{t} PQ^{ij}_t} .
\]

This metric captures the fluctuation in penetration rate over time at a given store and takes the average penetration out of the picture. Thus, this variable captures whether causal factors impact the penetration rate of a given SKU similarly at different stores. For each SKU, 3 clusters were defined through the k-means algorithm. In future studies we plan to investigate the role of the number of clusters in the effectiveness of clustering. In this specific case 3 clusters were chosen to obtain clusters that are, on the one hand, large enough not to resemble single stores and, on the other hand, small enough not to resemble the whole chain. The aggregate approach was used to predict demand for each cluster. In this case the chain level forecast is simply equal to the sum of the forecasts for the 3 clusters. The store level forecast is disaggregated from the cluster forecast as in the case of the aggregate model (see Appendix A).
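The clustering step can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the six stores and their weekly penetration rates are invented, the k-means routine is a plain Lloyd iteration with hand-picked initial centroids, and the disaggregation factor is assumed here to be each store's share of historical demand (the exact rule is given in Appendix A, which is not reproduced). All function and variable names are hypothetical.

```python
def normalise_profile(series):
    """PQ%: each weekly penetration rate divided by the store's total over time."""
    total = sum(series)
    return [value / total for value in series]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given profile."""
    return min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))


def k_means(points, initial_centroids, iterations=20):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centroids = [c[:] for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def disaggregate(cluster_forecast, historical_demand):
    """Split a cluster-level forecast by each store's share of past demand (assumption)."""
    total = sum(historical_demand.values())
    return {store: cluster_forecast * d / total for store, d in historical_demand.items()}


# Hypothetical weekly penetration rates: levels differ, shapes repeat in pairs.
stores = {
    "city_1": [10, 10, 10, 10],
    "city_2": [20, 21, 19, 20],
    "resort_1": [2, 4, 12, 2],
    "resort_2": [5, 9, 31, 5],
    "xmas_1": [3, 3, 3, 15],
    "xmas_2": [8, 7, 8, 41],
}
names = list(stores)
profiles = [normalise_profile(stores[name]) for name in names]

# Three clusters, seeded with one store of each apparent pattern.
labels = k_means(profiles, [profiles[0], profiles[2], profiles[4]])
print(dict(zip(names, labels)))

# Derive store-level forecasts from a hypothetical cluster forecast of 90 units.
shares = disaggregate(90.0, {"resort_1": 20.0, "resort_2": 50.0})
print(shares)
```

Because each series is divided by its own total, stores with very different sales levels but the same temporal shape end up in the same cluster, which is the purpose of the PQ% normalisation.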
From now on, we will consider and compare the four levels of aggregation that we introduced and which are summarised in Table 1.
Table 1
Description of the different approaches considered

              Aggregate        Mixed            Disaggregate     Cluster
Parameters    At chain level   At chain level   At store level   At cluster level
Data          At chain level   At store level   At store level   At cluster level
According to the objectives of the paper, we will evaluate the forecast accuracy of the models both at single store level and for the chain of 38 stores, at week-item level. As to the performance metric, we adopted the mean absolute percentage error (MAPE) for the chain level forecast. This metric was chosen because of its wide adoption and because it makes it possible to compare the performance of a given model across products with significantly different demand rates. For each product, MAPE is defined as follows:

\[
\mathrm{MAPE} = \frac{1}{m} \sum_{i} \frac{1}{n} \sum_{t} \frac{AE_{i,t}}{D_{i,t}},
\]

where $AE_{i,t}$ represents the absolute error for week t and store i, and $D_{i,t}$ is the total demand for week t and store i (m and n being the number of stores and of weeks over which the errors are averaged). However, this metric was not directly applicable at store level, since for some stores slow moving items have some weeks with zero demand. This can lead to a relevant bias: the statistic cannot be calculated in periods when demand is zero, and thus it takes out of the sample all the periods in which demand is extremely low. A modified MAPE, indicated as MAPE*, was therefore introduced to avoid the problem of zero demand periods. This metric simply evaluates the mean absolute deviation (MAD) and compares it to average demand, which is equivalent to dividing the total absolute error by total demand; it thus makes comparison among different forecasts possible while reducing the bias that MAPE shows in this particular situation. We evaluated MAPE* as follows:

\[
\mathrm{MAPE}^{*} = \frac{\sum_{i} \sum_{t} AE_{i,t}}{\sum_{i} \sum_{t} D_{i,t}} .
\]

The literature does not usually consider this a relevant metric for evaluating performance; we refer to Armstrong (2001) and Vollmann et al. (1992) for a general review of forecast error metrics. However, we argue that in some cases it can be a significant measure, in particular when the demand series shows
Table 2
MAPE for the different approaches at chain level

Item      Mixed (%)   Aggregate (%)   Disaggregate (%)   Cluster (%)
A         25.97        8.14           21.08              11.42
B         15.82        7.55           15.73               3.15
C         16.47       11.54            8.90               8.06
D         21.64       11.13           13.66              10.41
E         21.05       13.88           21.71              15.16
Average   20.19       10.45           16.22               9.64
Table 3
Differences of performances among the different approaches at chain level

               Mixed   Aggregate   Disaggregate   Cluster
Mixed          –       9.74%       3.97%          10.55%
Aggregate      –       –           5.77%           0.81%
Disaggregate   –       –           –               6.58%
Cluster        –       –           –               –
some days with no demand, since it reduces estimation biases.
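The difference between the two metrics can be made concrete with a small sketch. The demand and forecast figures below are invented for illustration, and the function names `mape` and `mape_star` are hypothetical; the code only mirrors the two definitions given above.

```python
def mape(actual, forecast):
    """Mean of |error| / demand over all stores and weeks; undefined at zero demand."""
    ratios = []
    for store_actual, store_forecast in zip(actual, forecast):
        for demand, predicted in zip(store_actual, store_forecast):
            if demand == 0:
                raise ZeroDivisionError("MAPE is undefined for zero-demand periods")
            ratios.append(abs(demand - predicted) / demand)
    return sum(ratios) / len(ratios)


def mape_star(actual, forecast):
    """Total absolute error divided by total demand over all stores and weeks."""
    total_abs_error = sum(
        abs(demand - predicted)
        for store_actual, store_forecast in zip(actual, forecast)
        for demand, predicted in zip(store_actual, store_forecast)
    )
    total_demand = sum(demand for store_actual in actual for demand in store_actual)
    return total_abs_error / total_demand


# Hypothetical data: one fast moving store and one slow moving store, three weeks each.
actual = [[100, 120, 80], [0, 2, 1]]
forecast = [[90, 130, 85], [1, 1, 1]]

# MAPE* remains defined although the second store has a zero-demand week.
print("MAPE* over both stores:", round(mape_star(actual, forecast), 4))

# MAPE can only be computed where every period has positive demand.
print("MAPE, fast moving store only:", round(mape(actual[:1], forecast[:1]), 4))
```

With equal numbers of weeks per store, averaging over all store-week pairs as `mape` does is the same as the nested average in the MAPE definition.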
3. Analysis of results As previously anticipated we have compared the four solutions both at chain and store level. In particular MAPE was adopted at chain level while a modified MAPE (MAPE*) was considered for store level. Table 2 summarises results for chain level, Table 3 shows the differences in performances at chain level for the considered approaches, while Table 4 shows the 2-tailed significance values of the Paired-Samples t test for the difference of means.
Table 4
Significance levels for the Paired-Sample t tests on mean difference for the considered approaches

               Mixed   Aggregate   Disaggregate   Cluster
Mixed          –       0.011671    0.094598       0.00236
Aggregate      –       –           0.096946       0.603186
Disaggregate   –       –           –              0.035935
Cluster        –       –           –              –
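As a worked illustration of the paired-sample comparison, the sketch below computes the paired t statistic for the Mixed and Aggregate models from the five item-level MAPE values reported in Table 2. It is a sketch using only the Python standard library; the p-value itself would additionally require the Student t distribution with 4 degrees of freedom, which is not computed here.

```python
import math
import statistics

# Chain-level MAPE (%) for items A to E, taken from Table 2.
mixed = [25.97, 15.82, 16.47, 21.64, 21.05]
aggregate = [8.14, 7.55, 11.54, 11.13, 13.88]

# Paired test: work on the item-by-item differences.
differences = [m - a for m, a in zip(mixed, aggregate)]
mean_diff = statistics.mean(differences)
std_error = statistics.stdev(differences) / math.sqrt(len(differences))
t_statistic = mean_diff / std_error
degrees_of_freedom = len(differences) - 1

print("mean difference (percentage points):", round(mean_diff, 2))
print("t statistic:", round(t_statistic, 2), "with", degrees_of_freedom, "degrees of freedom")
```

The mean difference matches the 9.74 entry in Table 3, and the t statistic lies between the standard two-tailed critical values for 4 degrees of freedom at the 2% level (3.747) and the 1% level (4.604), in line with the significance value of 0.011671 reported in Table 4.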
[Fig. 1 plot not reproduced: vertical axis MAPE(aggr) - MAPE(cluster), from -4 to 5; horizontal axis Ln(avg. weekly demand per store), from 0 to 8; one point for each of the items A, B, C, D and E.]
First of all we can observe that, as might have been expected, the Mixed model tends to be outperformed. This result is not really surprising, as the mixed model fails to capture differences among stores but nevertheless uses rather noisy store-level data. Interestingly, the Aggregate model performs better than the Disaggregate one (almost 6 percentage points lower, i.e., more than 30% better in relative terms). One can argue that this is because the accuracy is measured at chain level. However, this hypothesis can be rejected in the light of results at store level. Indeed, at store level the disaggregate model does not perform better than the aggregate one. A second hypothesis is that the aggregate model performs better because the stores in the sample are relatively similar, as they belong to the same retail chain where prices are relatively constant across stores, displays are similar, etc.; thus, in this context the advantage of capturing store-specific causal factors does not offset the disadvantage due to the reduction in the information available to estimate the parameters. In other words, given the relatively high homogeneity of stores and the relative lack of information at store level, the advantage of aggregation in terms of greater accuracy of parameter estimates is greater than the disadvantage due to the inability to capture differences among stores. Finally, on average the Cluster solution performs slightly better than the aggregate approach (on average, it improves the Aggregate model performance by almost 10%); however, a Paired-Sample test showed that the difference is not statistically significant (see Table 4 for results). It is nevertheless interesting to investigate on which products the aggregate model performs better than the cluster one. As Fig. 1 shows, the cluster
Fig. 1. Correlation between cluster approach improvements and average weekly demand per store for the 5 products considered.
approach outperforms the aggregate model for those items that have a rather high demand per store (average weekly demand per store of items B, C and D is respectively equal to 927, 90 and 126 units). On the contrary, for slow moving items (average weekly demand per store of items A and E is 3 and 29 units) the aggregate model outperforms the cluster one. This result sheds light on the conditions under which clustering might work (though this result deserves further investigation with additional SKUs). First, clustering can only work if there is enough past data to group together stores with similar patterns. In the case of very slow moving items (like items A and E) it is really hard to capture similarities among stores, as demand at store level is extremely nervous: the decision of a single customer makes a difference. Thus the clustering technique can effectively group similar stores together only for rather fast moving items. Second, the clustering technique involves some kind of disaggregation as compared to the aggregate model. Though the cluster solution can capture some kind of difference among stores, the ability to accurately estimate the impact of causal factors is reduced by the lower number of stores per cluster. In other words, the amount of information available is not large enough to estimate accurately the slight differences among the stores that belong to the different clusters. As previously anticipated, to properly compare the different approaches we had to evaluate performance also at store level. This is also because for the considered firm it is relevant, since
it influences the replenishment process of the items and transportation to stores. This second analysis was conducted by means of MAPE*. Table 5 summarises the results at store level for the different approaches adopted, Table 6 shows the differences in performance between each pair of approaches, and Table 7 shows the 2-tailed significance values of the Paired-Samples t test for the difference of means. Again, on average, the Cluster approach tends to perform better than the other techniques considered; however, its difference from the Aggregate and the Disaggregate models is not statistically significant. Table 7, in fact, shows that the Paired-Samples t test does not reject the hypothesis that the performances are similar, so we cannot conclude that globally one method performs better than another. However, it is interesting to notice that the detailed approach (i.e., the one that is consistent with the level at which performance is measured in this case) does not outperform the other methods. Thus, while the Cluster and Aggregate solutions outperform the Disaggregate one in terms of chain-level accuracy, at store level the Disaggregate model does not outperform the Aggregate and the Mixed ones. In this respect we would argue that under the specific conditions of this company the Aggregate and Cluster models perform better. In the conclusions we will discuss the generalization of this result. In addition, we considered again the correlation between performance and volumes of the different SKUs; this analysis gave results similar to the previous ones: the Cluster approach performs better for high-volume SKUs, while it does not for slow-moving items. We also argue that the presented results should be evaluated for a larger number of SKUs and on different sets of data, in order to validate these preliminary results.

Table 5. MAPE* (%) for the different approaches at store level
Item       Mixed    Aggregate   Disaggregate   Cluster
A          37.40    32.60       36.60          35.06
B          17.72    16.50       20.26          16.07
C          24.94    26.55       19.07          20.94
D          26.31    23.90       22.48          22.50
E          27.71    35.60       43.95          36.82
Average    26.82    27.03       28.47          26.28

Table 6. Differences of performances among the different approaches at store level
               Mixed    Aggregate   Disaggregate   Cluster
Mixed          –        0.21%       1.66%          0.54%
Aggregate      –        –           1.44%          0.75%
Disaggregate   –        –           –              2.19%
Cluster        –        –           –              –

Table 7. Significance levels for the Paired-Samples t tests on mean difference for the considered approaches
               Mixed    Aggregate   Disaggregate   Cluster
Mixed          –        0.926486    0.693914       0.837079
Aggregate      –        –           0.623462       0.615841
Disaggregate   –        –           –              0.238309
Cluster        –        –           –              –
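The significance values in Table 7 can be checked from the per-item figures in Table 5. The following is a minimal sketch, assuming that the Paired-Samples t test pairs the five per-item MAPE* values of two approaches (the figures it produces are consistent with this reading). The helper names `t_pdf`, `two_tailed_p` and `paired_t_test` are hypothetical, and the p-value is obtained by numerical integration of the Student's t density rather than with a statistics package.

```python
import math
from statistics import mean, stdev

# Store-level MAPE* (%) for items A-E, copied from Table 5.
MAPE = {
    "Mixed":        [37.40, 17.72, 24.94, 26.31, 27.71],
    "Aggregate":    [32.60, 16.50, 26.55, 23.90, 35.60],
    "Disaggregate": [36.60, 20.26, 19.07, 22.48, 43.95],
    "Cluster":      [35.06, 16.07, 20.94, 22.50, 36.82],
}


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    central_mass = 2.0 * (total * h / 3.0)  # probability of |T| <= |t|
    return 1.0 - central_mass


def paired_t_test(x, y):
    """Return (mean difference, t statistic, two-tailed p) for paired samples."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))
    return mean(d), t, two_tailed_p(t, n - 1)


diff, t_stat, p_value = paired_t_test(MAPE["Disaggregate"], MAPE["Cluster"])
print(f"mean difference = {diff:.2f} points, t = {t_stat:.3f}, p = {p_value:.3f}")
```

For the Disaggregate and Cluster columns the mean difference is the 2.19 points reported in Table 6, and the p-value should land close to the 0.238 reported in Table 7. With only five items the test has four degrees of freedom, which helps explain why even a difference of about two points is not significant.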
4. Conclusions
This paper contributes to research on forecasting in several ways. First of all, this work draws the attention of both researchers and practitioners to the choice of the appropriate level of aggregation in the forecasting process. Indeed, our results show that performance changes substantially as a function of the aggregation level adopted. We argue that this issue is in general not much analysed, in particular by practitioners, while we believe it deserves further attention. Second, we have provided evidence, even if in a specific context, that there is no “one best way” of defining the proper aggregation level. The solutions analysed perform differently, and we argue that this is not specific to the particular situation considered. Some authors in the past literature have provided contributions in this direction (see Introduction for a review of these papers); in this work we have identified some contingent factors
that should be taken into account. In particular, the amount of information available and the heterogeneity of the market seem to play a crucial role. As described in Fig. 2, given a certain level of aggregation of the output of the forecasting process, a first option (the so-called “base case”, as most companies follow this process) is to choose a consistent level of aggregation of data and analysis. E.g., if one needs to forecast demand at the region level it might seem “natural” to store data at the region level and analyse them at the region level as well. However, a company might choose to store data and perform analyses at either a more aggregate or a more detailed level. The (viable) combinations of these choices are the first three solutions analysed in this work and appear in Fig. 2. However, aggregation and disaggregation are typically developed along dimensions that are not coherent with the demand generation process (e.g., distribution centres). Therefore, we propose an alternative method, based on a clustering methodology, that tries to shift this trade-off by introducing new aggregation dimensions. Consider for example stores in a retail chain that promote a given item (the situation considered in this work): different stores may react differently to similar promotions, thus making the estimation of the regression coefficients associated with promotions rather tricky (and imprecise) if performed at aggregate level. On the other hand, estimation at single-store level may be inefficient too, as little information might be available for a single store, thus making the estimates rather unreliable. Thus, the more heterogeneous demand is, the less applicable an aggregate forecast becomes.
Fig. 2. The aggregation/disaggregation approaches considered. [Figure: grid relating Process Aggregation and Data Aggregation (each Aggregate or Disaggregate) to the Aggregation Level of the forecast; cells are labelled Base Case, Mixed and Not applicable.]
Fig. 3. Relationship between forecasting ability and forecasting level. [Figure: ability to forecast against forecasting level, from Aggregate (all customers together) to Disaggregate (all customers separately); the curves are the ability to capture variability and the ability to manage variability, i.e., estimate parameters.]
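The trade-off of Fig. 3 can be illustrated with a deliberately stylised calculation. This is a sketch under strong assumptions and not the regression model used in this work: each store is assumed to have a single true promotion effect, each observation equals that effect plus independent noise, and every group of stores shares one pooled estimate (the mean of all observations in the group). Under these assumptions the expected squared error for a store is the squared gap between its own effect and the group mean (the cost of ignoring heterogeneity) plus the noise variance divided by the number of pooled observations (the cost of scarce data). All names and numbers below are hypothetical.

```python
def expected_mse(effects, groups, noise_var, n_obs):
    """Average expected squared error when each group shares one pooled estimate.

    effects   : hypothetical true promotion effect of each store
    groups    : partition of the store indices into groups
    noise_var : variance of a single observation around the store's true effect
    n_obs     : number of observations available per store
    """
    total = 0.0
    for group in groups:
        pooled = sum(effects[i] for i in group) / len(group)
        variance = noise_var / (n_obs * len(group))  # shrinks as data are pooled
        for i in group:
            bias_sq = (pooled - effects[i]) ** 2     # grows with heterogeneity
            total += bias_sq + variance
    return total / len(effects)


# Hypothetical stores: two clearly different types, or one nearly uniform type.
EFFECTS_HETEROGENEOUS = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]
EFFECTS_HOMOGENEOUS = [1.0, 1.02, 0.98, 1.01, 0.99, 1.0]

GROUPINGS = {
    "aggregate": [[0, 1, 2, 3, 4, 5]],
    "disaggregate": [[0], [1], [2], [3], [4], [5]],
    "cluster": [[0, 1, 2], [3, 4, 5]],   # stores grouped by similar response
    "region": [[0, 3], [1, 4], [2, 5]],  # grouping unrelated to demand behaviour
}


def compare(effects, noise_var=1.0, n_obs=10):
    """Expected error of every grouping for one set of store effects."""
    return {
        name: expected_mse(effects, groups, noise_var, n_obs)
        for name, groups in GROUPINGS.items()
    }


for label, effects in (
    ("heterogeneous", EFFECTS_HETEROGENEOUS),
    ("homogeneous", EFFECTS_HOMOGENEOUS),
):
    print(label, {name: round(mse, 4) for name, mse in compare(effects).items()})
```

With these made-up numbers, grouping by similarity gives the lowest error when stores are heterogeneous, full aggregation wins when stores are nearly identical, and a grouping that ignores demand behaviour can be worse than either extreme. The ranking depends entirely on the assumed effects, noise and sample size, so the sketch shows the mechanism only and is not evidence about any real chain.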
If relatively little information is available, however, the disaggregate solution is not applicable either. As a matter of fact, the aggregate solution is not capable of capturing the heterogeneity in the reactivity of stores to promotions, while the disaggregate solution is not capable of providing reliable estimates (Fig. 3 exemplifies this trade-off). In this work, we adopted a particular solution based on cluster analysis to improve forecast performance by addressing the trade-off in Fig. 3. The clustering approach basically groups data according to their degree of similarity rather than on the basis of other features such as region, size of the location, etc. A similar approach has also been tested in other works within different research problems (e.g., Fisher and Rajaram (2000) adopted a cluster-based methodology for accurate testing of fashion merchandise). Our solution provides good results since it is more consistent with a relevant source of variability, namely the heterogeneity of stores. Indeed, our solution groups stores that are similar from a demand standpoint rather than from a geographic/logistic standpoint. Even if the solution proposed has been developed in a specific context, we argue that these results can be somewhat generalised or at the very least are worth further empirical and theoretical investigation. As a matter of fact, this work claims that, once the level at which demand has to be forecasted is defined, managers still have several options open. First, they need to define the aggregation level of data and analysis. Second,
they need to decide whether they want to use traditional aggregation processes (e.g., aggregating stores according to the region they belong to) or clustering. Moreover, what is relevant is not the solution itself, since the dimensions considered are somewhat specific to the particular context analysed, but the methodology adopted, which, we argue, might be generalised. Finally, we want to draw the readers’ attention to the limitations of our research and thus to the potential room for improvement and further investigation in this area. First, in our paper we considered a specific case. Hence, it would be interesting to compare our results with empirical evidence from other companies, both in the retail industry and in other industries. Clearly data and relevant variables might change, but it would still be very interesting to test how the approach performs. Second, one can aggregate data over several dimensions (time, products, locations). In our paper we focused on the last of these, while it would be very interesting to test the ideas suggested over the other dimensions. E.g., it would be rather interesting to know whether it is appropriate to derive weekly forecasts from daily ones or vice versa, to test clustering techniques that group stores showing similar seasonality factors, etc. On the products dimension, for example, one might argue that forecasting the demand rate of a given pair of men’s shoes in a given colour and a given size at a given store by looking at past sales at this level of detail is just hopeless. On the contrary, we argue, one should rather look at the aggregate sales of the single shoe model and use the size distribution of men’s shoes to break down the aggregate forecast to the size level. Third, we investigated a specific, though widely adopted, forecasting technique, and one might wonder whether the findings hold for other approaches (e.g., exponential smoothing) that do not require estimates of parameters.
Finally, this paper only provides empirical evidence and comments on the results. However, a theoretical model that introduces the relevant variables, identifies the relevant trade-offs and provides theoretical background for the empirical
investigation would be required to fully develop this rather new stream of research.
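The top-down idea mentioned above for shoe sizes (forecast the aggregate, then split it according to historical shares) is the same logic that Appendix A applies when the chain-level forecast is broken down to single stores through the factors SF0 and SF1. A minimal sketch of such a breakdown follows. It assumes that a store factor is the average, over the relevant weeks, of the share of chain demand taken by the store, computed separately for promoted and non-promoted weeks; the function names and the sales figures are hypothetical and are not data from the case studied.

```python
def store_factor(store_demand, chain_demand, promo_flags, promoted):
    """Average share of chain demand taken by one store, computed over the
    promoted weeks (promoted=True) or the non-promoted weeks (promoted=False)."""
    shares = [
        d / c
        for d, c, flag in zip(store_demand, chain_demand, promo_flags)
        if bool(flag) == promoted and c > 0
    ]
    if not shares:
        raise ValueError("no usable weeks for this promotion regime")
    return sum(shares) / len(shares)


def break_down(chain_forecast, promo_next_week, history, promo_flags):
    """Split a chain-level forecast among stores with regime-specific factors."""
    n_weeks = len(promo_flags)
    chain_demand = [sum(h[w] for h in history.values()) for w in range(n_weeks)]
    return {
        store: chain_forecast
        * store_factor(demand, chain_demand, promo_flags, bool(promo_next_week))
        for store, demand in history.items()
    }


# Hypothetical weekly sales of one item in two stores (not data from the paper).
HISTORY = {"store_x": [10, 30, 10, 30], "store_y": [30, 50, 30, 50]}
PROMO = [0, 1, 0, 1]  # 1 = the item was promoted in that week

print(break_down(100.0, promo_next_week=1, history=HISTORY, promo_flags=PROMO))
print(break_down(100.0, promo_next_week=0, history=HISTORY, promo_flags=PROMO))
```

In this toy data set store_x takes 25% of chain sales in normal weeks and 37.5% in promoted weeks, so a chain forecast of 100 units is split 37.5/62.5 under promotion and 25/75 otherwise. Because the shares of all stores sum to one in every week, the store forecasts always add up to the chain forecast, which keeps the different levels of aggregation consistent.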
Acknowledgements
The authors have contributed jointly to the present work; nevertheless, Federico Caniato has written Section 3, Matteo Kalchschmidt has edited Section 1, and Giulio Zotteri has edited Sections 2 and 4.
Appendix A
In this appendix we use the following notation: $i$ is the store index, $j$ the SKU index and $t$ the week index. The number of customers entering store $i$ in a given week $t$, $N_{it}$, and the demand per week per store per item, $D_{ijt}$, are the observable variables. Given these two variables we can define the following variables: $PQ_{ijt} = D_{ijt}/N_{it}$ is the penetration rate for product $j$ in store $i$ in week $t$; $PQ_{jt} = \sum_{i} D_{ijt} / \sum_{i} N_{it}$ is the average penetration rate for the chain of stores. The demand forecast is derived by evaluating predictions of $N_{it}$ and $PQ_{ijt}$, indicated respectively as $N^{i}_{t}$ and $PQ^{ij}_{t}$.

A.1. Disaggregate model
\[
\begin{aligned}
\ln(PQ_{ijt}) ={}& \alpha^{ij}_{0} + \alpha^{ij}_{1}\ln(p_{ijt}) + \alpha^{ij}_{2}A_{jt} + \alpha^{ij}_{3}T_{t} + \alpha^{ij}_{4}HDI_{t} + \alpha^{ij}_{5}T_{t-1} + \alpha^{ij}_{6}HDI_{t-1} \\
&+ \alpha^{ij}_{7}NT_{t} + \alpha^{ij}_{8}NT_{t-1} + \alpha^{ij}_{9}NT_{t-2} + \sum_{\tau}\beta^{ij}_{\tau}E_{t,\tau} + \sum_{\tau}\gamma^{ij}_{\tau}E_{t-1,\tau} + \sum_{r}\delta^{ij}_{r}R_{jt,r},
\end{aligned}
\]
where $p_{ijt}$ is the average price of item $j$ at store $i$ in week $t$; $A_{jt}$ is a dummy variable that captures the presence of a promotion for item $j$ in week $t$; $T_{t}$ is the maximum temperature in week $t$; $HDI_{t}$ is an index that captures weather conditions, including inches of rain, in week $t$ (temperature and HDI do not refer to a specific location $i$ as stores are located in a homogeneous area); $NT_{t}$ is the average maximum temperature recorded in past years during week $t$; $E_{t,\tau}$ is a dummy variable that captures whether event $\tau$ has occurred during week $t$ (events include school holidays, Easter and Christmas); $R_{jt,r}$ is a dummy variable that captures the nature of the promotion (e.g., the means used to advertise the presence of a promotion). There are 4 different types of promotion; $R_{jt,r}$ is set to 1 if promotion $r$ is active in week $t$ for item $j$.

A.2. Mixed model
\[
\begin{aligned}
\ln(PQ_{ijt}) ={}& \alpha^{j}_{0} + \alpha^{j}_{1}\ln(p_{ijt}) + \alpha^{j}_{2}A_{jt} + \alpha^{j}_{3}T_{t} + \alpha^{j}_{4}HDI_{t} + \alpha^{j}_{5}T_{t-1} + \alpha^{j}_{6}HDI_{t-1} \\
&+ \alpha^{j}_{7}NT_{t} + \alpha^{j}_{8}NT_{t-1} + \alpha^{j}_{9}NT_{t-2} + \sum_{\tau}\beta^{j}_{\tau}E_{t,\tau} + \sum_{\tau}\gamma^{j}_{\tau}E_{t-1,\tau} + \sum_{r}\delta^{j}_{r}R_{jt,r}.
\end{aligned}
\]
For both the Disaggregate and Mixed models, we can evaluate the total chain forecast by simply adding up the demand forecasts of all stores. The forecast for each store is
\[
D^{ij}_{t} = N^{i}_{t}\,PQ^{ij}_{t},
\]
so for each item the total chain forecast is evaluated as
\[
D^{j}_{t} = \sum_{i} D^{ij}_{t}.
\]

A.3. Aggregate model
\[
\begin{aligned}
\ln(PQ_{jt}) ={}& \alpha^{j}_{0} + \alpha^{j}_{1}\ln(p_{jt}) + \alpha^{j}_{2}A_{jt} + \alpha^{j}_{3}T_{t} + \alpha^{j}_{4}HDI_{t} + \alpha^{j}_{5}T_{t-1} + \alpha^{j}_{6}HDI_{t-1} \\
&+ \alpha^{j}_{7}NT_{t} + \alpha^{j}_{8}NT_{t-1} + \alpha^{j}_{9}NT_{t-2} + \sum_{\tau}\beta^{j}_{\tau}E_{t,\tau} + \sum_{\tau}\gamma^{j}_{\tau}E_{t-1,\tau} + \sum_{r}\delta^{j}_{r}R_{jt,r}.
\end{aligned}
\]
In the above formula, the variable $p_{jt}$ derives from the aggregation of the more detailed variables introduced in the detailed model: $p_{jt}$ is the average price in the chain, i.e., the total turnover divided by the number of units sold in the chain in the week. This leads to the generation of a chain-level demand forecast. The chain-level demand forecast is broken down to single-store level according to the following procedure. For each store, the forecast is disaggregated as follows:
\[
D^{ij}_{t} = D^{j}_{t}\left[A_{jt}\,SF1^{ij} + (1 - A_{jt})\,SF0^{ij}\right],
\]
where
\[
SF0^{ij} = \frac{\sum_{t}(1 - A_{jt})\left(D_{ijt} / \sum_{k} D_{kjt}\right)}{\sum_{t}(1 - A_{jt})},
\qquad
SF1^{ij} = \frac{\sum_{t}A_{jt}\left(D_{ijt} / \sum_{k} D_{kjt}\right)}{\sum_{t}A_{jt}}.
\]
This disaggregation procedure is aimed at capturing the differences in sales rates among the stores. Indeed, the $SF0^{ij}$ and $SF1^{ij}$ factors basically compare the sales rate of the single store to the average sales rate of the chain. In addition, promotions are not equally effective at all the stores in the chain. Thus, the ratio between the sales rate of the single store and the sales rate of the whole chain changes according to whether the item is being promoted or not. In addition, regression was tested to break down the chain-level forecast to the store level, with very similar results in terms of accuracy but with no guarantee of consistency among the different levels of aggregation.

References
Armstrong, J.S., 2001. Principles of Forecasting: A Handbook for Researchers and Practitioners. Kluwer Academic Publishers, Dordrecht.
Barnea, A., Lakonishok, J., 1980. An analysis of the usefulness of disaggregated accounting data for forecasts of corporate performance. Decision Sciences 11, 17–26.
Caniato, F., Kalchschmidt, M., Ronchi, S., Verganti, R., Zotteri, G., 2002. Forecasting demand fluctuations due to promotional activities: a case in the fresh food industry. Proceedings of the POMS Annual Conference, San Francisco.
Fisher, M., Rajaram, K., 2000. Accurate retail testing of fashion merchandise: methodology and application. Marketing Science 19 (3), 266–278.
Fliedner, G., 1999. An investigation of aggregate variable time series forecast strategies with specific subaggregate time series statistical correlation. Computers & Operations Research 26, 1133–1149.
Gross, C.W., Jeffrey, E.S., 1990. Disaggregation methods to expedite product line forecasting. Journal of Forecasting 9, 253–254.
Grunfeld, Y., Griliches, Z., 1960. Is aggregation necessarily bad? Review of Economics and Statistics 42, 1–13.
Ilmakunnas, P., 1990. Aggregation vs. disaggregation in forecasting construction activity. In: Barker, T.S., Pesaran, H. (Eds.), Disaggregation in Econometric Modelling. Routledge, London, pp. 73–86.
Kahn, K.B., 1998. Revisiting top-down versus bottom-up forecasting. Journal of Business Forecasting (Summer), 14–19.
Kotler, P., 1984. Marketing Management: Analysis, Planning and Control. Prentice-Hall, Englewood Cliffs, NJ.
Lapide, L., 1998. New developments in business forecasting. Journal of Business Forecasting (Summer), 28–29.
Mentzer, J.T., Cox Jr, J.E., 1984a. Familiarity, application and performance of sales forecasting techniques. Journal of Forecasting 3, 27–36.
Mentzer, J.T., Cox Jr, J.E., 1984b. A model of the determinants of achieved forecast accuracy. Journal of Business Logistics 5, 143–155.
Miller, J.G., Berry, W., Lai, C.-Y.F., 1976. A comparison of alternative forecasting strategies for multi-stage production inventory systems. Decision Sciences 7, 714–724.
Muir, J.W., 1979. The pyramid principle. Proceedings of 22nd Annual Conference American Production and Inventory Control Society, pp. 105–107.
Öller, L.E., 1989. Aggregating problems when forecasting industrial production using business survey data. Discussion Paper No. 20, Ministry of Finance, Economics Department, Helsinki.
Orcutt, G., Watts, H.W., Edwards, J.B., 1968. Data aggregation and information loss. American Economic Review 58, 773–787.
Schwarzkopf, A.B., Tersine, R.J., Morris, J.S., 1988. Top-down versus bottom-up forecasting strategies. International Journal of Production Research 26 (11), 1833–1843.
Small, R.L., 1980. Sales forecasting in Canada: A survey of practices. The Conference Board of Canada, Study no. 66.
Theil, H., 1954. Linear Aggregation of Economic Relations. North-Holland Publishing Company, Amsterdam.
Vollmann, T.E., Berry, W.L., Whybark, D.C., 1992. Manufacturing Planning and Control Systems. Irwin, Homewood, IL.
Weatherford, L.R., Kimes, S.E., Scott, D.A., 2001. Forecasting for hotel revenue management: testing aggregation against disaggregation. Cornell Hotel and Restaurant Administration Quarterly (August), 53–64.
Zellner, A., Tobias, J., 2000. A note on aggregation, disaggregation and forecasting performance. Journal of Forecasting 19, 457–469.