Forecasting Supply Chain Demand by Clustering Customers

Forecasting Supply Chain Demand by Clustering Customers

Proceedigs the 15th IFAC Symposium on Information of Control Problems in Manufacturing Information Control Problems in Manufacturing May 11-13, 2015. ...

625KB Sizes 14 Downloads 152 Views

Proceedigs the 15th IFAC Symposium on Information of Control Problems in Manufacturing Information Control Problems in Manufacturing May 11-13, 2015. Ottawa, Canada Proceedigs of the 15th IFAC Symposium Proceedigs of the 15th IFAC Symposium on on Available online at www.sciencedirect.com May 11-13, 2015. Ottawa, Canada Information Control Problems in Manufacturing Manufacturing Information Control Problems in May 11-13, 11-13, 2015. 2015. Ottawa, Ottawa, Canada Canada May

ScienceDirect

IFAC-PapersOnLine (2015) 1834–1839 Forecasting Supply Chain48-3 Demand by Clustering Customers Forecasting Supply Chain Demand by Clustering Customers Forecasting Supply Chain Demand by Clustering Customers

Paul W. Murray* Bruno Agard** Paul W. Murray* Bruno Agard** Marco A. Barajas*** Paul Murray* Bruno Marco A. Barajas*** Paul W. W. Murray* Bruno Agard** Agard**  Barajas*** Marco A. Marco A.  Barajas*** *École Polytechnique de Montréal, Montréal, QC H3C 3A7 Canada  *École (Tel: Polytechnique de Montréal, QC H3C 3A7 Canada 310-634-8795; e-mail: Montréal, [email protected]) *École Polytechnique de 310-634-8795; e-mail: Montréal, [email protected]) *École Polytechnique de Montréal, Montréal, QC H3C 3A7 Canada ** École(Tel: Polytechnique deMontréal, Montréal, Montréal,QC QCH3C H3C3A7 3A7Canada Canada 310-634-8795; e-mail: [email protected]) ** École(Tel: Polytechnique de Montréal, Montréal, QC H3C 3A7 Canada (Tel: 310-634-8795; e-mail: [email protected]) (e-mail: [email protected]) ** Polytechnique de Montréal, Montréal, H3C (e-mail: [email protected]) ** École École Polytechnique deMemorial Montréal,University Montréal,ofQC QC H3C 3A7 3A7 Canada Canada *** Fisheries & Marine Institute of Newfoundland, St. John’s, NL (e-mail: [email protected]) *** Fisheries & Marine Institute of Memorial University of Newfoundland, St. John’s, NL (e-mail:[email protected]) [email protected]) (e-mail: *** Fisheries Fisheries & & Marine Marine Institute Institute [email protected]) Memorial University University of of Newfoundland, Newfoundland, St. St. John’s, John’s, NL NL (e-mail:of *** Memorial (e-mail: [email protected]) [email protected]) (e-mail: Abstract: Demand forecasts are essential for managing supply chain activities but are difficult to create Abstract: Demand information forecasts areisessential for managing supply activities but aretools difficult to create when collaborative absent. Many traditional and chain advanced forecasting are available, Abstract: Demand forecasts are essential for managing supply chain activities but are difficult tomining create when collaborative information is absent. Many traditional and advanced forecasting tools are available, Abstract: Demand forecasts are essential for managing supply chain activities but are difficult create but applying them to a large number of customers is not manageable. In our research, we use datato when collaborative information is absent. Many traditional and advanced advanced forecasting tools are available, but applying toinformation a segments large number of customers issimilar not manageable. In ourforecasting research, we useare data mining when collaborative is Many traditional and tools available, techniques tothem identify of absent. customers with demand behaviors. Historical usage is used to but applying them to aa segments large number number of customers customers issimilar not segments manageable. In our research, research, we usage use data data mining techniques tothem identify of customers demandare behaviors. Historical is used to but applying to large of is not manageable. In our we use mining cluster customers with similar demands. Once with customer identified, a manageable number of techniques to identify customers similar demand behaviors. Historical usage is to cluster customers with similar Once customer segments are identified, a manageable techniques to identify segments ofrepresent customers with similar demand behaviors. Historical usage number is used used of to forecasting models cansegments be builtdemands. toof thewith customers within the segments. cluster customers with similar demands. Once customer segments are identified, a manageable number of forecasting modelswith can similar be builtdemands. to represent thecustomer customers within the cluster customers Once segments aresegments. identified, a manageable number of Keywords: Data Models, Exogenous variables, Forecasting, Segmentation, Vendor © 2015, IFAC (International Federation of Automatic Control) Hosting Elsevier Ltd.Managed All rightsInventory. reserved. forecasting models can built to the customers within the segments. forecasting models can be beExogenous built to represent represent the Forecasting, customers within the by segments. Keywords: Data Models, variables, Segmentation, Vendor Managed Inventory.  Keywords: Data Data Models, Models, Exogenous Exogenous variables, variables, Forecasting, Forecasting, Segmentation, Vendor Vendor Managed Managed Inventory. Inventory. Keywords: Segmentation,  1. INTRODUCTION exhibiting a variety of usage patterns, particularly when the  1. INTRODUCTION exhibiting a variety of usage(Aburto patterns,and particularly when the usage patterns are non-linear Weber, 2007). Supply chains have existed since the onset of usage 1. INTRODUCTION INTRODUCTION exhibiting a variety of usage patterns, particularly patterns are non-linear (Aburto and Weber, 2007). 1. exhibiting a variety of usage patterns, particularly when when the the Supply chains but have existed since chains the isonset of This industrialization, the study of supply relatively proposes a methodology based on usagepaper patterns are non-linear non-linear (Aburtofor andforecasting Weber, 2007). 2007). usage patterns are (Aburto and Weber, Supply chains have existed since the onset of industrialization, the study supply chains relatively paper proposesofa segments methodology for forecasting on Supply chains but have existed since the isofonset of This new (Janvier-James, 2012). Theofoverall objective a supply demand behaviors of customers. Databased mining industrialization, but the supply chains is paper proposes methodology for on new (Janvier-James, Theof objective a supply demand behaviors ofaa segments of customers. Databased mining industrialization, but 2012). the study study ofoverall supply chains isofrelatively relatively This paper proposes methodology for forecasting forecasting based on chain is to transform and transport materials or products, add This techniques are utilized to interpret historical usage and the new (Janvier-James, 2012). The overall objective of a supply demand behaviors of segments of customers. Data mining chain is to transform and transport materials or products, add techniques are utilized to interpret historical usage and the new (Janvier-James, 2012). The of overall of aprocess supply demand of segments of customers. mining value, and satisfy the demand each objective step of the effects ofbehaviors external factors to develop a predictionData method that chain is isand to transform transform and transport materials orofproducts, products, add are utilized to usage and the value, satisfy theand demand of materials each stepor the process effects of external factors develop historical a prediction method chain to transport add techniques are utilized totointerpret interpret historical usage and that the (Janvier-James, 2012). Understanding downstream demand is techniques can accurate forecast demand. value, and satisfy the demand of each step of the process effects of external factors to develop a prediction method that (Janvier-James, 2012). Understanding downstream demand is can accurate forecast demand. value, and satisfy the demand of each step of the process effects of external factors to develop a prediction method that prerequisite to accomplishing the overall supply chain The accurate remainder of thedemand. paper is organized as follows: the next (Janvier-James, 2012). Understanding downstream demand is can can accurate forecast demand. prerequisite to 2012). accomplishing the downstream overall supply chain (Janvier-James, Understanding demand is forecast objective (Carbonneau et al., 2008). The traditional supply The remainder of the paper is organized as follows: next section is a review of the literature for the topics ofthe CFPR, prerequisite to where accomplishing the overall supply chain objective (Carbonneau et al., 2008). The traditional supply prerequisite to accomplishing the overall supply chain chain behaviors customers placed orders and generally The remainder of the paper is organized as follows: the section is a review of the literature for the topics of CFPR, The the paper is organized as follows:methods. the next next VMI,remainder customerof segmentation, and forecasting objective (Carbonneau et traditional supply chain behaviors where requirements customers placed orders andprevalent; generally objective (Carbonneau et al., al., 2008). 2008). The traditional supply section advised their demand areThe no longer is a review of the literature for the topics of CFPR, VMI, customer segmentation, and forecasting methods. section is a review of the literature for the topics of CFPR, Following the state of the art are sections on methodology, chain behaviors where customers placed orders and generally advised theirresponsibility demand are no longer chain behaviors where requirements customers placed orders andprevalent; generally nowadays, for inventory management and VMI, customer segmentation, and forecasting methods. Following theresults, state ofand the art areand sections methodology, VMI, customer segmentation, forecasting methods. study discussion. Weonclose with a advised their demand are longer nowadays, for inventory and case advised theirresponsibility demand requirements are no no management longer prevalent; forecasting has oftenrequirements shifted upstream to theprevalent; vendor. Following the state of the art are sections on methodology, case study results, and discussion. We close with a Following the state of the art are sections on methodology, conclusion and present suggestions for future research. nowadays, responsibility for inventory management and forecasting has often shifted upstream to the vendor. nowadays, responsibility for inventory management and conclusion Collaborative Forecasting Planning & Replenishment (CFPR) case study studyand results, and discussion. We research. close with with aa present suggestions for future case results, and discussion. We close forecasting often shifted the vendor. Collaborative Forecasting Planningupstream & Replenishment (CFPR) forecasting has often Inventory shifted upstream to two the strategies vendor. conclusion and present and Vendor has Managed (VMI) areto 2. LITERATURE suggestions REVIEW for future future research. research. conclusion and present suggestions for Collaborative Forecasting Planning & Replenishment Replenishment (CFPR) and Vendor Managed Inventory are to twothestrategies 2. LITERATURE REVIEW Collaborative Forecasting Planning & (CFPR) where forecasting responsibility is (VMI) transferred supplier 2.1 Collaborative Planning & Replenishment and Vendor Managed Inventory (VMI) are two strategies 2.Forecasting, LITERATURE REVIEW where forecasting responsibility is transferred to the supplier and Vendor Managed Inventory (VMI) are two strategies 2. LITERATURE REVIEW (Holweg et al., 2005). 2.1 Collaborative Forecasting, Planning & Replenishment where forecasting forecasting responsibility is is transferred transferred to to the the supplier supplier Communication between supply chain members (Holweg et al., 2005). where responsibility is beneficial 2.1 Collaborative Forecasting, Planning & 2.1 Collaborative Forecasting, Planning & Replenishment Replenishment Forecasting is sometimes complicated by the fact that (Holweg et al., 2005). Communication between supply chain members is beneficial (Holweg et al., 2005). for building better forecasts and improving competitiveness Forecasting is sometimes complicated by the fact that downstream supply chain members cannot or will not share for Communication between supply chain members is successful beneficial building better forecasts and improving competitiveness Communication between supply chain members is beneficial (Vachon et al., 2009); collaboration is the key to Forecasting is support sometimes complicated by the factet that downstream supply chain members or(Holweg will not share Forecasting is sometimes complicated by the fact that information to building the cannot forecast al., for building better forecasts and improving competitiveness (Vachon et al., 2009); collaboration is the key to successful for building better forecasts and improving competitiveness CFPR. Downstream information is transmitted from customer downstream supply chain members cannot orchain will not not share information to support building the cannot forecast (Holweg et al., downstream supply chain members will share 2005). Sharing information between supplyor members (Vachon et al., 2009); collaboration is the key to successful CFPR. Downstream information is transmitted from customer (Vachon et al., 2009); collaboration is the key to successful to supplier through a variety of modes including electronic information to support support building the forecast forecast (Holweg et al., al., 2005). Sharing information between supply chain members information to building the (Holweg et can be hindered by a general reluctance or inability of supply CFPR. Downstream information is transmitted from customer to supplier through a variety of modes including electronic CFPR. Downstream information is transmitted from customer data exchange, forecasts, and interaction between personnel 2005). Sharing information between supply chain members can bepartners hindered byshare a general reluctance or inability supply 2005). Sharingtoinformation between supply chain members chain forecast data (Kembro and of Näslund, to supplier through a variety of modes including electronic data exchange, forecasts, and interaction between personnel to supplier through a variety of modes including electronic of the supply chain partners. can be hindered by a general reluctance or inability of supply chain tobyshare forecast data (Kembro and of Näslund, can bepartners hindered a general reluctance or supply 2014). When collaborative information is inability not available, the data forecasts, and of theexchange, supply chain partners. exchange, forecasts, and interaction interaction between between personnel personnel chain to share forecast data and Näslund, 2014). When information is not the data chain partners partners torely shareon forecast data (Kembro (Kembro andqualitative Näslund, supplier mustcollaborative historical data andavailable, Despite the benefits of CFPR, research has shown that in of the supply chain partners. of the supply chain partners. 2014). When collaborative information is not available, the supplier must rely on historical data and qualitative the benefits of CFPR, research shown that in 2014). When collaborative information not available, the Despite knowledge of the marketplace to build theisforecast. practice collaboration frequently does not has prevail (Holweg et supplier on data and knowledge of therely marketplace to build the Despite the benefits of CFPR, research shown that in practice collaboration frequently doesCFPR not has prevail (Holweg et supplier must must rely on historical historical dataforecast. and qualitative qualitative al., Despite the One benefits of CFPR, research has shown that of in 2005). factor inhibiting is the lack Creating withoutto the the benefit of customer al., knowledgeaof of forecast the marketplace marketplace to build build the forecast. practice collaboration frequently does not prevail (Holweg et 2005). One factor inhibiting CFPR is the lack of knowledge the forecast. practice collaboration frequently does not prevail (Holweg to et technology with sufficient sophistication Creating a isforecast without the For benefit of customer information not a new activity. example, seminal information al., 2005). One factor inhibiting CFPR is the lack of information technology with sufficient sophistication to al., 2005). One factor inhibiting CFPR is the lack of enable data sharing. Data sharing is not possible if supply Creating a forecast without the benefit of customer information is not a new activity. For example, seminal Creating in a forecast the benefit of customer research forecastingwithout with time series analysis was information technology with sufficient sophistication to enable partners’ data sharing. Datawith sharing is notare possible if supply information technology sufficient sophistication to computing systems inadequate or information is not new activity. seminal research series analysis was chain information isforecasting not50aa years newwith activity. For example, seminal published innearly ago time (BoxFor andexample, Jenkins, 1968). enable data sharing. Data sharing is not possible if supply chain partners’ computing systems are inadequate or enable data sharing. Data sharing is not possible if supply (Hernández et al., 2014). When electronic data research forecasting with time analysis was published 50 years ago methods (Box series andareJenkins, 1968). research innearly forecasting with time series analysis was incompatible However, in traditional statistical less effective chain partners’ partners’ computing systems arebeinadequate inadequate or incompatible (Hernández et al.,systems 2014). When electronic data chain computing are or sharing is not possible, CFPR can still accomplished published nearly 50 years ago (Box and Jenkins, 1968). However, traditional statistical methods are less effective published nearly 50 years ago (Box and Jenkins, 1968). when there are many downstream supply chain members sharing incompatible (Hernández et al., 2014). When electronic data is not possible, CFPR can still be accomplished incompatible (Hernández et al., 2014). When electronic data However, traditional statistical methods arechain less members effective when theretraditional are manystatistical downstream supplyare However, methods less effective sharing is is not not possible, possible, CFPR CFPR can can still still be be accomplished accomplished when there there are are many many downstream downstream supply supply chain chain members members sharing when

Copyright © 2015 IFAC 1904 2405-8963 © 2015, IFAC (International Federation of Automatic Control) Copyright 2015 IFAC 1904Hosting by Elsevier Ltd. All rights reserved. Peer review© of International Federation of Automatic Copyright ©under 2015 responsibility IFAC 1904Control. Copyright 2015 IFAC 1904 10.1016/j.ifacol.2015.06.353

INCOM 2015 Paul W. Murray et al. / IFAC-PapersOnLine 48-3 (2015) 1834–1839 May 11-13, 2015. Ottawa, Canada



through manual preparation and sharing of information. The labor and efforts, however, limit the scope of this method (Holweg et al., 2005). Finally, a significant enabler of CFPR is trust and personal relationships between personnel within the partner organizations (Wang et al., 2014), the absence of trusting relationships further hinders information sharing. While CFPR is an important and beneficial component of supply chain performance, if it does not exist, other strategies must be employed to build a forecast. These methods are discussed in the following sections. 2.2 Vendor Managed Inventory (VMI) In the case of VMI, responsibility for forecasting lies entirely with the supplier. Ideally, the supplier in a VMI strategy has access to downstream information sources including consumption rates, inventory levels, and forecasts (Achabal et al., 2000). Research has shown that VMI strategies are effective at increasing supply chain efficiency, reducing overall costs in the supply chain, and increasing supply chain competitiveness (Achabal et al., 2000, Jung et al., 2005). Despite a direct link between supply chain performance and information sharing (Forslund and Jonsson, 2007), supply chain partners are sometimes unwilling or unable to share useful forecasting information (Kembro and Näslund, 2014, Holweg et al., 2005). When collaborative information is absent, the supplier must rely on alternate methods to ensure its ability to satisfy demand. 2.3 Customer Segmentation In a VMI arrangement where forecast information does not exist, the supplier must develop a forecast with the information it has available, that information is typically historical data. While statistical methods exist for developing forecasts based on historical data, it is impractical to attempt to build individual forecasts for a large number of unique customer demands. Therefore, grouping customers into logical segments is necessary so that a small number of forecast models can be utilized to represent the total customer population. There are several methods for segmenting a population of customers, and it is important to carefully select a suitable clustering algorithm to obtain accurate results (Kashwan and Velu, 2013). Rudimentary segmentation based on geographic location or customers’ industry is tempting since it is relatively easy. However, the demand characteristics within these segments may be very different and therefore rudimentary segmentation is not a suitable method (Shapiro, 2007). There are several more advanced segmentation methods which are categorized as partitioning, hierarchical, density-based, grid-based and model-based (Han and Kamber, 2006) – of these, the most common are hierarchical and partitioning (Le et al., 2009, Chakraborty, 2013). Partitional clustering is more suitable for handling large data sets due to lower computational costs. The most common partitioning method, k-means, appears frequently in literature. It is not a new technique, appearing in Fisher (1958) and further developed by MacQueen (1967). With kmeans, a set of N data point are grouped into K clusters with the mean of each cluster becoming its identifying location.

1835

Despite its frequent use, k-means has failings due to it ignores natural cluster pattern and rigidly assigns points to individual groups (MacKay, 2003). An alternative clustering strategy to k-means is Artificial Neural Networks (ANN) and the related Kohonen SelfOrganizing Maps (SOM) (Altintas and Trick, 2014). SOM was formally developed by Kohonen as un unsupervised alternative to traditional ANNs (Kohonen, 1990). Unlike kmeans where the number of clusters must be pre-defined, ANNs are unsupervised; the number of clusters is part of the ANN’s output. Several variants of both k-means and ANNs have been developed to overcome some of their failings. These include the addition of fuzzy logic to avoid the rigid assignment of points to clusters (MacKay, 2003). Fuzzy logic allows the identification of groups with similar attributes (Barajas and Agard, 2014) and when combined with partitional clustering it allows flexibility of assigning points proportionally to more than one cluster. The variants are titled with various names that describe their content, such as soft k-means, fuzzy kmeans, and fuzzy ANNs. 2.4 Forecasting Methods Forecasting tools based on elementary and advanced statistical methods have proven useful to predict future demand when the situation is correct for their application (Kuo, 2001). More advanced statistical methods, such as genetic algorithm initiated fuzzy networks (GFNN) have shown the potential to improve on the results provided by traditional forecasting methods (Kuo, 2001). These methods were popular due to easy implementation, and in cases where external factors were not influential, they provided usable information. However, research has found that simple statistical methods tend to amplify the Bullwhip effect (Carbonneau et al., 2008). More advanced time series methods such as auto-regressive integrated moving average (ARIMA), and Winter’s method were developed (Chang et al., 2011). ARIMA is widely used for forecasting, but has less value for non-linear patterns (Pai and Lin, 2005). Researchers have applied more sophisticated methods to traditional statistical tools. For example, Ferbar et al.(2009) attempted to obtain better results from exponential smoothing by incorporating wavelet denoising into it. Results attained showed improvement over exponential smoothing, but did not compare with other forecasting tools. Historical sales levels might give an indication of demand, but they do not account for exogenous variables. Data mining with decision making algorithms is one approach to understanding the historical data, incorporating the exogenous factors, and eventually producing useful results (Kusiak, 2007). Artificial neural networks (ANNs) were identified as potentially suitable for supply chain forecasting in the 1990’s (Leung, 1995) and have since been found to be very effective for developing forecasts. The ANNs architecture known as multi-layer perceptron (MLP) with back propagation is commonly used (Beccali et al., 2004). However, Chang et al. (2011) point out some shortcomings with back propagating

1905

INCOM 2015 Paul W. Murray et al. / IFAC-PapersOnLine 48-3 (2015) 1834–1839 May 11-13, 2015. Ottawa, Canada

1836

models. They suggest that a modified ANN can provide a better result. The modification proposed by Chang et al. (2011) involves incorporating genetic algorithms (GAs) into the ANN as an improvement to define the ANN prior to training. GAs, first introduced by John Holland in 1975, are sometimes used to configure the topography of ANNs (Kaylani et al., 2010). The problem of establishing the desired level of complexity in an ANN model is discussed by Efendigil et al (2009). In research conducted, they conclude that model complexity is determined by the number of hidden layers which could be heuristically determined. Furthermore, they employ fuzzy logic in an ANN to overcome the limitations of using conventional mathematical tools. The adaptive networkbased fuzzy inference system (ANFIS) was employed as well. The benefit to ANFIS is that it allows human knowledge to define the input-output mapping (Jang, 1993). The inclusion of a priori knowledge allows the user to influence the direction of the ANN model rather than letting the model operate unsupervised. Another modification to forecasting with ANNs that appears frequently in the literature involves incorporating a fuzzy component (Chang et al., 2011, Efendigil et al., 2009, Kuo, 2001, Neto et al., 2011) and others. Fuzzy logic is used to improve decision making processes by avoiding the constraint of yes/no binary decisions (Barajas and Agard, 2014). In the case of ANNs, fuzzy logic allows observations to hold proportional membership in multiple nodes. While hybrid models have received much attention in the literature, Pai and Lin (2005) used a different approach. Rather than combining ANNs and fuzzy logic, they combined support vector machine (SVM) with ARIMA. In their research, Pai and Lin found that the hybrid model performed better than either SVM or ARIMA. However, their model was not compared to other forecasting tools. SVM is a member of the machine learning family and as with ANNs, it is generally associated with classification problems. Developments in the field of SVM have led to strategies such as soft margins (Campbell and Ying, 2011). Soft margins allow flexibility in the model, similar to applying fuzzy logic, and allow the data to be fit less rigidly. 3 METHODOLOGY The literature review shows that developing forecasts without collaborative information is possible, and that hybrid methods can provide improved accuracy over traditional methods. However, it does not answer our problem that the data used for forecasting, regardless of the method used, needs to be sufficiently complete and accurate prior to building a forecast model. Here we develop a methodology that addresses data accuracy and takes advantage of the advances made in customer segmentation and forecasting. Out methodology includes three stages: data preprocessing, segmentation, and forecasting. Fig. 1 illustrates the sequence of the stages and the linkage between activities.

Fig. 1: Methodology 3.1 Preprocessing Prior to attempting to analyze the case study data, the important stage of evaluating and resolving data quality had to be addressed (Kantardzic, 2011). In our case study, the supplier was able to provide four years of delivery activity including delivery dates, quantities, locations, and industry SIC code. Although the data was extracted from actual delivery activities, some information created outliers and distorted true behavior. Administrative entries created false spikes in quantities, and new or lost customers created false periods of zero-usage. When preprocessing data, one must be careful to clean the data enough to allow discovery of useful patters of information within the data while not removing valuable information. To facilitate this, customers were evaluated on a yearly basis. For example, the yearly evaluation allowed the data from a new or lost customer to be retained for the years where it was valid. Outliers in the data were identified through several simple statistical methods. Customers whose total consumption were nearly zero or negative were removed. Customers with spikes in quantities were identified and removed by evaluating the median and standard deviations in the quantities. 3.2 Segmentation Customers were segmented using the computer program R (R Core Team, 2014) using k-means. The segmentation was based on monthly volume of product delivered and calculated with Euclidean distance measurement. Determining the optimum number of clusters to use in K-means is arbitrary (Kantardzic, 2011), however some guidance it given by applying Hartigan’s Number which provides a recommended number of clusters based on the data presented to its algorithm (Madigan, n.d.). Results were evaluated followed by an iterative preprocessing and re-clustering.

1906

INCOM 2015 Paul W. Murray et al. / IFAC-PapersOnLine 48-3 (2015) 1834–1839 May 11-13, 2015. Ottawa, Canada



3.3 Forecasting Once the customers were segmented, forecasting models were applied to assess the viability of the segments. Creation of customer segments based on consumption behavior allowed for simple forecasting methods tailored for each segment. For example, segments with seasonal effect are treated differently from those with increasing or decreasing trends. 4 CASE STUDY 4.1 Context The context of the cast study for our research is a supplier of bulk materials which are used for a variety of processes in manufacturing, food packaging, and medical services (we use “BM” and “BM supplier” in this paper to preserve the industry partner’s anonymity). Points of use inventories of BM are managed by the BM supplier (vendor managed inventory); demand is driven by usage in the customers’ various operations. Those demands, in turn, are influenced by internal and exogenous variables. The resulting individual customer demand profiles vary and exhibit trends, seasonality, and stochastic properties. To forecast inventory requirements (the dependent variable), various factors (the independent variables) such as industry type, location, seasonality of demand, and other factors are considered. BM usage in some industries (e.g. fishing) has seasonal patterns. Other industries (e.g. aerospace manufacturing) have requirements that vary according to sales levels. The BM product requires specialized transportation and storage facilities which are expensive and not quickly available. Therefore, adjusting point of use storage capacity is avoided. In our case study, the supplier is responsible for maintaining an uninterrupted supply at their customers’ point of use. Short-term demand forecasts are generated based on historical usage; current demand and shipments are triggered by telemetric data obtained from level sensors in the point of use storage tanks. A commitment to ensuring an uninterrupted inventory at customers’ sites and lack of medium and long-range forecast data forces the supplier to be conservative with future capacity decisions.

1837

should be shifted to reflect similar behaviors that are offset in time; a process known as dynamic time warping (Berndt and Clifford, 1994). Three types of errors were apparent in the data and corrected with preprocessing. Firstly, the data is a record of deliveries from the supplier, not of consumption by the customer. Point of use storage allows the supplier to deliver based on its own schedule and not necessarily on the customers’ consumption needs. Using the data in this form would impose a Bullwhip effect. To overcome the Bullwhip effect possibility, the values were smoothed with moving weighted averaging. Moving average was chosen because it is a simple method and has shown to be effective in reducing Bullwhip effect (Chen et al., 2000). Additionally, simple data processing methods are often proven to be as effective as more complex ones for data mining (Witten et al., 2011). Second, administrative adjustments (debit and credit quantities) showed deliveries that were negative or greater than the equipment capability. These were removed by comparing standard deviation against median quantities; high variance vectors were removed. Lastly, new or lost customers would exhibit false demand behavior if all four years of records were included since they have a period of zero demand. This was resolved by splitting the data into customer-year vectors. That is, every customer vector becomes four individual vectors with one year of data in each. Customer-year vectors with zero demand were then separated and the customers’ true demand behavior remained. 4.3 Segmentation The initial clustering results were not satisfactory due to outliers in the data. As illustrated in Fig. 2, the majority of the customers fell into a single segment (Cluster #8). Investigation into the members of both the very large and very small clusters revealed outliers that had escaped detection in the initial data preprocessing. Adjustments to the filtering criteria in the preprocessing eventually produced better results.

Data for delivery instances of BM is recorded, including customer, date, location, and quantity. Unfortunately, the number of records, over 200,000 per year, has resulted in a surfeit of data to interpret with traditional forecasting methods. Additionally, although there are similar strategies in place for products such as electrical distribution, there is no current method to assess and predict the impacts of external factors such as location, season, industry of user, and climate on bulk product demand. 4.2 Data Preprocessing

Fig. 2: Initial Segmentation Results

Our data consists of four years of history with nearly one million observations of delivery events, including customer, distribution center, quantity, and product type. We need to vector the data into monthly deliveries per customer per product, resulting in approximately 3500 time series vectors. Time-series data must be carefully prepared prior to clustering (Liao, 2005) with attention given to whether it

After removal of outliers, the data was normalized and then re-segmented with k-means. The new segments, as illustrated in Fig. 3 are more evenly distributed. While a dominant segment remains (Cluster #3), there are sufficiently distinguishable segments for forecast testing and evaluation within those segments.

1907

INCOM 2015 Paul W. Murray et al. / IFAC-PapersOnLine 48-3 (2015) 1834–1839 May 11-13, 2015. Ottawa, Canada

1838

6 CONCLUSION AND FUTURE RESEARCH This paper has presented a method for transposing a high number of individual customers into a small number of clusters of customers with similar demand behavior. Forecasting demand for the clusters is a manageable task. Many forecasting strategies use statistical tools to predict future demand of specific customers or products. In the context of our case study, it is impractical to attempt to build customer-specific forecasts due to the large number of customers. We have overcome this by segmenting customers based on behavior. We are then able to build a manageable number of forecast models and apply them within each customer segment.

Fig. 3: Revised Segmentation 4.4 Forecasting Reviewing a sample from one of the resulting clusters reveals a definite pattern. In Fig. 4, you can see that the majority of the customers in this cluster exhibit a seasonal pattern. Fig. 4 also reveals that some customers’ data is significantly different from the main population of the group, but it was not detected as an outlier during the preprocessing stage.

Understand the cause for an overall change in demand allows the BM supplier to decide whether a change in capacity is necessary due to trends, or whether the change is due to seasonal effect and no capacity changes are necessary. In the case study, customers with demand behavior that does not follow a pattern were treated as outliers and removed. Further research is necessary to develop a method to classify these customers and reincorporate them into the model. REFERENCES

Fig. 4: Demand behavior within a cluster 5 DISCUSSION The literature reviewed in section 2.3 above reveals a significant amount of research regarding methods of segmenting data into interesting clusters. And while data preprocessing is generally acknowledged as an important step (Kantardzic, 2011, Miller, 2009, Witten et al., 2011), it is not discussed in detail. During this study, several different clustering techniques, such as K-means, self-organizing maps (SOMs), and fuzzy clustering were explored. With each iteration of preprocessing and clustering it was revealed that the presence of outliers in the data had much greater influence on the results than the algorithm used to create the clusters. Eventually, the data analyst must decide when to stop removing outliers. In Fig 4, some customer data remains that could be categorized as outliers. However, the number of outliers remaining is significantly small enough that the overall behavior of the cluster is not adversely affected and therefore removal is not necessary. Once clusters of customers have been established they can be summarized into a single behavior pattern.

Aburto, L. & Weber, R. 2007. Improved supply chain management based on hybrid demand forecasts. Applied Soft Computing, 7, 136-144. Achabal, D. D., Mcintyre, S. H., Smith, S. A. & Kalyanam, K. 2000. A decision support system for vendor managed inventory. Journal of retailing, 76, 430-454. Altintas, N. & Trick, M. 2014. A data mining approach to forecast behavior. Annals of Operations Research, 216, 3-22. Barajas, M. & Agard, B. 2014. A methodology to form families of products by applying fuzzy logic. International Journal on Interactive Design and Manufacturing (IJIDeM), 1-15. Beccali, M., Cellura, M., Lo Brano, V. & Marvuglia, A. 2004. Forecasting daily urban electric load profiles using artificial neural networks. Energy Conversion and Management, 45, 2879-2900. Berndt, D. J. & Clifford, J. Using dynamic time warping to find patterns in time series. KDD workshop, 1994. Seattle, WA, 359-370. Box, G. E. P. & Jenkins, G. M. 1968. Some recent advances in forecasting and control. Journal of the Royal Statistical Society: Series C (Applied Statistics), 17, 91. Campbell, C. & Ying, Y. 2011. Learning with support vector machines. Synthesis lectures on artificial intelligence and machine learning, viii+82. Carbonneau, R., Laframboise, K. & Vahidov, R. 2008. Application of machine learning techniques for supply chain demand forecasting. European Journal of Operational Research, 184, 1140-1154. Chakraborty, G. 2013. Customer segmentation using sas enterprise miner, Cary, NC, SAS Institute Inc. Chang, P.-C., Fan, C.-Y. & Lin, J.-J. 2011. Monthly electricity demand forecasting based on a weighted evolving fuzzy neural network approach. International

1908

INCOM 2015 Paul W. Murray et al. / IFAC-PapersOnLine 48-3 (2015) 1834–1839 May 11-13, 2015. Ottawa, Canada



Journal of Electrical Power & Energy Systems, 33, 1727. Chen, F., Ryan, J. K. & Simchi-Levi, D. 2000. The impact of exponential smoothing forecasts on the bullwhip effect. Naval Research Logistics, 47, 269-286. Efendigil, T., Önüt, S. & Kahraman, C. 2009. A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: A comparative analysis. Expert Systems with Applications, 36, 66976707. Ferbar, L., Čreslovnik, D., Mojškerc, B. & Rajgelj, M. 2009. Demand forecasting methods in a supply chain: Smoothing and denoising. International Journal of Production Economics, 118, 49-54. Fisher, W. 1958. On grouping for maximum homogeneity. American Statistical Association Journal. Forslund, H. & Jonsson, P. 2007. The impact of forecast information quality on supply chain performance. International Journal of Operations & Production Management, 27, 90-107. Han, J. & Kamber, M. 2006. Data mining, southeast asia edition: Concepts and techniques, Morgan kaufmann. Hernández, J. E., Mula, J., Poler, R. & Lyons, A. C. 2014. Collaborative planning in multi-tier supply chains supported by a negotiation-based mechanism and multiagent system. Group Decision and Negotiation, 23, 235269. Holweg, M., Disney, S., Holmström, J. & Småros, J. 2005. Supply chain collaboration:: Making sense of the strategy continuum. European Management Journal, 23, 170-181. Jang, J.-S. R. 1993. Anfis: Adaptive-network-based fuzzy inference system. Systems, Man and Cybernetics, IEEE Transactions, 23, 665-685. Janvier-James, A. M. 2012. A new introduction to supply chains and supply chain management: Definitions and theories perspective. International Business Research, 5, 194-207. Jung, S., Chang, T., Sim, E. & Park, J. 2005. Vendor managed inventory and its effect in the supply chain. In: BAIK, D.-K. (ed.) Systems modeling and simulation: Theory and applications. Springer Berlin Heidelberg. Kantardzic, M. 2011. Data mining : Concepts, models, methods, and algorithms, Hoboken, NJ, Wiley-IEEE Press. Kashwan, K. R. & Velu, C. M. 2013. Customer segmentation using clustering and data mining techniques. International Journal of Computer Theory and Engineering, 5, 856-61. Kaylani, A., Georgiopoulos, M., Mollaghasemi, M., Anagnostopoulos, G. C., Sentelle, C. & Mingyu, Z. 2010. An adaptive multiobjective approach to evolving art architectures. Neural Networks, IEEE Transactions on, 21, 529-550. Kembro, J. & Näslund, D. 2014. Information sharing in supply chains, myth or reality? A critical analysis of empirical literature. International Journal of Physical Distribution & Logistics Management, 44, 179-200.

1839

Kohonen, T. 1990. The self-organizing map. Proceedings of the IEEE, 78, 1464-1480. Kuo, R. 2001. A sales forecasting system based on fuzzy neural network with initial weights generated by genetic algorithm. European Journal of Operational Research, 129, 496-517. Kusiak, A. 2007. Data mining in industrial applications and innovation. ICS News, 17-21. Le, T., Agard, B. & Deveault, S. 2009. Application du data mining à la segmentation du marché des meubles aux états-unis. Eighth Congres International de Genie Industriel. Leung, H. C. Neural networks in supply chain management. Engineering Management Conference, 1995. Global Engineering Management: Emerging Trends in the Asia Pacific., Proceedings of 1995 IEEE Annual International, 1995. IEEE, 347-352. Liao, W. T. 2005. Clustering of time series data—a survey. Pattern Recognition, 38, 1857-1874. Mackay, D. 2003. An example inference task: Clustering. Information Theory, Inference and Learning Algorithms, 284-292. Macqueen, J. Some methods for classification and analysis of multivariate observations. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1967. California, USA, 281-297. Madigan, D. n.d. Descriptive modeling. New York, NY: Columbia University. Miller, H. J. H. J. 2009. Geographic data mining and knowledge discovery, Boca Raton, Fla., CRC. Neto, J. C. D. L., Da Costa Junior, C. T., Bitar, S. D. B. & Junior, W. B. 2011. Forecasting of energy and diesel consumption and the cost of energy production in isolated electrical systems in the amazon using a fuzzification process in time series models. Energy Policy, 39, 4947-4955. Pai, P.-F. & Lin, C.-S. 2005. A hybrid arima and support vector machines model in stock price forecasting. Omega, 33, 497-505. R Core Team 2014. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Shapiro, J. F. 2007. Modeling the supply chain, Belmont, CA, Thompson Brooks/Cole. Vachon, S., Halley, A. & Beaulieu, M. 2009. Aligning competitive priorities in the supply chain: The role of interactions with suppliers. International Journal of Operations & Production Management, 29, 322-340. Wang, Z., Ye, F. & Tan, K. H. 2014. Effects of managerial ties and trust on supply chain information sharing and supplier opportunism. International Journal of Production Research, 52, 7046. Witten, I., Farnk, E. & Hall, M. 2011. Data mining, Burlington, MA, Morgan Kaufmann / Elsevier Inc.

1909