European Journal of Operational Research 213 (2011) 340–348
Interfaces with Other Disciplines
Balancing the fit and logistics costs of market segmentations
Marcel Turkensteen a,⇑, Gerard Sierksma b, Jaap E. Wieringa b
a CORAL, Department of Business Studies, Aarhus School of Business and Social Sciences, Aarhus University, Fuglesangs Alle 4, 8210 Aarhus V, Denmark
b Faculty of Economics and Business, University of Groningen, P.O. Box 800, 9700 AV Groningen, The Netherlands
Article info
Article history: Received 4 May 2009; Accepted 28 February 2011; Available online 4 March 2011
Keywords: Marketing; Segmentation; Logistics costs; Simulated annealing

Abstract
Segments are typically formed to serve distinct groups of consumers with differentiated marketing mixes that better fit their specific needs and wants. However, buyers in a segment are not necessarily geographically closely located. Serving a geographically dispersed segment with one marketing mix can increase the logistics costs in the form of high transportation costs and long lead times. This study proposes a segmentation method that balances the fit of a segmentation strategy against the corresponding logistics costs. An application to the problem of segmenting a set of European regions, using consumers’ store attribute preferences as a segmentation basis, suggests segment-specific retail positioning strategies that reflect different decisions about store image attributes such as price, assortment, and atmosphere. This approach designates transnational segments that require acceptable logistics costs and offer the highest possible level of within-segment homogeneity.
© 2011 Elsevier B.V. All rights reserved.
⇑ Corresponding author. Tel.: +45 89486483.
URLs: http://www.asb.dk/staff/bs/matu.aspx (M. Turkensteen), http://www.rug.nl/staff/g.sierksma (G. Sierksma), http://www.rug.nl/staff/j.e.wieringa (J.E. Wieringa).
0377-2217/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.ejor.2011.02.027

1. Introduction

Through market segmentation, companies aim to address the needs of a large, heterogeneous market by dividing it into smaller, more homogeneous segments. If buyers within segments are similar but very different from buyers in other segments, the segmentation achieves good fit. Such a segmentation also satisfies the criterion of responsiveness, meaning that consumers in one segment respond uniquely to marketing efforts targeted at them (Wedel and Kamakura, 2000). Most segmentation studies focus solely on the responsiveness, but in transnational studies in particular, it may not be possible to serve well-fitting but dispersed segments.

Several studies confirm the usefulness of transnational segments (e.g., Ter Hofstede et al., 1999), but many companies continue to target nations or groups of nations as separate segments (Salah and Pervel Kathanis, 1994). Companies might ignore transnational segmentation because the resulting segments fail to achieve the requirement of actionability, such that effective marketing mixes can be designed to attract and serve the defined segments (Steenkamp and Ter Hofstede, 2002). If the segments are geographically dispersed, it often becomes difficult to establish a cost effective distribution system. In such cases, it is advisable to investigate whether other segmentation schemes have slightly worse fit but offer the benefit of much lower logistics costs.

Few studies take logistics costs explicitly into account. For example, Ter Hofstede et al. (2002) state that a segmentation of European regions has moderate logistics costs if regions within the same segment are connected. The so-called Normclus cluster methods, as presented by DeSarbo and Grisaffe (1998), directly restrict the maximum distance between pairs of subjects in the same segment to restrict logistics costs. Although these studies appear to present reasonable restrictions on logistics costs, we believe that they ignore important aspects. In particular, such methods fail to clarify the logistics costs of different segmentation methods and whether those costs relate to measures such as the maximum distance.

In response, we consider a distribution system in which a central warehouse serves all subjects in a segment using direct shipments (see also, Federgruen et al., 1986). Such distribution systems generally use location allocation models (e.g., Klose and Drexl, 2005; Melo et al., 2009). If the shipments are sufficiently large, this model is preferable to an alternative model in which multiple subjects can be served during each tour (Laporte, 1992).

We propose instead an approach that trades off between the fit and the logistics costs of a segmentation method. With a simulated annealing heuristic to find the segmentations, our proposed approach selects a small set of candidate segments, which allows decision makers to make carefully considered, well-supported segmentation choices. Therefore, in Section 3, we discuss the fit, and in Section 4, we outline the logistics costs of segmentation methods. In Section 5, we present an approach for balancing the maximization of the fit
against the minimization of the logistics costs. Finally, we apply the proposed approach to a case study of European retail outlets.

2. Definitions

Two conceptually different types of methods can construct segments: Mixture models and hard clustering methods (Wedel and Kamakura, 2000). Mixture models use separate statistical distributions for each segment to model the preferences of individual consumers (Ter Hofstede et al., 1999); however, they also may return poor locally optimal solutions (Wedel and Kamakura, 2000, p. 88). Hard clustering methods, which assign every consumer to only one cluster, do not rely on statistical distributions and are less sensitive to the pitfall of mixture models. We therefore focus on hard clustering methods.

The problem of finding a segmentation is a typical example of a hard clustering problem. We use the following notation to describe the elements of this problem. We assume a set of subjects (which can be consumers but in our application refer to regions) X = {1, …, N}. For the sake of convenience, we use N instead of |X| to denote the number of subjects (where |Y| denotes the cardinality of a set Y). A segmentation S is a finite family of non-empty subsets {A_1, …, A_K} of X. The subsets A_1, …, A_K are called clusters or segments, and the parameter K specifies how many clusters need to be determined. A segmentation S = {A_1, …, A_K} is feasible if it satisfies the following properties:

1. A_k ⊆ X, ∀k = 1, …, K (and A_k ⊂ X for K ≥ 2);
2. ∪_{k=1,…,K} A_k = X;
3. A_k ∩ A_l = ∅ for each k, l = 1, …, K, k ≠ l; and
4. |A_k| ≥ 1, ∀k = 1, …, K.
According to Property 2, every subject in X belongs to a cluster; Property 3 states that no subject belongs to more than one segment, which is in line with hard clustering methods. Property 4 forbids the use of empty clusters.
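The four feasibility properties can be checked mechanically. The following is a minimal sketch, assuming that a segmentation is represented as a list of Python sets of subject labels 1, …, N; the function name `is_feasible` and this representation are illustrative choices, not part of the original study.

```python
def is_feasible(segmentation, n_subjects, k):
    """Check Properties 1-4 for a list of clusters (sets of subject ids 1..N)."""
    universe = set(range(1, n_subjects + 1))
    if len(segmentation) != k:
        return False
    if any(len(a) == 0 for a in segmentation):                # Property 4
        return False
    if any(not a <= universe for a in segmentation):          # Property 1
        return False
    if k >= 2 and any(a == universe for a in segmentation):   # strict subset for K >= 2
        return False
    if set().union(*segmentation) != universe:                # Property 2
        return False
    # Property 3: given that the union is X, the clusters are pairwise
    # disjoint exactly when their sizes add up to N.
    return sum(len(a) for a in segmentation) == n_subjects


print(is_feasible([{1, 2}, {3}], 3, 2))      # True
print(is_feasible([{1, 2}, {2, 3}], 3, 2))   # False: subject 2 is in two clusters
```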
3. Measuring segmentation fit

We would prefer to contrast the additional revenues earned from targeting customers with a differentiated marketing strategy against the logistics costs associated with such a strategy. Although greater fit has a positive effect on sales (Kumar and Petersen, 2005), the relationship between fit and revenues or sales has not been quantified for segmentations based on perceptions and preferences. Therefore, we develop a different trade-off between fit and the logistics costs that focuses on segmentation fit. We assume that, for a fixed logistics cost, the best fitting segmentation is also the most profitable one.

We consider two measures of fit: The commonly used minimum sum-of-squares criterion (MSSC) and the silhouette width (SW). The MSSC score measures, for each segment, the sum of the squared distances to a (fictitious) average subject in that segment. By Huygens’ theorem, this value is equal to the sum of squared distances between each pair of subjects in the segment, divided by the total number of subjects in the segment; see e.g. Hansen et al. (1998). For the distance between two subjects, we use the distance between their attribute scores, not their geographical distance. Computing the MSSC takes little time, and well-established methods exist for optimizing it. However, because the MSSC minimizes within-cluster dissimilarity, the score is lowest when every subject forms a separate cluster. The MSSC score also is decreasing with K, the number of clusters, such that the larger the value of K, the smaller and better the value of the MSSC. However, a segmentation with many clusters can be costly, because it requires a separate marketing mix for each segment.

To overcome this disadvantage, we also consider SW from Rousseeuw and November (1987), which takes the similarity between objects in the same segment into account and also incorporates dissimilarities between clusters. For any segmentation S, the SW of subject i in segment A_k, denoted by SW(S, i), reflects the average ‘distance’ to the other subjects in A_k and the average distance of i to subjects in the other segments (Rousseeuw and November, 1987). For subject i ∈ A_k ∈ S, let a(i) be the average distance of i to all other subjects in A_k, d(i, B) be the average distance of i to all subjects in cluster B ∈ S, and b(i) = min_{B ≠ A_k} d(i, B). The SW of S is denoted and defined by:

\[ SW(S) = \frac{1}{N} \sum_{i \in X} \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}. \]

Note that −1 ≤ SW(S) ≤ 1. A value of SW(S) = 0 indicates a random distribution of objects among clusters, whereas SW(S) = 1 corresponds to a situation in which subjects within the same cluster are identical, but subjects in different clusters are different. Because SW(S) also takes between-cluster dissimilarity into account, its value depends less strongly on the number of clusters. Thus, if we have segmentations with different values of K, the one with the highest SW value is likely to have best fit (Rousseeuw and November, 1987).

We exploit the benefits of both MSSC and SW. For different values of K, we optimize the MSSC score to determine an optimal segmentation for each value. We subsequently use the SW to only determine the number of clusters K, because it takes O(n²) time to compute the SW of each segmentation, which is quite time consuming. Moreover, there exist effective cluster procedures that optimize MSSC, but there are, to the best of our knowledge, no algorithms that effectively find cluster solutions that maximize SW.

4. Segmentation logistics costs
Although segmentation and logistics modeling often are treated as separate fields, Steenkamp and Ter Hofstede (2002) recognize that large geographical distances within segments are the main cause of high logistics costs of a segmentation scheme. Long distances are particularly undesirable for perishable goods, but they invariably cause long lead times, more inventory, and more variability in orders (Nelson and Toledano, 1979). Davis (1990) finds that on average, the costs of the physical distribution constitute 22% of the total costs of a product. Furthermore, transportation costs account for 62% of the logistics costs in the U.S. grocery sector, and have been rising steadily (GMA, 2005). In European retailing, supply chains traditionally have relied on retailer-controlled regional distribution centers (RDCs) though trends appear to be moving toward European distribution centers (EDCs) and Pan-European supply networks (Euro-CASE, 2001). However, the costs associated with the long distances from a central warehouse to the regions may be much greater than the benefits attained from centralizing and merging the facilities of different segments. We investigate an organization that currently serves countries or large regions as segments using a separate distribution system for each country. With the assumption that the organization wants to retain all its customers across all regions, we construct logistics costs models for segments spread out across multiple regions or nations. The question is whether there are attractive intermediate segmentations that encompass a greater geographic spread than the countries-as-segments segmentations, but less than the
unconstrained segmentation that is optimal in terms of expected consumer benefits. For this analysis, we assume a separate facility is established for each segment, and the distribution systems of segments are entirely separate. Daganzo (2004, Eq. 4.28) provides the mathematical conditions under which separate distribution systems are preferred, which generally occurs when the items for different segments have widely variant characteristics, such as demand rates.

The logistics costs can be determined by the transportation distances from the central location. For any segmentation S, we use a distance to central facility (DCF) measure of a segment A in S to calculate the logistics costs. The DCF of segment A, denoted by DCF(A), is the sum of the distances from the locations of all subjects in segment A to the central facility in A. The logistics costs increase more or less proportionally with the distances from the demand points to their facilities (Daganzo, 2004). Because all regions within one segment are continuously replenished from the central facility, the best location of the facility is the center of gravity of the regions in the segment (Francis and White, 1974). The center of gravity of multiple points in two-dimensional space is the location at which the sum of the weighted distances between these points and the center of gravity is minimized, and the weight of a point depends on its demand. We approximate the center of gravity by computing a weighted average over all consumer locations in each segment. For example, assume there are three regions from segment A located in two-dimensional space on the coordinates (0, 0), (2, 0), and (1, 3), with demand (weights) 2, 5, and 8, respectively. The (approximate) center of gravity is at [2(0, 0) + 5(2, 0) + 8(1, 3)]/(2 + 5 + 8) = (1.2, 1.6). The DCF measure of A is the sum of the distances between the center of gravity and the three demand points.
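The three-region example can be reproduced in a few lines. This is a minimal sketch that assumes Euclidean distances in the plane and uses the demand-weighted average as the approximate center of gravity; the function names are illustrative.

```python
import math


def center_of_gravity(points, weights):
    """Demand-weighted average location (the approximation used in the text)."""
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return (x, y)


def dcf(points, weights):
    """DCF(A): sum of the distances from each region to the central facility."""
    cx, cy = center_of_gravity(points, weights)
    return sum(math.hypot(px - cx, py - cy) for px, py in points)


points = [(0, 0), (2, 0), (1, 3)]
weights = [2, 5, 8]
print(center_of_gravity(points, weights))   # approximately (1.2, 1.6)
print(round(dcf(points, weights), 4))       # 5.2031
```

For this example the three distances are 2, √3.2 and √2, so DCF(A) is approximately 5.20.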
The DCF logistics costs measure of segmentation S in turn is the sum of the DCF values of all segments in S: DCF(S) = Σ_{k=1,…,K} DCF(A_k). With LC(S), we denote the logistics costs of a segmentation scheme S, computed according to the DCF measure.

Instead of centers of gravity, existing facilities can be used, which involves a trade-off between opening a new facility closer to consumers or maintaining an existing facility. A similar trade-off arises when considering merging central facilities of multiple segments into one facility. Transshipments or local warehouses can be taken into account easily in the DCF measure. The central facility should be located at the center of gravity of the transshipment warehouses, weighted according to the demand they serve.

A key assumption in the DCF model is that there is one central facility for each segment. In practice, it may be more cost effective to open more than one distribution center (DC) per segment. Alternatively, when the DCs of two segments are near each other, a single DC may serve both segments. The DCF measure does not include these possibilities. Therefore, we subsequently (Section 6.3) relax this assumption to illustrate how our proposed approach can be applied to accommodate cases in which (1) one facility serves more than one segment or (2) a segment has multiple facilities.

A decentralized distribution system model would contain multiple local DCs that serve regions in a certain geographic area rather than in a single segment. The consumers in the geographic area, say a country, then could belong to different segments. The logistics costs for the facility likely increase with the number of segments it serves (e.g., Thonemann and Bradley, 2002). Alternatively, a multi-echelon distribution system with local warehouses might serve multiple segments. The local warehouses in turn are served from one or more central EDC-like facilities. We leave these issues for further research that extends our proposed model.
5. Budget constraint approach

Ideally, a segmentation strategy provides the highest possible profit, where profit is defined as the expected revenues minus costs. However, direct profit maximization is usually not possible if the segmentation is based on consumer perceptions or preferences, as we noted in Section 3. For the trade-off between lower logistics costs and better fit with consumer preferences, we introduce the Budget Constraint (BC) approach. Our method consists of two main steps. First, we restrict the logistics costs, as measured by the DCF measure and, within the given budget B, try to find the best segmentation. We gradually increase the value of B so that we solve a sequence of clustering problems. Second, we introduce the trade-off between logistics costs and fit to make a selection of the resulting set of candidate segmentations. We discuss the steps of our proposed approach in this section.

5.1. The MSSC clustering problem

The clustering problem (CP) for X is to determine segmentation S = {A_1, …, A_K} of X; the MSSC clustering problem has the MSSC fit value as its objective. Consider the segmentation S = {A_1, …, A_K}. Define x_ik = 1 if region i belongs to cluster A_k in S, and x_ik = 0 otherwise. Let f_i be the vector of attribute scores of i. The cluster center z_k of cluster A_k is a vector of attribute scores, such that z_k = Σ_{i∈A_k} f_i / |A_k|. We denote the geographical distance between two subjects i and j by d_ij, whereas d_{i g_k} denotes the distance from subject i to the location of the central facility g_k of segment A_k. So g_k is the geographical center of cluster k, whereas z_k is the average with respect to cluster attributes. We solve the following CP for given values of K and B:
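\begin{align}
\min \quad & \sum_{i=1}^{N} \sum_{k=1}^{K} x_{ik} \, \lVert f_i - z_k \rVert^2 \tag{1} \\
\text{s.t.} \quad & \sum_{k=1}^{K} x_{ik} = 1, \quad i = 1, \ldots, N, \tag{2} \\
& \sum_{k=1}^{K} \sum_{i \in A_k} d_{i g_k} \, x_{ik} \le B, \tag{3} \\
& x_{ik} \in \{0, 1\}, \quad i = 1, \ldots, N, \; k = 1, \ldots, K. \tag{4}
\end{align}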
The objective function (1) contains the MSSC score, as discussed in Section 3. The combination of constraints (2) and (4) requires each subject to be assigned to precisely one segment, as a consequence of hard clustering. Constraint (3) further requires that the total logistics costs do not exceed B. Due to the constraints on the clusters, this problem is a constrained clustering problem (DeSarbo and Grisaffe, 1998).

5.2. Simulated annealing and constrained clustering problems

A method that solves the MSSC clustering problem should satisfy two conditions. (1) It should be relatively easy to check whether the logistics costs of a candidate solution exceed the maximum allowed amount B. (2) The method should be able to produce good solutions even if no good starting solution is available; if the maximum logistics costs B are low, it may not be possible to obtain a good starting solution with Ward’s algorithm, for example.

Recently, there has been a major development in the application of so-called meta-heuristics to segmentation problems, such as simulated annealing (Brusco et al., 2002; Klein and Dubes, 1989), artificial neural networks (Boone and Roehm, 2002), and
genetic algorithms (Chiou and Lan, 2001). Jain et al. (1999) provide a good survey of hard non-overlapping clustering methods. Such methods primarily maximize the responsiveness of segmentations (see Section 1), though there is some room for other considerations, such as minimum segment sizes.

The simulated annealing (SA) meta-heuristic satisfies the two key conditions. Our preliminary experiments show that SA returns high-quality solutions for the test instances provided by Milligan and Cooper (1987), even when no good starting solutions are available. This finding holds not only for the regular, well-separated cluster instances, but also for those with statistical noise. These findings confirm Wedel and Kamakura’s (2000) conclusion that SA does not depend heavily on starting solutions. Moreover, we can easily adapt SA to ensure it returns only segmentations with logistics costs no higher than a given budget B.

Our SA algorithm is based on the approach adopted by Brusco et al. (2002). It proceeds through a sequence of segmentation solutions and randomly changes the cluster membership of one subject in each current solution. If the new solution has a lower MSSC score, the algorithm proceeds to that solution. Otherwise, the move to the new solution occurs with a probability that depends on the size of the deterioration and the so-called temperature. Initially, the temperature is high and many deteriorating moves are possible. Later in the process, the temperature decreases and deteriorating moves become less likely (for a detailed description of SA, see Van Laarhoven and Aarts, 1987). The performance of SA depends largely on the choice of the parameters: The initial and freezing temperatures, the cooling schedule, and the number of moves prior to stability (Henderson et al., 2003). Van Laarhoven and Aarts (1987) report that the performance of the SA algorithm also depends on the feasible neighborhood (the set of segmentations that are reachable from the current segmentation).
Our algorithm can be described as follows: Moves from a current solution to a solution in a neighborhood of the current solution occur by altering the cluster membership of a single element, but only if the solution in this neighborhood satisfies the budget restriction. Other restrictions can easily be accommodated, such as a minimum segment size. The algorithm computes the logistics costs of any new solution and compares it with the user-defined budget value B. The stability parameter STABLE = 500, the initial temperature T0 = 200, and the freezing temperature Tf = 0.0001 were determined empirically for our instances. We choose an exponential cooling schedule with a = 0.9, so the temperature at a next stage is 10% lower than that of the current stage. When the current temperature falls below the freezing temperature, the algorithm terminates.

Algorithm 1. Simulated annealing approach for budget constrained clustering

NOTATION
  S        segmentation;
  c(S)     MSSC score of segmentation S;
  LC(S)    logistics costs of segmentation S.
INPUT
  B        logistics costs budget;
  K        number of segments.
PARAMETER VALUES
  T0 = 200        initial temperature (T0 > 0);
  a = 0.9         cooling parameter;
  STABLE = 500    number of tries before stability at temperature T is achieved;
  Tf = 0.0001     temperature at which the algorithm is frozen.
MAIN ALGORITHM
  Generate initial segmentation S0 such that LC(S0) ≤ B;
  T := T0;
  repeat
    count := 0;
    repeat
      randomly select segmentation S := findMove(S0);
      d := c(S0) − c(S);
      if d ≥ 0
        S0 := S;
      else
        rnd := random number from U(0, 1);
        if rnd < exp(d/T)
          S0 := S;
      count := count + 1;
    until count = STABLE;
    T := a·T;
  until T < Tf;
FUNCTION findMove(S0)
  feasible := false;
  while not feasible
    S := perturb S0 as follows:
      select randomly a subject i ∈ X and find k* s.t. i ∈ C_k* in S;
      assign i to a randomly selected cluster C_k with C_k ∈ {C_1, …, C_K} \ {C_k*};
    if LC(S) ≤ B
      feasible := true;
  return S.
OUTPUT
  (Sub)optimal segmentation S* with MSSC score c(S*).
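Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation. It assumes that clusters are encoded as integer labels 0, …, K−1, that geographical distances are Euclidean, and that facilities sit at demand-weighted mean locations. Three details are added here and are not part of Algorithm 1: the best solution seen is remembered, a move that would empty a cluster is skipped (in line with Property 4), and `max_tries` caps the search for a feasible move.

```python
import math
import random


def mssc(features, labels, k):
    """c(S): squared attribute distances to each cluster's mean vector, summed."""
    total = 0.0
    for c in range(k):
        members = [f for f, lab in zip(features, labels) if lab == c]
        if members:
            center = [sum(col) / len(members) for col in zip(*members)]
            total += sum(math.dist(m, center) ** 2 for m in members)
    return total


def logistics_cost(coords, demand, labels, k):
    """LC(S): DCF measure with each facility at the demand-weighted mean location."""
    total = 0.0
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            w = sum(demand[i] for i in idx)
            g = [sum(demand[i] * coords[i][d] for i in idx) / w for d in (0, 1)]
            total += sum(math.dist(coords[i], g) for i in idx)
    return total


def find_move(labels, k, budget, coords, demand, rng, max_tries=1000):
    """Move one random subject to another cluster; retry until LC(S) <= B."""
    for _ in range(max_tries):          # max_tries is an added safeguard
        new = list(labels)
        i = rng.randrange(len(new))
        if new.count(new[i]) == 1:      # assumption: never empty a cluster
            continue
        new[i] = rng.choice([c for c in range(k) if c != new[i]])
        if logistics_cost(coords, demand, new, k) <= budget:
            return new
    return list(labels)                 # no feasible neighbour found: stay put


def anneal(features, coords, demand, start, k, budget,
           t0=200.0, a=0.9, stable=500, t_freeze=1e-4, seed=0):
    rng = random.Random(seed)
    current, cur_cost = list(start), mssc(features, start, k)
    if logistics_cost(coords, demand, current, k) > budget:
        raise ValueError("initial segmentation exceeds the budget B")
    best, best_cost = list(current), cur_cost
    t = t0
    while t >= t_freeze:
        for _ in range(stable):
            cand = find_move(current, k, budget, coords, demand, rng)
            cand_cost = mssc(features, cand, k)
            d = cur_cost - cand_cost    # d >= 0 means the candidate is no worse
            if d >= 0 or rng.random() < math.exp(d / t):
                current, cur_cost = cand, cand_cost
                if cur_cost < best_cost:   # added: remember the best solution seen
                    best, best_cost = list(current), cur_cost
        t *= a
    return best, best_cost
```

With the reported parameter values (T0 = 200, a = 0.9, Tf = 0.0001, STABLE = 500), the temperature passes through 138 stages before it drops below Tf, so one run performs 69,000 tries.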
5.3. Choice of candidate segmentations

Algorithm 1 uses B and K as its input values. However, we do not know ‘optimal’ values of B and K in advance. To find these, two objectives need to be optimized simultaneously: We want to maximize fit and minimize logistics costs. Commonly used methods for optimizing decision problems with multiple objectives are the ε-constraint and the weighting methods (Ehrgott and Gandibleux, 2003). The ε-constraint method maximizes one objective, while it includes a constraint that specifies a minimum quality on the second objective. The required quality on the second objective is then gradually increased. The weighting method puts a weight on both objectives, and then finds the optimal solutions for different weights. Recently, meta-heuristics have been applied to multi-objective problems (Jones et al., 2002).

The BC approach can easily incorporate the ε-constraint method if K is fixed, as the method can include the logistics costs in a constraint and minimize the MSSC score. However, with K varying, the method cannot generate a set of Pareto optimal solutions, as Algorithm 1 minimizes the MSSC measure of fit, whereas the SW measure of fit is used subsequently to account for K. Instead, the BC approach runs Algorithm 1 for multiple combinations of B and K and obtains a segmentation for each combination, and applies the weighting method to further reduce the resulting set of candidate segmentations. First, for each value of B, we choose the value of K that achieves the highest SW (see Section 3), because the number of clusters for
which the SW value is highest should be the most profitable. This results in a candidate segmentation solution for each considered value of B. We further reduce the set of solutions using the weighting method as follows: Let the parameter w denote the relative importance of the logistics costs. For each segmentation S, we define the cost score F(S; w) as:
\[ F(S; w) := \mathrm{BFIT}(S) + w \, LC(S), \tag{5} \]
where BFIT(S) denotes the badness of fit of segmentation S. We use BFIT(S) = −SW(S) and the DCF measure for LC(S). The SW can easily be substituted by other fit measures; however, the MSSC score is not a suitable measure, because it always improves (decreases) as the number of clusters increases. Because it is hard to determine the expected revenue or profit level when a segmentation is based on preferences and perceptions, we cannot specify a general optimal value of w. Instead, we produce a small set of ‘efficient’ segmentations. A segmentation S* is efficient if there is a value of w such that F(S; w) ≥ F(S*; w) for all S ≠ S*. To find such w values, we start with a lowest logistics costs solution, usually the countries-as-segments solution, and set w = ∞. Then we decrease the value of w until another segmentation S obtains the lowest score F(S; w). We continue decreasing w as long as w ≥ 0. The BC approach leaves it to the user to choose a segmentation with both acceptable logistics costs and acceptable fit.

6. European meat outlet segmentation

In this section, we describe a segmentation study for European meat outlets, for which we use data from Ter Hofstede et al. (2002). An international retail chain seeks to customize its outlets according to regional consumer behavior differences, while still taking into account the costs of transporting meat products. For each relatively homogeneous group of regions, it plans to develop a separate marketing strategy.

6.1. Store image and spatial contiguity

A store tries to gain a unique position in the minds of customers, perhaps by adjusting the retail formula (the collection of store image attributes such as price, assortment, service, atmosphere, and quality) to the needs of a target group of customers. However, consumers in different regions may value the relative importance of these attributes differently.
To determine the preferences of customers in various regions, an international market research agency executed a large-scale survey, as previously reported by Ter Hofstede et al. (2002). Questionnaires were mailed to members of a panel in seven countries within the European Union: Germany, the Netherlands, Belgium, France, Spain, Portugal, and Italy. The respondents indicated which retail outlets they visited most frequently and rated those outlets according to their price, quality, service, atmosphere, distance, and variety in meat. These six attributes then can be related to a general opinion of the store, measured on a 1–7 scale. For each region, a regression produces coefficients that reveal the relative importance of each attribute. If the number of respondents in a region is too small for a viable regression, the region borrows characteristics from neighboring regions. The subjects of the segmentation are 123 NUTS-2 regions. To compute the geographical distances (recall that we use regions instead of single customers/outlets), we use the central locations of the regions, called centroids.

6.2. Computational experiments

The BC approach can construct intermediate solutions, with better fit scores than the countries-as-segments approach, at lower
logistics costs than an unconstrained segmentation approach. We used the DCF measure from Section 4 to approximate the logistics costs. The BC approach starts with a logistics budget of 30,000, which corresponds to the segments-as-countries logistics costs, and increases this value to 70,000 in increments of 5,000. The resultant strategies then can be compared with the results from existing alternative methods, namely, the unconstrained hard clustering approach (HU) and the countries-as-segments segmentation (CS). We evaluated all the candidate segmentations on the following criteria (Ter Hofstede et al., 1999):

Customer heterogeneity. We measure segmentation fit according to SW (see Section 3).
Logistics costs. The logistics costs are estimated using the measures from Section 4.
Number of segments. There should be few segments.

The BC approach returns four candidate segmentations. One of them, with w ≥ 14.11, corresponds to the countries-as-segments segmentation (CS). As we decrease the value of w, we obtain the BC Options 3, 2, and 1, in order. Options 2 and 3 represent intermediate segmentations between the HU and CS results. In Table 1, we provide values of LC, SW, and MSSC. According to the procedure from Section 5, four segmentation strategies are efficient: The CS segmentation and the BC Options 1, 2, and 3. Existing approaches do not specify budgets for logistics costs B; therefore, we set their values of B in Table 1 to ∞.

The CS segmentation uses countries as segments, except for a segment that consists of Belgium and the Netherlands and another that contains Spain and Portugal. For this approach, MSSC = 22.96, and SW = 0.0009. Therefore, we can achieve a considerable improvement in solution quality by transferring regions to other segments. The value of DCF equals 26813, the lowest among the considered segmentations. The HU approach uses Ward’s method (Ward, 1963) to obtain a good initial solution, then improves it using SA.
In the resulting segmentation, K = 6, MSSC = 9.51, SW = 0.431, and LC = 68797. However, these logistics costs are the highest among the considered segmentations. The BC segmentation with low w values has seven segments; specifically, BC Option 1 in Table 1 reveals MSSC = 9.53, SW = 0.433, and LC = 64219. In the case of BC Option 2, with K = 6 and B = 55000, we find that LC = 54399, MSSC = 11.40, and SW = 0.341. Finally, BC Option 3 is the best alternative for 11.53 ≤ w ≤ 14.11, with MSSC = 14.47, SW = 0.169, and LC = 39438. Compared with the CS segmentation, BC Option 3 constructs transnational segments in France, Germany, Belgium, and the Netherlands, where the regions are relatively closely located.

Fig. 1 depicts the dispersion of the various segmentation options. Fig. 2 summarizes the scores of the segmentations on the fit (SW) and logistics costs (LC) measures. An ideal segmentation would earn an SW score of 1 and an LC score of 0. In general, the lower the LC and the higher the SW, the better the segmentation. Thus, BC Option 1 is better than the HU segmentation, which confirms that a better MSSC score does not automatically lead to a higher SW score. In preliminary experiments, however, we found that the SW and MSSC scores are strongly correlated.
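To make the comparison of the two fit measures concrete, the MSSC score and the silhouette width defined in Section 3 can be sketched as follows. The sketch assumes Euclidean distances between attribute vectors, clusters given as lists of subject indices, and K ≥ 2. Giving a singleton cluster a contribution of 0 is an assumed convention. The function `mssc_pairwise` illustrates Huygens’ theorem by computing the same score from pairwise distances.

```python
import math


def mssc(features, clusters):
    """Sum over clusters of squared distances to the cluster's mean attribute vector."""
    total = 0.0
    for members in clusters:
        rows = [features[i] for i in members]
        center = [sum(col) / len(rows) for col in zip(*rows)]
        total += sum(math.dist(r, center) ** 2 for r in rows)
    return total


def mssc_pairwise(features, clusters):
    """Same score via Huygens' theorem: pairwise squared distances / cluster size."""
    total = 0.0
    for members in clusters:
        pairs = sum(math.dist(features[i], features[j]) ** 2
                    for pos, i in enumerate(members) for j in members[pos + 1:])
        total += pairs / len(members)
    return total


def silhouette_width(features, clusters):
    """SW(S) for K >= 2 clusters; singletons contribute 0 (assumed convention)."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for members in clusters:
        for i in members:
            own = [math.dist(features[i], features[j]) for j in members if j != i]
            if not own:
                continue
            a = sum(own) / len(own)                       # a(i)
            b = min(sum(math.dist(features[i], features[j]) for j in other) / len(other)
                    for other in clusters if other is not members)   # b(i)
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / n


features = [(0.0,), (1.0,), (10.0,), (11.0,)]
clusters = [[0, 1], [2, 3]]
print(mssc(features, clusters), mssc_pairwise(features, clusters))   # both 1.0
print(round(silhouette_width(features, clusters), 4))                # 0.8997
```

The nested loops in `silhouette_width` visit every pair of subjects, which reflects the quadratic computing time mentioned in Section 3.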
Table 1 Segmentation alternatives.
Segmentation   K   B       MSSC    SW       LC      Optimal for
CS             5   ∞       22.96   0.009    26813   w ≥ 14.11
HU             6   ∞       9.51    0.431    68797   No value of w
BC Option 1    7   65000   9.53    0.4329   64219   w ≤ 9.35
BC Option 2    6   55000   11.40   0.3411   54399   9.35 ≤ w ≤ 11.53
BC Option 3    6   40000   14.47   0.169    39438   11.53 ≤ w ≤ 14.11
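The efficiency claim can be illustrated with a simple dominance check on the LC and SW values reported in Table 1. The sketch below is only an illustration: it assumes that "efficient" means not dominated on the two criteria (lower LC, higher SW), which may differ in detail from the procedure of Section 5, and the function names are hypothetical.

```python
# Values copied from Table 1 as (name, LC, SW). Lower LC and higher SW are better.
SEGMENTATIONS = [
    ("CS", 26813, 0.009),
    ("HU", 68797, 0.431),
    ("BC Option 1", 64219, 0.4329),
    ("BC Option 2", 54399, 0.3411),
    ("BC Option 3", 39438, 0.169),
]


def dominates(a, b):
    """Return True if a is no worse than b on both criteria and strictly better on one."""
    _, lc_a, sw_a = a
    _, lc_b, sw_b = b
    no_worse = lc_a <= lc_b and sw_a >= sw_b
    strictly_better = lc_a < lc_b or sw_a > sw_b
    return no_worse and strictly_better


def efficient(candidates):
    """Names of the candidates that no other candidate dominates (assumed notion of efficiency)."""
    return [c[0] for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    print(efficient(SEGMENTATIONS))
```

Under this assumption, HU drops out because BC Option 1 has both a lower LC and a higher SW, which agrees with the four efficient strategies named in the text.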
Fig. 1. Segmentation options.
Fig. 2. Comparison of logistics costs and fit of segmentation strategies.
6.3. Sensitivity of the BC approach to the number of facilities
The DCF measure relies on the assumption of one central facility for each segment, though it may be less costly to use one facility for multiple segments, or to serve one segment with multiple facilities. The trade-off is between the number of facilities used and the travel distances. If a segment contains multiple facilities, transportation-related costs decrease. If one facility serves
multiple segments, the transportation costs likely increase, but the warehouse costs decrease. In this section, we relax the assumption of a separate facility for each segment. In two sensitivity experiments, we illustrate how our approach can accommodate cases in which (1) one facility serves more than one segment, and (2) a segment has multiple facilities. In our first sensitivity experiment, we allow facilities to serve multiple segments. We use the solutions of the European retail case study for different combinations of K and B. Fig. 3 illustrates the results for K = 4 and 7. The logistics costs when there is one warehouse per segment are indexed to 100% for each K and B. For various values of B, a separate line shows the total sum of the distances, relative to the case in which the number of facilities equals K, as the number of central warehouses decreases from K through 1. The relative distance increase is smallest for K = 7 and for very high B = 100,000. In general, the total distance increases rapidly if the facilities of more than two segments are combined, especially for the segmentations for which B = 30,000 to 60,000. The only case in which combining facilities makes sense is for unconstrained B (B = 100,000 in the graph) and K = 4. The total distance increase is then about 10%. For larger values of K, the transportation costs increase by a much larger percentage, most notably for K = 7. We thus conclude that, for the cases considered, consolidating the closest two central facilities causes only a small increase in transportation costs. However, the total distance increases quite rapidly if the facilities of more than two segments are combined, especially with lower values of B. In our second sensitivity experiment, we allow for multiple warehouses per segment. The trade-off is again between transportation costs and warehouse costs: Adding new warehouses increases the warehouse costs but decreases transportation costs. 
The savings in transportation costs are largest with the first added warehouse. We also compare the DCF measure (the sum of the distances to one warehouse) with the sum of the distances if there are two warehouses per segment. The optimal locations of p warehouses per segment are those for which the sum of the distances from each demand point to its nearest facility is minimized. This is a location-allocation problem, as we noted in Section 1. If there is a discrete set of candidate locations for the p facilities (e.g., existing facilities, see Section 4), the resulting problem is a p-median problem with p ≥ 2, as has been discussed extensively by Klose and Drexl (2005). An extension of the problem, in which there is a continuous set of candidate locations for the facilities, is the multisource Weber problem (see, e.g., Brimberg et al., 2000). For this sensitivity experiment, we consider a multisource Weber problem for two facilities, which can be solved effectively with Cooper’s heuristic (Brimberg et al., 2000).
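As an illustration of the two-facility computation, the following sketch alternates between allocating demand points to their nearest facility and relocating each facility to the Weber point of its allocated points, which is the usual textbook form of Cooper's alternating heuristic. It is not the implementation used in this study: the deterministic starting rule, the iteration limits, the Weiszfeld iteration for the single-facility step, and the sample coordinates are all assumptions made for the example.

```python
import math


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def weber_point(points, iterations=200, eps=1e-9):
    """Weiszfeld iteration: approximate the point that minimizes the sum of
    Euclidean distances to `points` (single-facility Weber problem)."""
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for p in points:
            d = max(dist((x, y), p), eps)  # guard against division by zero
            num_x += p[0] / d
            num_y += p[1] / d
            denom += 1.0 / d
        new_x, new_y = num_x / denom, num_y / denom
        moved = dist((x, y), (new_x, new_y))
        x, y = new_x, new_y
        if moved < eps:
            break
    return (x, y)


def total_distance(points, facilities):
    """Sum of distances from each demand point to its nearest facility."""
    return sum(min(dist(p, f) for f in facilities) for p in points)


def cooper_two_facilities(points, max_rounds=100):
    """Alternate allocation and location steps for two facilities."""
    # Assumed deterministic start: the first point and the point farthest from it.
    first = points[0]
    facilities = [first, max(points, key=lambda p: dist(p, first))]
    assignment = None
    for _ in range(max_rounds):
        # Allocation step: each point goes to its nearest facility.
        new_assignment = [
            0 if dist(p, facilities[0]) <= dist(p, facilities[1]) else 1
            for p in points
        ]
        if new_assignment == assignment:
            break  # allocations are stable, so the locations are too
        assignment = new_assignment
        # Location step: move each facility to the Weber point of its points.
        for j in (0, 1):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                facilities[j] = weber_point(members)
    return facilities, assignment


if __name__ == "__main__":
    # Hypothetical demand points forming two well-separated groups.
    demand = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    one = total_distance(demand, [weber_point(demand)])
    two_facilities, _ = cooper_two_facilities(demand)
    two = total_distance(demand, two_facilities)
    print(f"relative saving from a second facility: {(one - two) / one:.1%}")
```

The relative saving printed at the end corresponds to the kind of comparison made in this experiment, here on artificial data rather than on the case study regions. Like any alternating heuristic, the result depends on the starting locations, so in practice several starts would be compared.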
Fig. 4. Savings in kilometers by adding one warehouse.
In Fig. 4, we present the total distance and transportation cost savings achieved when each segment has not one but two warehouses. We compute the sum of the distances with two facilities using Cooper’s heuristic. The savings are largest for K = 4 and a large B. Because the DCF score for B = 100,000 is around 70,000, the transportation cost savings approach 13% for one additional facility. Note that if there is more than one facility per segment, we not only incur the costs of establishing and maintaining a second facility but also may face additional costs due to reduced economies of scale. If warehouse costs are very small compared with transportation costs, it may be profitable to open more warehouses, in particular for small values of K. However, if it is very expensive to set up a new warehouse, it may be profitable to combine warehouses for large values of B and K. In general, we conclude that the BC approach is robust for small values of B, but less so for larger B values and dispersed segments.
6.4. Restricting the maximum distance within segments
Instead of computing the logistics costs of each segmentation directly, we might follow an approach similar to that in Normclus and restrict the diameters of the segments. A segmentation is only feasible if the distance between each pair of subjects in the same cluster is at most D. This distance-constraint (DC) approach gradually increases the value of D. The time required to check whether segments have diameter D is less than the time to check whether the DCF logistics costs remain within the budget B. For our case study, the DC approach needs about two seconds to compute a good segmentation on a 2 GHz Pentium computer with 256 MB RAM under Windows 2000. The BC approach needs about 200 seconds on the
same machine to complete the computation. This difference in computation times may become relevant for large data sets. However, the distance-constraint applies to each individual segment, which prevents any compensation for a high logistics costs segment with a low logistics costs segment. Such a trade-off is possible using the BC approach. As a further comparison, Fig. 5 shows the SW values obtained with both methods for K = 4; similar patterns are obtained for other values of K. The DC method thus generates segmentations of good quality at lower LC levels, but it has difficulties generating solutions with intermediate cost levels. However, the DC method appears to be a good alternative if the BC method is too time consuming.
Fig. 3. Relative total distances for combined warehouses.
Fig. 5. Comparison between the BC and DC approaches (series: BC approach, DC approach).
7. Limitations and further research
The BC approach determines segmentations for various levels of the logistics costs budget B and number of segments K. The logistics costs and fit then can be weighted, and a small number of candidate segmentations emerges. This approach admittedly has limitations though:
The approach is tailored to ‘hard’ segmentations in which every subject is assigned to a single segment, and it is doubtful whether it works well for segmentations that are best solved with fuzzy clustering (Wedel and Kamakura, 2000). An interesting direction for ongoing research would be to develop a mixture modeling approach that limits the logistics costs. The Normclus framework of DeSarbo and Grisaffe (1998) also offers possibilities for constrained fuzzy clustering.
Although simulated annealing appears to perform very well, further research should develop an adaptive simulated annealing algorithm (Henderson et al., 2003), which automatically adapts the settings of the algorithm to the particular segmentation instance.
The DCF measure of the logistics costs is based on an assumption that each segment is served by a separate facility. Although we also relaxed this assumption (see Section 6.3), an interesting direction for research is to explore the consequences of consolidating facilities from several segments.
We only focus on logistics costs. In a practical application of our approach, other costs would need to be taken into account as well. For example, with an increase in K, the fixed costs will be larger. These costs arise from building an additional facility for each new segment, as well as from the demand for constructing and implementing a new marketing mix for each segment, new products to be designed and marketed, and the implementation of a new promotional campaign, for example.
However, our methodology also can be applied to a wider class of problems. For example, store location is an emerging topic (e.g., Clarke et al., 1997), and it would be interesting to apply our approach to the problem of locating stores to ensure that the travel times for targeted customers are reasonably low. In a similar application, Yorke (2001) considers the locations of leisure facilities in Britain. These facilities should reflect the desires of children in the neighborhood, but travel times also need to be taken into account.
8. Conclusions
The costs of physical distribution represent a major portion of the total costs of a product. This contribution significantly influences segmentation decisions about how to form groups of homogeneous customers that can be targeted with the same marketing mix. Steenkamp and Ter Hofstede (2002) report that logistics costs force organizations to maintain a countries-as-segments strategy in retailing and for perishable goods. We propose some new segmentation strategies that acknowledge the trade-off between the consumer benefits of a segmentation and the logistics costs of serving segments. We begin by determining how an increase in spread of consumer locations leads to higher logistics costs. Our study focuses on distribution systems with one central facility that serves all potential customers in a segment. Logistics costs thus relate to the sum of the distances from the central facility to the demand points, the so-called DCF measure. We have conducted sensitivity analysis based on the assumption that only a single central facility may be used, and we find that the DCF measure is accurate and robust for all but the most dispersed segmentations. As our next step, we create segmentations with different levels of logistics costs. The BC approach uses simulated annealing to construct such segmentations. Our simulated annealing algorithm finds good segmentation solutions, independent of the chosen starting solution. We finally apply our proposed approach to a case study involving a European meat outlet that assigns retail formulas to European regions. The approach generates solutions for which both the logistics costs and the segmentation fit are reasonable.
References
Boone, D., Roehm, M., 2002. Retail Segmentation using artificial neural networks. International Journal of Research in Marketing 19, 287–301.
Brimberg, J., Hansen, P., Mladenovic, N., Taillard, E., 2000. Improvements and comparison of heuristics for solving the multisource Weber problem. Operations Research 48 (3), 444–460.
Brusco, M., Cradit, J., Stahl, S., 2002. A simulated annealing heuristic for a bicriterion partitioning problem in market segmentation. Journal of Marketing Research 39, 99–109.
Chiou, Y., Lan, L., 2001. Genetic clustering algorithms. European Journal of Operational Research 135, 413–427.
Clarke, I., Bennison, D., Pal, J., 1997. Towards a contemporary perspective of retail location. International Journal of Retail and Distribution Management 25 (2), 59–69.
Daganzo, C., 2004. Logistics Systems Analysis, 4th ed. Springer-Verlag, Berlin.
Davis, H., 1990. Distribution Costs and Customer Service Level: How Do You Compare in 1990? In: Proceedings Annual Conference. Council of Logistics Management, Anaheim, pp. 357–364.
DeSarbo, W., Grisaffe, D., 1998. Combinatorial optimization approaches to constrained market segmentation: An application to industrial market segmentation. Marketing Letters 9 (2), 115–134.
Ehrgott, M., Gandibleux, X., 2003. Multiple objective combinatorial optimization – A tutorial. In: Tanaka, T., Tanino, T., Inuiguchi, M. (Eds.), Multi-Objective Programming and Goal-Programming. Springer-Verlag, Berlin, pp. 3–19.
Euro-CASE, March 2001. Freight Logistics and Transport Systems in Europe.
Federgruen, A., Prastacos, G., Zipkin, P., 1986. An allocation and distribution model for perishable products. Operations Research 34 (1), 75–82.
Francis, R., White, J., 1974. Facility Layout and Location: An Analytical Approach. Prentice-Hall, Englewood Cliffs, NJ.
GMA, 2005. Transportation Costs Driving Up Production Costs, CPG Industry Survey Finds.
Hansen, P., Jaumard, B., Mladenovic, N., 1998. Minimum sum of squares clustering in a low dimensional space. Journal of Classification 15, 37–55.
Henderson, D., Jacobson, S., Johnson, A., 2003. The theory and practice of simulated annealing. In: Glover, F., Kochenberger, G. (Eds.), Handbook of Metaheuristics. Kluwer, Dordrecht, pp. 287–319. Ch. 10.
Jain, A., Murty, M., Flynn, P., 1999. Data clustering: A review. ACM Computing Surveys 31 (3), 264–323.
Jones, D., Mirrazavi, M., Tamiz, M., 2002. Multi-objective meta-heuristics: An overview of the current state-of-the-art. European Journal of Operational Research 137, 1–9.
Klein, R., Dubes, R., 1989. Experiments in projection and clustering by simulated annealing. Pattern Recognition 22 (2), 213–220.
Klose, A., Drexl, A., 2005. Facility location models for distribution system design. European Journal of Operational Research 162, 4–29.
Kumar, V., Petersen, J., 2005. Using a customer-level marketing strategy to enhance firm performance: A review of theoretical and empirical evidence. Journal of the Academy of Marketing Science 33 (4), 504–519.
Laporte, G., 1992. The vehicle routing problem: An overview of exact and approximate algorithms. European Journal of Operational Research 59, 345–358.
Melo, M., Nickel, S., Saldanha-da Gama, F., 2009. Facility location and supply chain management – A review. European Journal of Operational Research 196, 401–412.
Milligan, G., Cooper, M., 1987. Methodology review: Clustering methods. Applied Psychological Measurement 11 (4), 329–354.
Nelson, P., Toledano, G., 1979. Challenges for international logistics management. Journal of Business Logistics 1 (2), 1–21.
Rousseeuw, P., 1987. Silhouettes, a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20 (1), 53–65.
Salah, H., Pervel Kathanis, L., 1994. Global market segmentation and trends. In: Kaymek, E., Salah, H. (Eds.), Globalization of Consumer Markets: Structures and Strategies. International Business Press, New York, pp. 47–63.
Steenkamp, J., Ter Hofstede, F., 2002. International market segmentation: Issues and perspectives. International Journal of Research in Marketing 19 (3), 185–213.
Ter Hofstede, F., Steenkamp, J., Wedel, M., 1999. International market segmentation based on consumer-product relations. Journal of Marketing Research 36, 1–17.
Ter Hofstede, F., Wedel, M., Steenkamp, J., 2002. Identifying spatial segments in international markets. Marketing Science 21 (2), 160–177.
Thonemann, U., Bradley, J., 2002. The effect of product variety on supply-chain performance. European Journal of Operational Research 143, 548–569.
Van Laarhoven, P., Aarts, E., 1987. Simulated annealing: Theory and applications. In: Mathematics and its Applications. Kluwer Academic Publishers, Dordrecht.
Ward, J., 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58 (301), 236–244.
Wedel, M., Kamakura, W., 2000. Market segmentation: Conceptual and methodological foundations. In: International Series in Quantitative Marketing, Second ed. Kluwer Academic Publishers, Dordrecht.
Yorke, D., 2001. The definition of market segments for leisure centre services: Theory and practice. European Journal of Marketing 18 (2), 100–113.