Expert Systems with Applications 30 (2006) 313–324 www.elsevier.com/locate/eswa
Integration of self-organizing feature maps neural network and genetic K-means algorithm for market segmentation

R.J. Kuo a,*, Y.L. An a, H.S. Wang a, W.J. Chung b

a Department of Industrial Engineering and Management, National Taipei University of Technology, No. 1, Section 3, Chung-Hsiao East Road, Taipei, Taiwan 106, ROC
b Institute of Production Systems Engineering and Management, National Taipei University of Technology, Taipei, Taiwan 106, ROC
Abstract

This study proposes a novel two-stage method, which first uses the Self-Organizing Feature Maps (SOM) neural network to determine the number of clusters and the starting points, and then uses the genetic K-means algorithm to find the final solution. The results for simulated data from a Monte Carlo study show that the proposed method outperforms two other methods, K-means and SOM followed by K-means (Kuo, Ho & Hu, 2002a), with respect to both within-cluster variation (SSW) and the number of misclassifications. In order to further demonstrate the proposed approach's capability, a real-world problem, market segmentation of the freight transport industry, is employed. A questionnaire is designed and surveyed, after which factor analysis extracts the factors from the questionnaire items as the basis of market segmentation. Then the proposed method is used to cluster the customers. The results also indicate that the proposed method is better than the other two methods.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Market segmentation; Clustering analysis; Genetic algorithms; Self-organizing feature maps
1. Introduction

Market segmentation has been mentioned as an important avenue of research in the field of electronic commerce (Chang, 1998). In this new and competitive commercial framework, market segmentation techniques can give marketing researchers a leading edge, because the identification of such segments can be the basis for effective targeting and predicting of potential customers (O'Connor and O'Keefe, 1997). Among the segmentation methods, the post-hoc methods, especially clustering methods, are relatively powerful and frequently used in practice (Dillon et al., 1993; Wedel and Kamakura, 1998). Among clustering methods, the K-means method is the most frequently used, since it can accommodate the large sample sizes associated with market segmentation studies (Anil et al., 1997).

* Corresponding author. Tel.: +886 2 27712171 ext. 2301; fax: +886 2 27317168.
E-mail address: [email protected] (R.J. Kuo).
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2005.07.036
Due to increasing computer power and decreasing computer costs, artificial neural networks (ANNs) have recently been applied to a wide variety of business areas (Vellido et al., 1999), such as market segmentation (Balakrishnan, Cooper & Jacob, 1996; Kuo et al., 2002a, b), sales forecasting (Kuo & Xue, 1998), and bankruptcy prediction (Cadden, 1991). One type of unsupervised neural network, the Self-Organizing Feature Map (SOM), can project a high-dimensional input space onto a low-dimensional topology, allowing one to visually determine the number of clusters (Lee et al., 1977; Pykett, 1978). In addition, genetic algorithms (GAs) have been found, both theoretically and empirically, to provide global near-optimal solutions for various complex optimization problems in the field of operations research. Since the GA is effective at searching, it can cluster data according to their similarities. According to several studies (Al-Sultan and Khan, 1996; Murthy and Chowdhury, 1996; Lin and Shiueng, 1997, 2001; Cowgill, Harvey & Watson, 1999; Krishna & Murty, 1999; Maulik and Bandyopadhyay, 2000), the clustering results of the GA are much better than those of the K-means method. Given this capability, this study modifies the GA for clustering and applies it to market segmentation.
Therefore, this research proposes a two-stage method. It first determines the number of clusters via the Self-Organizing Feature Maps (SOM) neural network. Then a modification of the genetic K-means algorithm (GKA) (Krishna & Murty, 1999) is presented and applied to find the final solution (the combination is denoted SOM+GKA in this paper). It is compared with two other methods, the K-means method and Kuo and colleagues' modified two-stage method (denoted SOM+K in this paper) (Kuo et al., 2002a), via a Monte Carlo simulation (Milligan & Cooper, 1985). The simulation results show that SOM+GKA provides better performance than the other two methods with respect to both within-cluster variation (SSW) and the number of misclassifications. In order to further demonstrate the feasibility of the proposed approach, a real-world problem, market segmentation of the freight transport industry, is employed. In order to segment the customers, a questionnaire is designed according to customers' benefit variables, and the benefit variables are used as the basis for market segmentation. Then, SOM+GKA is utilized as the clustering method for market segmentation. The computational results also indicate that SOM+GKA has the smallest SSW of the three methods.

The remainder of this paper is organized as follows. Section 2 discusses the general idea of market segmentation, applications of ANNs in market segmentation, and GAs in clustering analysis. The proposed two-stage method is described in Section 3, while Section 4 presents the simulation algorithm and results. The real-world problem results are detailed in Section 5. Finally, concluding remarks are made in Section 6.
2. Backgrounds

This section introduces the necessary background, including market segmentation as well as applications of neural networks and genetic algorithms for clustering analysis. Detailed discussion of these topics follows in the subsections below.

2.1. Market segmentation

Due to multiple and varied requests from different customers, companies must satisfy the needs of discriminating customers who can choose from a multitude of alternatives in the marketplace. To select their markets and serve them well, many companies are embracing target marketing (Kotler, 1997). Kotler (1997) suggested one common three-step approach to identifying the major segments in a market: (1) Survey Stage, (2) Analysis Stage, and (3) Profiling Stage. Wedel and Kamakura (1998) put forward the following six criteria for determining the effectiveness and profitability of market segmentation: (1) identifiability,
(2) substantiality, (3) accessibility, (4) stability, (5) responsiveness, and (6) actionability. They also summarized segmentation methods in four categories.

2.2. Applications of Artificial Neural Networks for Market Segmentation

The learning algorithms of ANNs can be divided into two categories: supervised and unsupervised. In supervised learning, the network has its output compared with a known answer and receives feedback about any errors. The most widely applied unsupervised learning scheme is Kohonen's feature maps. Venugopal and Baets (1994) presented the possible applications of ANNs in marketing management, using three examples, retail sales forecasting, direct marketing, and target marketing, to demonstrate the capability of ANNs. Bigus (1996) suggested that ANNs can be employed as a tool for data mining and presented a network with three different dimensions of data: demographic information (sex, age, and marriage), economic information (salary and family income), and geographic information (state, city, and level of civilization). Balakrishnan, Cooper and Jacob (1994) compared self-organizing feature maps with the K-means method through a Monte Carlo study, showing that the K-means method has a higher rate of correct classification. Subsequently, Balakrishnan et al. (1996) employed the frequency-sensitive competitive learning algorithm (FSCL) and the K-means method for clustering both simulated data and real-world problem data, presenting a combination of these two methods. Although neither the simulated nor the real-world problem data can determine which single method is better, the combination of the two methods seems to provide a better managerial explanation for the brand choice data.

A modified two-stage method, which first uses the self-organizing feature maps to determine the number of clusters and the starting points and then employs the K-means method to find the final solution, was proposed by Kuo and colleagues (Kuo et al., 2002a) for market segmentation. The simulation results show that the modified two-stage method is slightly more accurate than the conventional two-stage method (Ward's minimum variance method followed by the K-means method) with respect to the rate of misclassification.

2.3. Applications of Genetic Algorithms for Clustering

Since the GA is good at searching (Srinivas and Patnaik, 1994), it has been used to solve the clustering problem. Murthy and Chowdhury (1996) proposed a GA-based method to solve the clustering problem and used three experiments on synthetic and real-life data sets to evaluate its performance. The results show that the GA-based method may improve the final output of K-means. Al-Sultan and Khan (1996) studied several algorithms, including the K-means
algorithm, the simulated annealing algorithm, the tabu search algorithm, and the genetic algorithm, and compared their performance on clustering problems from the literature. In general, the result of the K-means algorithm is the worst. Cowgill et al. (1999) used both constrained and unconstrained Monte Carlo simulation data sets to compare a genetic clustering algorithm with conventional clustering techniques (Ward's and K-means). The results showed that under some conditions the genetic clustering algorithm outperforms the conventional clustering techniques in terms of an internal criterion. Krishna and Murty (1999) proposed the GKA algorithm for clustering and proved that the algorithm converges to an optimal solution. In Maulik and Bandyopadhyay's (2000) experiments, the GA-clustering algorithm surpasses the K-means algorithm. These methods all have the prerequisite that the number of clusters must be fixed, but in the real world the number of clusters is unknown. Lin and Shiueng (1997, 2001) proposed a genetic-algorithm-based approach that can automatically find the proper number of clusters, and the recovery of their algorithm surpasses K-means on the iris data.
Fig. 1. Performance evaluation of the proposed method. (Flowchart: data parameters are set up and Monte Carlo simulation generates the simulated data sets; the data are clustered by K-means, SOM+K, and SOM+GKA, where SOM first finds the initial number of clusters and the initial clusters' centroids for the latter two; the misclassification rate and SSW are then calculated.)
3. Methodology

After discussing the general background of neural networks and genetic algorithms in clustering, this section explains the proposed two-stage clustering approach. The first stage employs the SOM network to determine the number of clusters and the initial starting points; in the second stage, the GKA proposed by Krishna and Murty (1999) is modified for finding the final solution through some generations of evolution. To compare the proposed method with the SOM network followed by K-means (Kuo et al., 2002a), data from a Monte Carlo simulation are used to demonstrate the effectiveness of the proposed approach. Finally, the method is utilized for market segmentation of the freight transport industry. Fig. 1 illustrates the proposed method's structure.

3.1. Monte Carlo simulation

The data for this research were generated following a procedure similar to that of Milligan and Cooper (1985), which has been used in several studies that examine the properties of clustering algorithms (Balakrishnan et al., 1994, 1996; Cowgill et al., 1999). Several factors can affect the quality of cluster recovery, e.g., the number of clusters in the data, the number of dimensions used to describe the data, the density level, and the level of error in the data (Milligan & Cooper, 1985). The three levels of density in this research are described as follows.

1. The equal condition: The number of points is as close to equal as possible in each cluster.
2. The 10% condition: One cluster must contain 10% of the data points.
3. The 60% condition: One cluster must contain 60% of the data points.

The simulated data sets for this study contain 3, 5, or 7 distinct non-overlapping clusters. The number of dimensions is varied so that all points in a data set are described by a 6-, 8-, or 10-dimensional space, while the three levels of error are error-free, low error, and high error, as in Table 1. Therefore, the full factorial design results in 3×3×3×3 combinations, and each experiment is conducted with three replications, so there is a total of 243 data sets, with each set containing 120 data points (Milligan & Cooper, 1985). A sketch of this generation procedure follows Table 1.

Table 1
The experimental design

Factor       Levels
Cluster      3, 5, 7
Dimension    6, 8, 10
Density      0.1, 0.5, 0.6
Error        No, Low, High
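To make the design concrete, the following Python sketch generates one simulated data set under stated assumptions; it does not reproduce the exact distributions and separation rules of Milligan and Cooper (1985), so the cluster spreads, the error mechanism, and all parameter names (e.g. `density`, `error`) are illustrative assumptions.

```python
import numpy as np

def make_dataset(n_clusters=3, n_dims=6, density="equal", error="no",
                 n_points=120, seed=0):
    """Generate one simulated data set with distinct clusters.

    A simplified stand-in for the Milligan-Cooper procedure: clusters are
    separated Gaussian blobs; 'density' controls cluster sizes and 'error'
    adds noise that blurs the cluster boundaries.
    """
    rng = np.random.default_rng(seed)

    # Cluster sizes for the three density conditions.
    if density == "equal":
        sizes = [n_points // n_clusters] * n_clusters
    elif density == "10%":                  # one small cluster with 10% of points
        rest = (n_points - n_points // 10) // (n_clusters - 1)
        sizes = [n_points // 10] + [rest] * (n_clusters - 1)
    else:                                   # "60%": one dominant cluster
        rest = (n_points - int(0.6 * n_points)) // (n_clusters - 1)
        sizes = [int(0.6 * n_points)] + [rest] * (n_clusters - 1)
    sizes[-1] += n_points - sum(sizes)      # absorb rounding remainder

    noise = {"no": 0.0, "low": 0.5, "high": 1.5}[error]
    points, labels = [], []
    for k, size in enumerate(sizes):
        center = rng.uniform(-10, 10, n_dims)            # well-spread centers
        cloud = center + rng.normal(0, 1.0 + noise, (size, n_dims))
        points.append(cloud)
        labels += [k] * size
    return np.vstack(points), np.array(labels)

X, y = make_dataset(n_clusters=5, n_dims=8, density="10%", error="low")
print(X.shape, np.bincount(y))   # (120, 8) and the cluster sizes
```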
3.2. Proposed SOM+GKA algorithm

The proposed method is an extension of Krishna and Murty's GKA. The main difference is that the proposed approach first finds the initial clusters' centroids and then feeds them to the GKA to obtain the final solution, as shown in Fig. 2. Each of the two stages is explained in more detail in the following subsections.

3.2.1. Self-organizing feature maps (SOM)

The Self-Organizing Feature Map (SOM), which is typical of the unsupervised learning neural networks, can project a high-dimensional input space onto a low-dimensional topology so as to allow one to visually determine the number of clusters directly (Lee et al., 1977; Pykett, 1978). The most widely used unsupervised learning scheme is the self-organizing feature map developed by Kohonen (1982). The learning rule for adjusting the weights is

$$\Delta w_{ij} = \eta\, L(i, i^*)\,(x_j - w_{ij}), \tag{1}$$

where $\eta$ is a learning rate that is gradually decreased, $x_j$ is the value of input node $j$, and $i^*$ is the winner node. The neighborhood function $L(i, i^*)$ is 1 for $i = i^*$ and decreases with the distance $|r_i - r_{i^*}|$ between units $i$ and $i^*$ in the output array. A typical choice for $L(i, i^*)$ is

$$L(i, i^*) = \exp\!\left(-\frac{|r_i - r_{i^*}|^2}{2\sigma^2}\right), \tag{2}$$

where $\sigma$ is a width parameter that is gradually decreased with every learning cycle.
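As an illustration of Eqs. (1) and (2), the following is a minimal SOM training sketch; the grid size and the learning-rate and width schedules are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def train_som(X, grid=(10, 10), epochs=100, eta0=0.5, sigma0=5.0, seed=0):
    """Minimal SOM per Eqs. (1)-(2): weight update with a Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    n_units = grid[0] * grid[1]
    W = rng.uniform(X.min(), X.max(), (n_units, X.shape[1]))  # initial weights
    # 2-D positions r_i of the output units on the grid
    r = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], float)

    for t in range(epochs):
        eta = eta0 * (1 - t / epochs)            # decreasing learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5  # shrinking neighborhood width
        for x in X:
            winner = np.argmin(((W - x) ** 2).sum(axis=1))   # closest unit i*
            dist2 = ((r - r[winner]) ** 2).sum(axis=1)       # |r_i - r_i*|^2
            L = np.exp(-dist2 / (2 * sigma ** 2))            # Eq. (2)
            W += eta * L[:, None] * (x - W)                  # Eq. (1)
    return W, r
```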
Fig. 2. The flow of SOM+GKA. (Flowchart: the SOM stage inputs a training pattern, finds the winner unit with the shortest distance, updates the winner's and neighbor nodes' weights, and reduces the neighborhood radius and learning rate until the learning cycles are reached; the resulting number of clusters and centroids are then input to the GKA stage, which iterates coding, fitness evaluation, mutation, centroid calculation, and the KMO until termination.)

3.2.2. Proposed genetic K-means algorithm

The main differences between the modified GKA proposed in this study and Krishna and Murty's (1999) GKA are as follows:

(1) The SOM neural network is applied to find the initial clusters.
(2) The coding method is special.
(3) There is no reproduction.
(4) A distance-based mutation is used to escape local solutions and find the global solution.
(5) The GKA uses the K-Means Operator (KMO) for faster convergence. However, since the KMO in Krishna and Murty's GKA is a one-step K-means algorithm, the proposed KMO iterates the cluster-centroid updates instead of using the one-step KMO.

The main purpose of clustering algorithms is to cluster the $n$ given samples into $K$ groups, with each sample being a $d$-dimensional vector. The final objective is to obtain the smallest TWCV (total within-cluster variance). The variable definitions are as follows: $x_i$ is the $i$th sample; $x_{ij}$ is the $i$th sample's $j$th dimension, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, d$; and $w_{ik}$ indicates the $i$th sample belonging to the $k$th group, where $k = 1, 2, \ldots, K$, with

$$w_{ik} \in \{0, 1\}, \qquad \sum_{k=1}^{K} w_{ik} = 1. \tag{3}$$

$c_{kj}$ is the $j$th dimension of the $k$th group's center. The center of the $k$th group is

$$c_{kj} = \frac{\sum_{i=1}^{n} w_{ik}\, x_{ij}}{\sum_{i=1}^{n} w_{ik}}. \tag{4}$$

The within-cluster variance of the $k$th group is

$$S^{(k)}(W) = \sum_{i=1}^{n} w_{ik} \sum_{j=1}^{d} (x_{ij} - c_{kj})^2. \tag{5}$$

The total within-cluster variance (TWCV) is

$$S(W) = \sum_{k=1}^{K} S^{(k)}(W) = \sum_{k=1}^{K} \sum_{i=1}^{n} w_{ik} \sum_{j=1}^{d} (x_{ij} - c_{kj})^2. \tag{6}$$

The objective is to find the minimum TWCV, defined as

$$S(W^*) = \min_{W} \{S(W)\}. \tag{7}$$
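Eqs. (4)-(7) translate directly into a few lines of array code. The sketch below computes the centroids and the TWCV for a given label assignment; it is a direct transcription of the formulas, with function and variable names chosen here for illustration.

```python
import numpy as np

def centroids_and_twcv(X, labels, K):
    """Compute cluster centers (Eq. 4) and the TWCV S(W) (Eqs. 5-6)."""
    d = X.shape[1]
    centers = np.zeros((K, d))
    twcv = 0.0
    for k in range(K):
        members = X[labels == k]            # samples with w_ik = 1
        if len(members) == 0:
            continue                        # empty cluster keeps a zero center here
        centers[k] = members.mean(axis=0)   # Eq. (4)
        twcv += ((members - centers[k]) ** 2).sum()  # Eq. (5), summed into Eq. (6)
    return centers, twcv
```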
3.2.2.1. Coding. A natural way of coding such a $W$ into a string, $s_W$, is to consider a chromosome of length $n$ and allow each allele in the chromosome to take a value from $\{1, 2, \ldots, K\}$. In this case, each allele corresponds to a pattern, and its value represents the cluster number to which the corresponding pattern belongs. A similar coding method was presented by Jones, as shown in Table 2.

Table 2
GKA coding method

Allele         2  3  1  2  4  5  2  1  4  3
Group number   2  3  1  2  4  5  2  1  4  3

3.2.2.2. Initialization. The initial population is selected randomly. Each allele in the population can be initialized to a cluster number randomly selected from the uniform distribution over the set $\{1, 2, \ldots, K\}$. Empty clusters are avoided by assigning $p$, the greatest integer less than $n/K$, randomly chosen data points to each cluster and the rest of the points to randomly chosen clusters.

3.2.2.3. Selection. The selection operator randomly selects a chromosome from the previous population according to the distribution given by

$$P(s_i) = \frac{F(s_i)}{\sum_{i=1}^{N} F(s_i)}, \tag{8}$$

where $F(s_i)$ represents the fitness value of the string $s_i$ in the population. This kind of random selection uses the roulette wheel. The string $s_W$'s fitness function value is based on the TWCV; thus, from Eq. (9), the smaller the $S(W)$, the larger the $f(s_W)$. Let $\bar{f}$ and $s$ represent the current population's average and standard deviation of $f(s_W)$, respectively, and let $c$ be a value in $(1, 3)$. Then $g(s_W)$ and $F(s_W)$ are defined as

$$f(s_W) = -S(W), \qquad g(s_W) = f(s_W) - (\bar{f} - c \cdot s), \tag{9}$$

and

$$F(s_W) = g(s_W), \quad \text{if } g(s_W) \ge 0, \tag{10}$$

respectively.
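A small sketch of Eqs. (8)-(10): fitness values are derived from the TWCV, and a chromosome is drawn by roulette-wheel selection. It follows the formulas above under the stated definitions; clipping negative $g(s_W)$ to zero is an assumption about the unstated "otherwise" case.

```python
import numpy as np

def roulette_select(population, twcv_values, c=2.0, rng=None):
    """Pick one chromosome with probability proportional to F(s_W), Eqs. (8)-(10)."""
    rng = rng or np.random.default_rng()
    f = -np.asarray(twcv_values, float)        # Eq. (9): f(s_W) = -S(W)
    g = f - (f.mean() - c * f.std())           # Eq. (9): g = f - (f_bar - c*s)
    F = np.where(g >= 0, g, 0.0)               # Eq. (10); negative g clipped (assumed)
    p = F / F.sum() if F.sum() > 0 else np.full(len(F), 1 / len(F))
    return population[rng.choice(len(population), p=p)]   # Eq. (8), roulette wheel
```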
3.2.2.4. Mutation. Mutation changes an allele value depending on the distances of the cluster centroids from the corresponding data point. To apply the mutation operator to the allele $s_W(i)$ corresponding to pattern $x_i$, let $d_j = d(x_i, c_j)$ be the Euclidean distance between $x_i$ and $c_j$. Then the allele is replaced with a value chosen randomly from the following distribution:

$$p_j = \Pr\{s_W(i) = j\} = \frac{c_m d_{\max} - d_j}{\sum_{i=1}^{K} (c_m d_{\max} - d_i)}, \tag{11}$$

where $c_m \ge 1$ and $d_{\max} = \max_j \{d_j\}$.

3.2.2.5. K-means operator. The algorithm with only the above selection and mutation operators may take a long time to converge. To improve this situation, a modified K-Means Operator (KMO) is introduced; it differs from Krishna and Murty's (1999) one-step K-means operator. The procedures are as follows:

1. Initial setup. Given a pre-determined value $\varepsilon$, randomly select the initial samples $i$ ($i = 1, 2, \ldots, m$) and the number of clusters $j$ ($j = 1, 2, \ldots, c$). In addition, set $i = 1$ and $k = 1$ for the first step. ($M_j^0$ denotes the number of samples assigned to cluster $j$.)

2. Main procedures.

Step 1: Calculate the center of cluster $j$:

$$z_j^k = \frac{1}{M_j^{k-1}} \sum_{i \in I_j^{k-1}} x_i, \quad \text{if } M_j^{k-1} > 0. \tag{12}$$

If $j = c$, then go to Step 2; otherwise, let $j = j + 1$ and repeat this step.

Step 2: Calculate the distances from each sample to each center:

$$J_1^k = \sum_{j=1}^{c} \sum_{i \in I_j^{k-1}} \|x_i - z_j^k\|^2. \tag{13}$$

Step 3: Calculate the new assignment. Sample $i$ is reassigned to cluster $j^*$ (i.e., $w_{ij^*} = 1$ and $w_{ij} = 0$, where $j = 1, \ldots, c$ and $j \ne j^*$) such that

$$\|x_i - z_{j^*}^k\|^2 \le \|x_i - z_j^k\|^2, \quad j = 1, \ldots, c,\ j \ne j^*. \tag{14}$$

If the two terms are equal, then stop. If $i < m$, then let $i = i + 1$ and repeat this step; otherwise, for all $j$, update the number of samples, $M_j^0$, assigned to cluster $j$ and go to the next step.
Step 4: Calculate the distance from each sample to each center under the new assignment:

$$J_2^k = \sum_{j=1}^{c} \sum_{i \in I_j^{k-1}} \|x_i - z_j^k\|^2. \tag{15}$$

Step 5: If $|J_1^k - J_2^k| < \varepsilon$, then stop; otherwise, let $j = j + 1$ and $k = k + 1$ and go back to Step 1. A sketch of the whole GKA loop is given below.

Fig. 4. SOM output topology for simulated data 5-8-0.5-N.
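Putting the pieces together, the following is a minimal sketch of one possible GKA generation loop: roulette-wheel selection (Eqs. 8-10), distance-based mutation (Eq. 11), and a K-means operator iterated to convergence rather than run for one step. The population size, $c_m$, mutation rate, and stopping rule are illustrative assumptions; `roulette_select` and `centroids_and_twcv` are the sketches defined earlier, not the authors' code.

```python
import numpy as np

def mutate(labels, X, centers, K, cm=1.5, rate=0.05, rng=None):
    """Distance-based mutation (Eq. 11): closer centroids are more probable."""
    rng = rng or np.random.default_rng()
    labels = labels.copy()
    for i in np.flatnonzero(rng.random(len(X)) < rate):
        d = np.linalg.norm(centers - X[i], axis=1)    # d_j = d(x_i, c_j)
        w = np.maximum(cm * d.max() - d, 1e-12)       # c_m * d_max - d_j
        labels[i] = rng.choice(K, p=w / w.sum())
    return labels

def kmo(X, labels, K, eps=1e-6, max_iter=100):
    """K-means operator iterated to convergence (Steps 1-5), not one step."""
    for _ in range(max_iter):
        centers, j1 = centroids_and_twcv(X, labels, K)      # Eqs. (12)-(13)
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1),
                           axis=1)                          # Eq. (14)
        _, j2 = centroids_and_twcv(X, labels, K)            # Eq. (15)
        if abs(j1 - j2) < eps:                              # Step 5 stopping rule
            break
    return labels

def gka(X, K, pop_size=50, generations=35, rng=None):
    """Second stage of SOM+GKA: evolve label strings, keep the best (lowest TWCV)."""
    rng = rng or np.random.default_rng(0)
    pop = [rng.integers(0, K, len(X)) for _ in range(pop_size)]
    for _ in range(generations):
        twcvs = [centroids_and_twcv(X, s, K)[1] for s in pop]
        new_pop = []
        for _ in range(pop_size):
            s = roulette_select(pop, twcvs, rng=rng)        # Eqs. (8)-(10)
            centers, _ = centroids_and_twcv(X, s, K)
            s = mutate(s, X, centers, K, rng=rng)           # Eq. (11)
            new_pop.append(kmo(X, s, K))                    # modified KMO
        pop = new_pop
    return min(pop, key=lambda s: centroids_and_twcv(X, s, K)[1])
```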
3. Determination of the initial number of clusters. There have been various approaches proposed for determining the initial number of clusters. The current study employs SPSS 10.0 for the implementation, which uses Anderberg's (1973) random approach.
4. Simulation

Clustering methods have been presented and evaluated in numerous studies (Punj & Steward, 1983; Vriens et al., 1996; Kuo et al., 2002a, b). Though these techniques are optimal under some specific distributional assumption or dimensionality, further study is still necessary to determine their robustness to data that do not satisfy the assumed structure. However, in real-world problems, it is quite difficult to determine which clustering method is best, since the true clustering solutions are unknown. Thus, most validation literature tries to solve this problem through Monte Carlo simulation. One of the main advantages of this approach is that the researcher can use analytical data with a known structure. This section presents the simulation results for the three methods through the Monte Carlo study proposed by Milligan and Cooper (1985).

4.1. Generating simulation data sets

According to the simulation algorithm described in Section 3.1, VBA (Visual Basic for Applications) for Microsoft Excel 2000 is used to program the algorithm. Discriminant analysis using SPSS 9.0 shows that the clusters generated by the simulation algorithm can be discriminated fairly well.
Fig. 3. SOM output topology for simulated data 3-8-0.5-N.
4.2. Results of SOM

For the SOM network, the number of learning epochs is set to 100 and the training rate is set to 0.5. The two-dimensional output topology is 10×10. The input data for the SOM are generated from the simulation algorithm. The error function (Kohonen, 1991) used to evaluate the convergence of the SOM is

$$E = \sum_{i=1}^{P} \sum_{j=1}^{M} \sqrt{(X_i - W_{ij})^2}, \tag{16}$$

where $P$ is the number of training patterns, $M$ is the number of output units, and $W_{ij}$ are the weights of the winner unit for every training pattern.
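A direct transcription of Eq. (16) under one reading of its notation: for each training pattern, the error sums the component-wise root-squared differences to its winner unit's weights. The variable names are illustrative assumptions.

```python
import numpy as np

def som_error(X, W):
    """Convergence error of Eq. (16): distance of each pattern to its winner unit."""
    E = 0.0
    for x in X:                                    # P training patterns
        winner = np.argmin(((W - x) ** 2).sum(axis=1))
        E += np.sqrt((x - W[winner]) ** 2).sum()   # summed over the weight components
    return E
```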
Figs. 3-5 show that the simulation data can be projected onto the proper number of clusters by the SOM, while the learning curves are depicted in Fig. 6.

4.3. Results of SOM+GKA

The SOM+GKA method is programmed using VBA for Microsoft Excel 2000. A population of 50 chromosomes is generated for the SOM+GKA method. The average computational time per generation for SOM+GKA is 48.89 seconds. The SSW value of SOM+GKA does not decrease significantly after 35 generations; thus, this study sets the number of generations to 35. The learning curves are depicted in Fig. 7 for three, five, and seven clusters.

4.4. Evaluation of the three clustering methods

This study compares the three methods on the 243 simulated data sets by calculating the misclassification rate and the SSW.
Fig. 5. SOM output topology for simulated data 7-8-0.5-N.
Fig. 6. The learning curves for three different simulated data sets (error versus learning cycle for data sets 3-8-0.5-N, 5-8-0.5-N, and 7-8-0.5-N).
Fig. 7. The learning curves of GKA for 3, 5, and 7 clusters (SSW versus generation).
To examine the performance of these three clustering methods, SPSS 9.0 is utilized for both the K-means method and the ANOVA test. The proposed SOM+GKA, with the lowest misclassification rate, 2.21%, performs best among the three clustering methods, as listed in Tables 3 and 4. In addition, using starting points from SOM can indeed improve K-means' performance. For further discussion, five hypotheses are examined, as follows.

Hypothesis 1: The number of misclassifications does not differ across the levels of error. The error factor affects the cluster recovery of the three methods as significantly as the number of clusters at the 0.05 significance level, according to Table 3. The mean misclassifications of the three methods increase across the error-free, low-error, and high-error data sets.

Hypothesis 2: The number of misclassifications does not differ across the number of clusters in the data set. According to Table 3, the number of clusters affects the cluster recovery of the three methods significantly at the 0.05 significance level. As the number of clusters increases, cluster recovery becomes better for each method. Table 4 illustrates this finding.

Hypothesis 3: The number of misclassifications does not differ across the number of dimensions of each observation. The number of dimensions does not affect the cluster recovery of the three methods, according to Table 3. In Table 4, the cluster recovery of SOM+GKA is worst when the number of dimensions is 6, while the cluster recovery of both K-means and SOM+K is best when the number of dimensions is 10.

Hypothesis 4: The number of misclassifications does not differ across the levels of density. The level of density affects only K-means, according to Table 3. Cluster recovery is almost equally bad for K-means and SOM+K when the level of density is 60%. The cluster recovery of SOM+GKA is best when the level of density is 10%, as shown in Table 4.

Hypothesis 5: The number of misclassifications does not differ across the three clustering methods. In Table 4, the misclassifications of K-means are significantly higher than those of the other clustering methods. To test whether the number of misclassifications differs across the three clustering methods, an ANOVA test is used, with the results shown in Table 5. The three clustering methods have significant differences according to Table 5. To examine their performance, SPSS 9.0 is utilized both for the K-means method and for Scheffé's multiple comparison test. The performance of the proposed SOM+GKA is the best among the three clustering methods. According to Scheffé's multiple comparison test, as shown in Table 6, the mean difference between SOM+K and SOM+GKA is not significant at the 0.05 significance level. The reason may be that the simulated samples are not vague enough. However, the MSW (mean of SSW) of SOM+GKA is smaller than that of SOM+K, as shown in Table 7.
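For reference, the same kind of one-way ANOVA comparison can be reproduced with standard tools; the paper used SPSS 9.0, so the scipy sketch below is only a stand-in, and the data arrays are hypothetical per-data-set misclassification counts.

```python
from scipy import stats
import numpy as np

# Hypothetical per-data-set misclassification counts for the three methods
# (one value per simulated data set; the real study has 243 per method).
kmeans_errors  = np.array([12, 7, 19, 4, 9])
som_k_errors   = np.array([ 5, 3,  9, 2, 4])
som_gka_errors = np.array([ 3, 1,  6, 1, 2])

# One-way ANOVA: do the mean misclassifications differ across methods?
f_stat, p_value = stats.f_oneway(kmeans_errors, som_k_errors, som_gka_errors)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")  # reject H0 if p < 0.05
```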
Table 3
The computational results of multivariate analysis of variance for the three clustering methods (p-values)

Factor           K-means   SOM+K    SOM+GKA
Cluster Number   0.000*    0.000*   0.000*
Dimension        0.913     0.736    0.478
Density Level    0.003*    0.297    0.759
Error Level      0.000*    0.000*   0.000*

* The mean difference is significant at the α = 0.05 level.
Table 4
The misclassification rates across the factor levels

Factor           Level   K-means   SOM+K    SOM+GKA
Average          -       0.0877    0.0411   0.0221
Cluster Number   3       0.1611    0.0744   0.0593
                 5       0.0577    0.0253   0.0033
                 7       0.0442    0.0235   0.0036
Dimension        6       0.0893    0.0460   0.0285
                 8       0.0907    0.0389   0.0175
                 10      0.0830    0.0383   0.0202
Density Level    0.1     0.1063    0.0493   0.0186
                 0.5     0.1080    0.0417   0.0256
                 0.6     0.0488    0.0322   0.0219
Error Level      No      0.0308    0.0083   0.0024
                 Low     0.0583    0.0215   0.0113
                 High    0.1740    0.0933   0.0525

Table 7
The MSW of the three clustering methods (Monte Carlo simulation)

                 K-Means   SOM+K    SOM+GKA
SSW              54,783    53,318   47,672
Table 5
The ANOVA test of the three clustering methods (α = 0.05)

S.V.   S.S.        DF    MS         F Test   P-Value
SSB    2671.854    2     1335.927   15.929   0.000
SSE    20127.595   240   83.865
SST    22799.449   242
Table 6
Scheffé's multiple comparison test (α = 0.025)

Clustering Method I   Clustering Method II   P-Value
K-Means               SOM+K                  0.000*
K-Means               SOM+GKA                0.000*
SOM+K                 SOM+GKA                0.125

5. Model evaluation results

SOM+GKA is the best method for clustering analysis, as shown in Section 4. To further demonstrate the proposed method, an advanced comparison of the three methods was made using real-world data for freight transport industry market segmentation. The summarized flowchart of the implementation is displayed in Fig. 8.

Fig. 8. The procedures. (Flowchart panels: degree of importance; degree of satisfaction; demographic statistics variables; factor analysis; naming each dimension; proving the cluster effect; service dimension; cluster analysis; χ² tests; market segmentation; the description of the segmented market; the evaluated effect of segmented importance and satisfaction; conclusion.)

5.1. Questionnaire design

The main purpose of this research is to examine how the freight transport company can provide different benefits to customers according to different market segments. This study surveyed 600 respondents who had used the freight transport service before. The purpose of the questionnaire is to understand the needs and expectations of the customers of the freight transport company, as well as the factors influencing customer satisfaction.
In order to find out the importance of each attribute to the customers, 20 different benefit items are included in the questionnaire. Five rating scales based on Likert's summated rating scales are used to represent the importance of each item to the customers. These five scales are: not very important, not important, normal, important, and very important; the corresponding scores are 1, 2, 3, 4, and 5, respectively. Satisfaction with each freight transport company's service is also surveyed.

5.2. Importance analysis

The importance analysis classifies the different kinds of customer needs in the marketplace, so that the freight transport company can design different service combinations based on the results to satisfy different types of customers. First, this study investigates the degree of importance of each service item for the customers. Then we use these degrees of importance as the basis to segment the customers. Finally, we use the demographic statistics variables to discuss the characteristics of each cluster.

5.2.1. Clustering analysis

There are twenty importance degrees for the twenty service items, and these degrees are treated as the basis of the market segmentation. For the 243 simulated data sets, there is no significant difference between SOM+K and SOM+GKA, as shown in Section 4. However, the SSW of SOM+GKA is smaller than that of SOM+K. Thus, this study further compares their clustering capabilities.

(1) Determination of the number of segments. The number of clusters is determined by the SOM network. The parameters for the SOM network are the same as those set in Section 4. There are 600 training samples with 20 inputs for training. Fig. 9 displays the training result of the SOM network, indicating that there are clearly five clusters.

(2) Evaluation of the three clustering methods. After determining the number of clusters using SOM, the three methods (K-means, SOM+K, and SOM+GKA) are used to cluster the 600 samples, and the within-cluster variation (SSW) is calculated to evaluate them.
Table 8
SSW of the three methods

Clustering Method   K-Means    SOM+K      SOM+GKA
SSW                 4003.76    3839.113   3796.11
Table 8 shows that SOM+GKA has the smallest SSW, which is consistent with the earlier experimental result. Thus, SOM+GKA is employed as the clustering tool for market segmentation.

5.3. Satisfaction analysis

The main purpose of the satisfaction analysis is to understand the customer satisfaction level of each cluster according to each dimension obtained through factor analysis.

5.3.1. Factor analysis

The variables of the twenty customers' experiences are employed for factor analysis after the Bartlett test has determined whether or not the same variance exists for each variable. As a result, a chi-square value of 3018.168 and a p-value of 0.000 are obtained. This implies that the data are suitable for factor analysis. Moreover, the Kaiser-Meyer-Olkin (KMO) analysis is conducted to see whether or not the sampling numbers and sampling objects are suitable. The evaluation result indicates that the KMO value is 0.927. As Kaiser (SPSS/PC+, 1988) stated, if the KMO value is larger than 0.5, factor analysis is acceptable; moreover, a larger KMO value indicates a better result. Thus, the factor analysis of this study can provide good results.

Next, these twenty customer experience variables are used for factor analysis. Principal component analysis extracts the main structure of the factors. A total of four factors are obtained, and the cumulative explained variance is 54.824%, as listed in Table 9. The factors are further evaluated by Cronbach's α reliability parameter. The results show that the values are all larger than 0.6, and the Cronbach's α of the questionnaire overall is 0.9086. Though some studies have shown that factor analysis is not necessary for segmentation, the computational results of our study clearly show that segmentation with factor analysis can provide better results.
Fig. 9. Clustering result of SOM.
Table 9
Results of factor analysis

Factor         Customer experience variables   Explanation variance (%)   Cumulative explanation variance (%)   Cronbach's α
Factor one     10                              21.275                     21.275                                0.8747
Factor two     4                               13.502                     34.777                                0.7149
Factor three   3                               10.735                     45.513                                0.6685
Factor four    3                               9.311                      54.824                                0.6166
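The adequacy checks and extraction described above can be reproduced with standard Python tools. The sketch below assumes `data` is the 600×20 matrix of ratings; the paper used SPSS, so this is only a stand-in, with Bartlett's test and Cronbach's α implemented from their textbook formulas and the sample data being hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

def bartlett_sphericity(data):
    """Bartlett's test: chi2 = -(n-1-(2p+5)/6) * ln|R|, df = p(p-1)/2."""
    n, p = data.shape
    R = np.corrcoef(data, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    return chi2, stats.chi2.sf(chi2, p * (p - 1) / 2)

def cronbach_alpha(items):
    """Cronbach's alpha for the items (columns) of one factor."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

# data: hypothetical 600 x 20 matrix of importance ratings (1-5 scale)
data = np.random.default_rng(0).integers(1, 6, (600, 20)).astype(float)
chi2, p = bartlett_sphericity(data)          # suitability for factor analysis
pca = PCA(n_components=4).fit(data)          # four-factor extraction, cf. Table 9
print(chi2, p, pca.explained_variance_ratio_.cumsum())
print(cronbach_alpha(data[:, :10]))          # e.g. reliability of factor one's items
```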
Table 10 Each dimension’s importance with respect to each Cluster Segmentation Dimension
Quick response Cluster
Elasticity Cluster
Service Cluster
Basic Cluster
Safety Cluster
Creditability and Accessibility Reliability Convenience Tangibility
3.89
3.06
4.77
4.24
3.59
3.81 3.84 3.85
3.11 3.14 3.26
4.45 4.83 4.45
4.41 4.63 4.15
4.08 4.27 3.50
Fig. 10. Evaluation of the effects of satisfaction (satisfaction plotted against importance).
5.3.2. Discussion of customer clusters and factor dimensions

Through the above steps, five customer clusters and four dimensions are built. Table 10 illustrates the score of each cluster on each dimension.
According to this table, the sequence of importance is the convenience, reliability, creditability and accessibility, and tangibility dimensions.

5.4. Performance evaluation of the satisfaction

From Fig. 10, we can find the satisfaction level for each freight transport company compared to the other
Table 11
Descriptions for each cluster

Segmentation                          Quick response cluster   Elasticity cluster   Service cluster    Basic cluster      Safety cluster
Sample                                61 (14.70%)              51 (12.29%)          127 (30.60%)       115 (27.71%)       61 (14.70%)

Benefit segmentation variables
  Creditability and Accessibility     3                        1                    5                  4                  2
  Reliability                         2                        1                    5                  4                  3
  Convenience                         2                        1                    5                  4                  3
  Tangibility                         3                        1                    5                  4                  2

Demographic statistics variables
  Sex                                 Same                     Male                 Female             Male               Male
  Age                                 21-30                    21-30                21-30              21-30              21-30
  Occupation                          Organization             Organization         Service industry   Service industry   Organization
  Education                           College                  College              College            College            College
  Freight company                     A                        A                    B                  B                  C

Note: a larger grade represents greater importance.
three freight transport companies. This is based on Martilla and James's (1977) concept.

5.5. Discussion of each cluster

According to the mean differences of the clusters for each factor and the chi-squared independence tests of the demographic variables and customer usage situations, the attributes that show significant differences are used as suggestions to improve the marketing strategy for the five clusters, as shown in Table 11.
6. Conclusions

In a clustering problem, it is always difficult to determine the number of clusters. This study shows that the auto-clustering feature of SOM is more effective and objective than the K-means method. Thus, SOM can be utilized as the initial stage of market segmentation to determine the number of clusters and starting points. According to Section 4, the proposed GKA-based clustering method, SOM+GKA, is better than both K-means and Kuo's modified two-stage method (SOM+K-means) (Kuo et al., 2002a) on the 243 simulated data sets and the questionnaire survey data. Thus the proposed two-stage method, which first uses the SOM to determine the number of clusters and then employs GKA-based clustering to find the final solution, is a robust clustering method. It can be applied as a new clustering method for market segmentation or other clustering problems.

Furthermore, there are some issues worthy of future research. For instance, there are other ways to generate error-perturbed simulated data sets (Milligan & Cooper, 1985), such as adding outliers to each cluster, random noise dimensions, and random noise data sets. In the first part of the proposed method, it is sometimes quite difficult to determine the number of clusters by visual examination, so it may be desirable to apply different unsupervised neural networks, such as the adaptive resonance theory (ART2) neural network, for further comparison (Wann & Thomopoulos, 1997).
Acknowledgements The authors would like to thank the National Science Council, Republic of China for financial support under Contract No. NSC 90-2218-E-027-001.
References

Al-Sultan, K. S., & Khan, M. M. (1996). Computational experience on four algorithms for the hard clustering problem. Pattern Recognition Letters, 17(3), 295–308.
Anil, C., Carroll, D., Green, P. E., & Rotondo, J. A. (1997). A feature-based approach to market segmentation via overlapping K-centroids clustering. Journal of Marketing Research, 34(August), 370–377.
Balakrishnan, P. V. (Sundar), Cooper, M. C., & Jacob, V. S. (1994). A study of classification capabilities of neural networks using unsupervised learning: A comparison with K-means clustering. Psychometrika, 59(4), 509–525.
Balakrishnan, P. V. (Sundar), Cooper, M. C., Jacob, V. S., & Lewis, P. A. (1996). Comparative performance of the FSCL neural net and K-means algorithm for market segmentation. European Journal of Operational Research, 93, 346–357.
Bigus, J. P. (1996). Data mining with neural networks. New York: McGraw-Hill.
Cadden, D. T. (1991). Neural networks and the mathematics of chaos: An investigation of these methodologies as accurate predictors of corporate bankruptcy. First international conference on artificial intelligence applications on Wall Street (pp. 582–589). IEEE Computer Society Press.
Chang, S. (1998). Internet segmentation: State-of-the-art marketing applications. Journal of Segmentation in Marketing, 2(1), 19–34.
Cowgill, M. C., Harvey, R. J., & Watson, L. T. (1999). A genetic algorithm approach to cluster analysis. Computers & Mathematics with Applications, 37(7), 99–108.
Dillon, W. R., Kumar, A., & Borrero, M. S. (1993). Capturing individual differences in paired comparisons: An extended BTL model incorporating descriptor variables. Journal of Marketing Research, 30, 42–51.
Kohonen, T. (1982). A simple paradigm for the self-organized formation of structured feature maps. In S. Amari, & M. Arbib (Eds.), Competition and cooperation in neural nets, Lecture notes in biomathematics. Berlin: Springer.
Kohonen, T. (1991). Self-organizing maps: Optimization approaches. In T. Kohonen, K. Makisara, O. Simula, & J. Kangas (Eds.), Artificial neural networks (pp. 981–990). Amsterdam, The Netherlands: Elsevier.
Kotler, P. (1997). Marketing management: Analysis, planning, implementation, and control (9th ed.). Upper Saddle River, NJ: Prentice Hall.
Krishna, K., & Murty, M. (1999). Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 29(3), 433–439.
Kuo, R. J., Chang, K., & Chien, S. Y. (2002). Integration of self-organizing feature maps and genetic algorithm based clustering method for market segmentation. Journal of Organizational Computing and Electronic Commerce (accepted).
Kuo, R. J., Ho, L. M., & Hu, C. M. (2002). Integration of self-organizing feature map and K-means algorithm for market segmentation. International Journal of Computers and Operations Research, 29, 1475–1493.
Kuo, R. J., & Xue, K. C. (1998). A decision support system for sales forecasting through fuzzy neural network with asymmetric fuzzy weights. Journal of Decision Support Systems, 24(2), 105–126.
Lee, R. C. T., Slagle, J. R., & Blum, H. (1977). A triangulation method for the sequential mapping of points from N-space to two-space. IEEE Transactions on Computers, 26, 288–292.
Lin, Y. T., & Shiueng, B. Y. (2001). A genetic approach to the automatic clustering problem. Pattern Recognition, 34(2), 415–424.
Lin, Y. T., & Shiueng, B. Y. (1997). Genetic algorithms for clustering, feature selection and classification. International Conference on Neural Networks, 3, 1612–1616.
Martilla, J., & James, J. (1977). Importance-performance analysis. Journal of Marketing, 41, 77–79.
Maulik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33(9), 1455–1465.
Milligan, G. W., & Cooper, M. C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.
Murthy, C. A., & Chowdhury, N. (1996). In search of optimal clusters using genetic algorithms. Pattern Recognition Letters, 17(8), 825–832.
O'Connor, G. C., & O'Keefe, B. (1997). Viewing the web as a marketplace: The case of small companies. Decision Support Systems, 21(3), 171–183.
Punj, G., & Steward, D. W. (1983). Cluster analysis in marketing research: Review and suggestions for applications. Journal of Marketing Research, 20, 134–148.
Pykett, C. E. (1978). Improving the efficiency of Sammon's nonlinear mapping by using clustering archetypes. Electronics Letters, 14, 799–800.
Srinivas, M., & Patnaik, L. M. (1994). Genetic algorithms: A survey. IEEE Computer, 27(6), 17–26.
Vellido, A., Lisboa, P. J. G., & Vaughan, J. (1999). Neural networks in business: A survey of applications (1992–1998). Expert Systems with Applications, 17(1), 51–70.
Venugopal, V., & Baets, W. (1994). Neural networks and their applications in marketing management. Journal of Systems Management, 45(9), 16–21.
Vriens, M., Wedel, M., & Wilms, T. (1996). Metric conjoint segmentation methods: A Monte Carlo comparison. Journal of Marketing Research, 33, 73–85.
Wann, C.-D., & Thomopoulos, S. C. A. (1997). A comparative study of self-organizing clustering algorithms Dignet and ART2. Neural Networks, 10(4), 737–753.
Wedel, M., & Kamakura, W. A. (1998). Market segmentation: Conceptual and methodological foundations. Boston: Kluwer Academic.