Tourism Management 53 (2016) 197–206
Contents lists available at ScienceDirect
Tourism Management journal homepage: www.elsevier.com/locate/tourman
Data mining framework based on rough set theory to improve location selection decisions: A case study of a restaurant chain
Li-Fei Chen a,*, Chih-Tsung Tsai b
a Department of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
b Jie Kune Precision Technologies Co., Ltd., Taoyuan City, Taiwan
Highlights
A data mining framework was established to support location selection decisions.
Rough set theory was applied to predict store performance from location factors.
A case of a restaurant chain was studied to demonstrate the proposed approach.
Article info
Article history: Received 5 May 2014; Received in revised form 29 September 2015; Accepted 1 October 2015; Available online xxx
Keywords: Location selection; Data mining; Rough set theory

Abstract
Location selection plays a crucial role in the retail and service industries. A comprehensive location selection model and appropriate analytical technique can improve the quality of location decisions, attracting more customers and substantially impacting market share and profitability. This study developed a data mining framework based on rough set theory (RST) to support location selection decisions. The proposed framework consists of four stages: (1) problem definition and data collection; (2) RST analysis; (3) rule validation; and (4) knowledge extraction and usage. An empirical study focused on a restaurant chain to demonstrate the validity of the proposed approach. Twenty location variables relevant to five location aspects were examined, and the results indicated that latent knowledge can be identified to support location selection decisions. © 2015 Elsevier Ltd. All rights reserved.
* Corresponding author. Department of Business Administration, Fu Jen Catholic University, No. 510, Zhongzheng Rd., Xinzhung Dist., New Taipei City, 24205, Taiwan. E-mail address: [email protected] (L.-F. Chen).
http://dx.doi.org/10.1016/j.tourman.2015.10.001
0261-5177/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction
Location selection is one of the most critical factors in the success of long-term strategic decisions in the restaurant industry. A suitable restaurant location can attract more customers, provide convenient service to customers, and enhance customer loyalty. Moreover, it can shorten the period required to pay back fixed capital investments and increase market share and profitability (Chou, Hsu, & Chen, 2008; Prayag, Landré, & Ryan, 2012; Tzeng, Teng, Chen, & Opricovic, 2002). Therefore, academics and practitioners have focused on examining location decisions.

Because of the importance of retail location decisions, a number of models have been developed to support them. Central place theory, spatial interaction theory, and the principle of minimum differentiation are the most discussed retail location models in the literature (Brown, 1993; Litz & Rajaguru, 2008; Prayag et al., 2012). Although they are normative and require unrealistic assumptions, these models continue to attract considerable academic attention (Brown, 1993; Chou et al., 2008; Prayag et al., 2012). The models provide in-depth information on certain dimensions; however, more extensive perspectives must be considered when modeling a location selection problem (Kuo, Chi, & Kao, 2002).

Although simple analytical techniques, such as checklists and analogs, have been available for at least 60 years to support location decisions, most managers still favor their personal experience and instincts (Hernández & Bennison, 2000). Many studies have explored the location selection problem by using statistical methods, such as regression, cluster, and factor analysis (Davies, 1973; Rogers & Green, 1979). However, conventional statistical methods require a high level of specialist knowledge regarding model building in general, and they assume that data are normally distributed and exhibit linear relationships to provide meaningful
inferences (Chen, 2014; Coates, Doherty, French, & Kirkup, 1995; Hernández & Bennison, 2000). Some studies have applied mathematical programming methods to location selection problems. However, these methods can only use quantitative data and the modeling process is relatively time consuming (Hernández & Bennison, 2000; Ho, Chang, & Ku, 2013).

Data mining is a powerful tool that can be used to analyze large quantities of data and discover potentially helpful patterns or hidden rules. Data mining is widely used in many fields, but few researchers have applied it to location selection (Hernández & Bennison, 2000). Rough set theory (RST) is an effective data mining method that can be used to explain and explore how a decision was made using simple, understandable, and useful rules in the presence of uncertainty and vagueness without requiring the assumptions that are made during regression analysis (Chien & Chen, 2007). The RST method may be more effective than a regression-based approach at capturing the relationship between location factors and store performance.

This study aims to develop a data mining framework based on RST to explore store location data for predicting store sales performance. The proposed framework consists of four stages: (1) problem definition and data collection; (2) RST analysis; (3) rule validation; and (4) knowledge extraction and usage. An empirical study focused on a restaurant chain to demonstrate the validity of the proposed approach. The results indicated that latent knowledge can be revealed to identify critical location selection factors, allowing specific location strategies to be derived for selecting optimal locations.

2. Related location selection studies
A number of studies have examined the location selection problem. Central place theory was proposed by Christaller (1933), who suggested that a retail location is determined by the range and threshold of a good.
The range refers to the maximal distance consumers are willing to travel to obtain a good, and the outer limit of a store's market area is determined accordingly. The actual range may vary with individual mobility, price, and customer preferences. The threshold is defined as the minimal amount of demand that must exist in an area for a store to be economically viable; that is, an area should be sufficiently populated to support a store (Craig, Ghosh, & McLafferty, 1984). Craig et al. (1984) indicated that central place theory is the most well-developed normative theory related to retail location. However, the main limitation of this theory is that it cannot provide different retail location patterns based on product offerings, store image, and competition levels (Litz & Rajaguru, 2008).

Central place theory assumes that consumers shop at the nearest source that provides a required good or service. By contrast, spatial interaction theory (Reilly, 1929; 1931) assumes that consumers trade off the attractiveness of alternative shopping areas against the obstacle effect of distance (Brown, 1993). Using the gravity model, spatial interaction theory treats a number of variables that could influence competing locations (including site-related factors, such as store size, distance, price, service levels, and image features, and other attractiveness factors, such as atmosphere and consumer cognition) as determinants of store location (Teller & Reutterer, 2008). Although it is widely used in practical situations, the main shortcoming of the gravity model is its mathematical complexity (Burnaz & Topcu, 2006; Prayag et al., 2012).

The principle of minimum differentiation, developed by Hotelling in 1929, emphasizes the concept of the clustering effect. It suggests that proximity to competitors is an indicator of attractiveness and competitiveness (Chou et al., 2008). Prayag et al. (2012) summarized the reasons why restaurants cluster together.
For example, clustering increases the attractiveness of individual restaurants and an area as a whole; it facilitates comparison of restaurants based on cuisine type; and it enables the sharing of facility and promotion costs. Restaurant clustering also benefits customers. For example, it decreases the search time and costs required to find a suitable restaurant and it provides various choices for customers in a certain area.

A wide variety of analytical techniques have been applied to support location decisions. Although simple methods, such as checklists and analogs, have been available for at least 60 years, most managers still rely solely on their personal experiences and instincts (Hernández & Bennison, 2000). In addition to multivariate statistical techniques, certain studies have addressed the location selection problem by using mathematical programming methods (e.g., Jovanovic, 2003; Kolli & Evans, 1999). However, the modeling process is computer- and data-intensive and relatively time consuming. Moreover, this method can only be used to process quantitative data. It cannot incorporate qualitative location criteria; therefore, its applications are limited (Ho, Chang, & Ku, 2013).

An increasing number of studies have used multi-criteria decision making models, such as the analytic hierarchy process (AHP) (Chou et al., 2008; Ho et al., 2013; Tzeng et al., 2002), analytic network process (ANP) (Burnaz & Topcu, 2006; Tuzkaya, Önüt, Tuzkaya, & Gülsün, 2008), PROMETHEE, and the technique for order of preference by similarity to ideal solution (Ishizaka, Nemery, & Lidouh, 2013) to evaluate location selection problems. Researchers have used the AHP and ANP because they can process both qualitative and quantitative criteria. The AHP was developed by Saaty (1980) to determine the relative importance of a set of alternatives in a complex, unstructured, and multi-criteria decision problem.
There are three basic steps in using the AHP: (1) the design of the hierarchy to describe the decision problem; (2) the prioritization of various attributes in each level of the hierarchy by pairwise comparisons; and (3) the integration of the pairwise comparisons to develop the overall evaluation of these alternatives (Partovi, 2001). The ANP technique does not require a strict hierarchical structure as the AHP does, and it can deal with more complicated interdependencies among and between levels of attributes and alternatives (Partovi, 2007). However, it is difficult and time consuming for decision makers to evaluate too many criteria because more pairwise comparisons are needed. Moreover, the accuracy of the results largely depends on the user's experience and knowledge in the area concerned, and this may result in an unreliable analysis (Ravi, Shankar, & Tiwari, 2005; Yurdakul, 2003).

Because of advances in information technology, data mining techniques have been applied to the location selection decision. For example, Coates et al. (1995) used an artificial neural network (ANN), and Wang, Chen, and Su (2015) proposed a fuzzy-connective-based aggregation networks method to support location selection decisions. Kuo et al. (2002) proposed an integrated fuzzy ANP and ANN method to select convenience store locations by considering competition, commercial area, convenience, availability, store characteristics, and population characteristics. They performed a comparison analysis and found that their method provided more accurate results than a regression model.

3. Rough set theory
RST was developed by Pawlak (1982; 1997; 2002). It is a data mining approach used for various purposes, such as feature selection, feature extraction, feature reduction, and extraction of decision rules from data, especially in the presence of uncertainty and vagueness (Chien & Chen, 2007).
RST has been applied in various domains, such as quality engineering (Su & Hsu, 2006), human resource management (Chien & Chen, 2007), health care
(Hassanien & Ali, 2004; Kaya & Uyar, 2013), feature selection (Chen & Chien, 2011), supplier prediction (Tseng, Huang, Jiang, & Ho, 2006), and financial distress prediction (Cao, Wan, & Wang, 2011). However, few studies have examined location selection decisions. This section describes the basic concepts required to understand how RST is applied to rule extraction for location selection problems; the concepts are illustrated with an example.

3.1. Information system
In RST, information can be expressed in a decision table. Each row represents an object (e.g., a store's profile) and each column represents an attribute, which is either a condition attribute (e.g., the competition density of a store) or a decision attribute (e.g., the sales performance of a store). Knowledge can be described in an information system as $S = (U, A, V, f)$, where $U$, called the universe, is a nonempty finite set of all objects; $A$ is a nonempty finite set of all attributes; $V$ is the union of attribute values, such that $V = \bigcup_{a \in A} V_a$, where $V_a$ is the set of values of attribute $a$; and $f$ denotes an information function, such that $f(u_i, a) \in V_a$ for every $a \in A$ and $u_i \in U$.

Table 1 shows an example of a simple decision table of six objects, in which U = {S1, S2, S3, S4, S5, S6}. The objects are characterized by three condition attributes and one decision attribute. The three location condition attributes are C1: close to a mass rapid transit (MRT) station, C2: competition density, and C3: store visibility. The decision attribute, D, represents the sales performance of a store; thus, A = {C1, C2, C3, D}. As shown in Table 1, stores S3, S4, and S5 exhibit the “good” decision attribute, which reflects that a store achieved a good sales performance. The other three stores exhibit the “poor” decision attribute, which reflects a poor sales performance. The objective of the table is to classify stores' sales performance according to their location attributes.
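As a minimal sketch, the information system of Table 1 could be encoded as follows. The dictionary layout and the names `TABLE_1`, `f`, `V_a`, and `good_stores` are illustrative assumptions, not part of the original study.

```python
# Table 1 encoded as an information system S = (U, A, V, f).
# The dictionary layout and all names below are illustrative assumptions.
TABLE_1 = {
    "S1": {"C1": "no",  "C2": "high", "C3": "high", "D": "poor"},
    "S2": {"C1": "yes", "C2": "low",  "C3": "low",  "D": "poor"},
    "S3": {"C1": "yes", "C2": "high", "C3": "high", "D": "good"},
    "S4": {"C1": "no",  "C2": "high", "C3": "high", "D": "good"},
    "S5": {"C1": "yes", "C2": "low",  "C3": "high", "D": "good"},
    "S6": {"C1": "no",  "C2": "low",  "C3": "low",  "D": "poor"},
}

U = sorted(TABLE_1)            # universe of objects (stores)
A = ["C1", "C2", "C3", "D"]    # three condition attributes and the decision attribute


def f(u, a):
    """Information function: the value of attribute a for object u."""
    return TABLE_1[u][a]


# V_a is the set of values taken by attribute a; V is the union over all attributes.
V_a = {a: {f(u, a) for u in U} for a in A}
V = set().union(*V_a.values())

# Stores whose decision attribute is "good".
good_stores = [u for u in U if f(u, "D") == "good"]
print(good_stores)  # ['S3', 'S4', 'S5']
```

Keeping each store as a row keyed by attribute name mirrors the decision table directly, so the information function reduces to a dictionary lookup.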
3.2. Indiscernibility relation and elementary sets
For certain specific attributes, objects are indiscernible based on the available information. For example, the values of condition attributes C1 and C2 of stores S1 and S4 in Table 1 are the same; therefore, these two stores are indiscernible if the only information considered is “close to an MRT station” (C1) and “competition density” (C2). Let $P$ be a nonempty subset of set $A$, that is, $P \subseteq A$. The indiscernibility relation is an equivalence relation defined on $U$. In particular, $x_i, x_j \in U$ are indiscernible with respect to $P$ if they are related by IND(P), which can be defined as follows:

$$\mathrm{IND}(P) = \{(x_i, x_j) \in U \times U \mid f(x_i, a) = f(x_j, a),\ \forall a \in P\}. \tag{1}$$

That is, $x_i$ and $x_j$ are P-indiscernible under subset $P$ of the attributes. A set of all indiscernible objects with respect to specific attributes is called an elementary set or an equivalence class. The P-indiscernibility relation induces the P-elementary sets in $S$.

Moreover, the family of all equivalence classes defined by the relation IND(P) is denoted by U/IND(P). Based on indiscernibility relations, the universe can be decomposed into elementary sets, which can be used to construct real-world knowledge. For example, the attribute “close to an MRT station” (C1) induces two elementary sets: {S1, S4, S6}, stores that are not close to an MRT station, and {S2, S3, S5}, stores that are close to an MRT station. In other words, attribute C1 divides the universe into two groups, that is, U/IND(C1) = {{S1,S4,S6},{S2,S3,S5}}. Furthermore, if we consider both attributes, “close to an MRT station” (C1) and “competition density” (C2), the following elementary sets are generated: {S1, S4}, {S2, S5}, {S3}, and {S6}, that is, U/IND({C1,C2}) = {{S1,S4},{S2,S5},{S3},{S6}}. Finally, considering all three attributes, “close to an MRT station” (C1), “competition density” (C2), and “store visibility” (C3), produces U/IND({C1,C2,C3}) = {{S1,S4},{S2},{S3},{S5},{S6}}. Similarly, any subset of the attributes can generate elementary sets.

3.3. Lower and upper approximations
By using lower and upper approximations, RST can characterize uncertain concepts in terms of the relationships among objects. In the following, $[x]_P$ denotes the elementary set in U/IND(P) that contains object $x$. If $Y \subseteq U$, the lower approximation of $Y$ in $P$, denoted as $\underline{P}Y$, is the set of all objects that can be certainly classified into $Y$ based on knowledge from the considered attributes. The term $\underline{P}Y$ is defined as follows:

$$\underline{P}Y = \{x \in U \mid [x]_P \subseteq Y\}. \tag{2}$$

The upper approximation of $Y$ in $P$, denoted as $\overline{P}Y$, is the set of all objects that can possibly be classified into $Y$ based on the knowledge of the considered attributes. $\overline{P}Y$ can be defined as follows:

$$\overline{P}Y = \{x \in U \mid [x]_P \cap Y \neq \emptyset\}. \tag{3}$$

Moreover, the boundary region of $Y$ in $P$, denoted as $BN_P(Y)$, represents the objects that cannot be properly classified using the considered attributes. It is defined as the difference between the upper and lower approximations:

$$BN_P(Y) = \overline{P}Y - \underline{P}Y. \tag{4}$$
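Eqs. (1)–(4) can be sketched in a few lines of Python using the data of Table 1. This is a minimal illustration under stated assumptions: the helper names `partition`, `lower_approx`, and `upper_approx` are hypothetical and do not correspond to any rough set toolkit.

```python
# Hypothetical helpers illustrating Eqs. (1)-(4) on the data of Table 1.
TABLE_1 = {
    "S1": {"C1": "no",  "C2": "high", "C3": "high", "D": "poor"},
    "S2": {"C1": "yes", "C2": "low",  "C3": "low",  "D": "poor"},
    "S3": {"C1": "yes", "C2": "high", "C3": "high", "D": "good"},
    "S4": {"C1": "no",  "C2": "high", "C3": "high", "D": "good"},
    "S5": {"C1": "yes", "C2": "low",  "C3": "high", "D": "good"},
    "S6": {"C1": "no",  "C2": "low",  "C3": "low",  "D": "poor"},
}


def partition(table, attrs):
    """Return U/IND(attrs): objects grouped by identical values on attrs."""
    classes = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())


def lower_approx(table, attrs, target):
    """Union of the elementary sets fully contained in target (Eq. 2)."""
    return set().union(*[c for c in partition(table, attrs) if c <= target])


def upper_approx(table, attrs, target):
    """Union of the elementary sets that intersect target (Eq. 3)."""
    return set().union(*[c for c in partition(table, attrs) if c & target])


C = ["C1", "C2", "C3"]
D1 = {obj for obj, row in TABLE_1.items() if row["D"] == "poor"}

lower = lower_approx(TABLE_1, C, D1)
upper = upper_approx(TABLE_1, C, D1)
boundary = upper - lower  # Eq. (4)

print(sorted(lower))     # ['S2', 'S6']
print(sorted(upper))     # ['S1', 'S2', 'S4', 'S6']
print(sorted(boundary))  # ['S1', 'S4']
```

Grouping objects by the tuple of their attribute values yields the equivalence classes directly, so both approximations reduce to set containment and intersection tests on those classes.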
Table 1
An example of decision table for location selection. C1 to C3 are condition attributes; D is the decision attribute.

| Store | C1: close to MRT station | C2: competition density | C3: store visibility | D: sales performance |
|---|---|---|---|---|
| S1 | no | high | high | poor |
| S2 | yes | low | low | poor |
| S3 | yes | high | high | good |
| S4 | no | high | high | good |
| S5 | yes | low | high | good |
| S6 | no | low | low | poor |

For example, based on the values of the decision attribute, “sales performance” (D), the stores in Table 1 can be classified into two categories: D1 = {S1,S2,S6}, denoting stores with “poor” sales performance, and D2 = {S3,S4,S5}, denoting stores with “good” sales performance. If C = {C1, C2, C3}, then U/IND(C) = {{S1,S4},{S2},{S3},{S5},{S6}}. Because the elementary sets contained in D1 are {S2} and {S6}, the lower approximation of D1 in C, denoted by $\underline{C}D_1$, is {S2,S6}. Furthermore,

{S1,S4} ∩ D1 = {S1,S4} ∩ {S1,S2,S6} = {S1} ≠ ∅;
{S2} ∩ D1 = {S2} ∩ {S1,S2,S6} = {S2} ≠ ∅;
{S3} ∩ D1 = {S3} ∩ {S1,S2,S6} = ∅;
{S5} ∩ D1 = {S5} ∩ {S1,S2,S6} = ∅; and
{S6} ∩ D1 = {S6} ∩ {S1,S2,S6} = {S6} ≠ ∅.

Therefore, the upper approximation of D1 in C, denoted by $\overline{C}D_1$, is {S1,S2,S4,S6}. Finally, the boundary region of D1 in C can be derived as $BN_C(D_1) = \overline{C}D_1 - \underline{C}D_1$ = {S1,S2,S4,S6} − {S2,S6} = {S1,S4}. As a result, the sales performance of stores S2 and S6 is definitely “poor,” and the sales performance of stores S1 and S4 could be either “poor” or “good,” considering the conditions “close to an MRT station,” “competition density,” and “store visibility.” Similarly, considering stores with good sales performance, D2 = {S3,S4,S5}, the lower and upper approximations of set D2 with respect to knowledge C are $\underline{C}D_2$ = {S3,S5} and $\overline{C}D_2$ = {S1,S3,S4,S5}, respectively. Therefore, the sales performance of stores S3 and S5 is definitely “good.”

3.4. Positive region and reduct
The positive region is a concept that is essential to RST. Consider $C, D \subseteq A$, which induce the equivalence relations IND(C) and IND(D) over $U$, and $X \subseteq U$. The C-positive region of D represents the set of objects that can be properly classified into the D-elementary sets generated by IND(D) using knowledge from C. The C-positive region of D can be defined as:
$$\mathrm{POS}_C(D) = \bigcup_{X \in U/\mathrm{IND}(D)} \underline{C}X. \tag{5}$$
Moreover, an attribute $q \in C$ is D-indispensable in C if $\mathrm{POS}_C(D) \neq \mathrm{POS}_{C-\{q\}}(D)$; otherwise, $q$ is D-dispensable in C. A subset R of C is called a D-reduct of C if and only if every attribute in R is D-indispensable in R and $\mathrm{POS}_R(D) = \mathrm{POS}_C(D)$. No superfluous attribute exists in a reduct: no attribute can be removed from R without changing the positive region. In other words, a reduct is the necessary part of the knowledge, which captures all the basic concepts being examined by using only indispensable attributes.

Continuing the previous example, let C = {C1, C2, C3}, which generates the elementary sets U/IND(C) = {{S1,S4},{S2},{S3},{S5},{S6}}. Furthermore, let D = {“sales performance”}; thus, U/IND(D) = {{S1,S2,S6},{S3,S4,S5}}. Therefore, the C-positive region of D can be computed as

$$\mathrm{POS}_C(D) = \underline{C}D_1 \cup \underline{C}D_2 = \{S2, S6\} \cup \{S3, S5\} = \{S2, S3, S5, S6\}.$$

If attribute C2, “competition density,” is removed from set C, then $\mathrm{POS}_{C-\{C2\}}(D)$ = {S2,S3,S5,S6} = $\mathrm{POS}_C(D)$. Therefore, attribute C2 is D-dispensable in C. However, if attribute C3, “store visibility,” is removed from set C, then $\mathrm{POS}_{C-\{C3\}}(D)$ = {S3,S6} ≠ $\mathrm{POS}_C(D)$. Therefore, attribute C3 is D-indispensable in C. Similarly, attribute C1, “close to an MRT station,” is D-indispensable in C, because $\mathrm{POS}_{C-\{C1\}}(D)$ = {S2,S5,S6} ≠ $\mathrm{POS}_C(D)$. Consequently, set {C1, C3} is a D-reduct of C.
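The positive region and the search for reducts can be sketched as below. This is an illustrative brute-force enumeration that is only practical for a handful of attributes; it is not the algorithm used by any rough set toolkit, and the function names `positive_region` and `reducts` are assumptions.

```python
from itertools import combinations

# Data of Table 1; the helper names below are illustrative assumptions.
TABLE_1 = {
    "S1": {"C1": "no",  "C2": "high", "C3": "high", "D": "poor"},
    "S2": {"C1": "yes", "C2": "low",  "C3": "low",  "D": "poor"},
    "S3": {"C1": "yes", "C2": "high", "C3": "high", "D": "good"},
    "S4": {"C1": "no",  "C2": "high", "C3": "high", "D": "good"},
    "S5": {"C1": "yes", "C2": "low",  "C3": "high", "D": "good"},
    "S6": {"C1": "no",  "C2": "low",  "C3": "low",  "D": "poor"},
}


def partition(table, attrs):
    """Return U/IND(attrs): objects grouped by identical values on attrs."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return list(classes.values())


def positive_region(table, cond_attrs, dec_attr="D"):
    """POS_C(D): objects whose elementary set lies inside one decision class."""
    pos = set()
    for block in partition(table, cond_attrs):
        if len({table[obj][dec_attr] for obj in block}) == 1:
            pos |= block
    return pos


def reducts(table, cond_attrs, dec_attr="D"):
    """Brute-force search for minimal subsets that preserve POS_C(D)."""
    full = positive_region(table, cond_attrs, dec_attr)
    found = []
    for size in range(1, len(cond_attrs) + 1):
        for subset in combinations(cond_attrs, size):
            if positive_region(table, subset, dec_attr) != full:
                continue
            # A superset of an already-found reduct is not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            found.append(subset)
    return found


C = ["C1", "C2", "C3"]
print(sorted(positive_region(TABLE_1, C)))             # ['S2', 'S3', 'S5', 'S6']
print(sorted(positive_region(TABLE_1, ["C1", "C2"])))  # ['S3', 'S6']
print(reducts(TABLE_1, C))                             # [('C1', 'C3')]
```

Enumerating subsets by increasing size guarantees that every subset kept is minimal, which matches the requirement that each attribute of a reduct be indispensable.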
3.5. Rule extraction and validation
In RST, decision rules, including certain rules and possible rules, can be induced from the lower and upper approximation sets, respectively. A certain rule implies that certain objects definitely belong to one decision class, which enables a positive decision to be made. By identifying reducts, RST can derive more concise rules by eliminating redundant criteria. According to Pawlak (2002), a decision rule in S is the expression $W \rightarrow Z$, which means that if W then Z, where W and Z are the antecedent and consequent of the rule, respectively. The rules generated by the reducts exhibit the same condition and decision values. These rules are also called certain rules. For example, given the reduct {C1, C3}, stores S3 and S5 have the same condition and decision attribute values. The if-then rule can be derived as follows: IF “close to an MRT station” (C1) = yes AND “store visibility” (C3) = high, THEN “sales performance” (D) = good.

Certain indices, including support, confidence, and lift, can be used to validate the quality of the rules derived using RST (Chien & Chen, 2007). The support of the rule $W \rightarrow Z$ in S is defined as follows:

$$\mathrm{support}_S(W \rightarrow Z) = \mathrm{card}(\|W \wedge Z\|_S), \tag{6}$$

where $\|W \wedge Z\|_S$ denotes the set of all objects belonging to U that satisfy $W \wedge Z$ in S. The certainty of decision rule $W \rightarrow Z$ is defined as follows:

$$\mathrm{certainty}_S(W \rightarrow Z) = \frac{\mathrm{card}(\|W \wedge Z\|_S)}{\mathrm{card}(\|W\|_S)}, \tag{7}$$

where $\|W\|_S \neq \emptyset$. That is, certainty represents the conditional probability of Z given W. The concept of certainty of the decision rule in RST is similar to that of confidence used in association rules, another data mining approach. It characterizes the frequency of the cases satisfying the decision part among all the cases satisfying the condition part of the decision rule (Chien & Chen, 2007). Finally, lift measures the ratio of the probability that the decision class occurs among the objects satisfying W to the probability that the class occurs in the whole population. It can also be used to assess the value of using the rule compared to the value without using the rule as follows:

$$\mathrm{lift}_S(W \rightarrow Z) = P(Z \mid W) / P(Z). \tag{8}$$

The value of lift is between 0 and infinity. A lift value greater than 1 indicates that W and Z appear together more often than expected. In other words, the occurrence of W has a positive effect on the occurrence of Z. By contrast, a lift value smaller than 1 indicates that W and Z appear together less often than expected; this implies that the occurrence of W has a negative effect on the occurrence of Z. Last, a lift value near 1 implies that the occurrence of W has almost no effect on the occurrence of Z. A more qualitative approach can also be used to evaluate the rules (e.g., Abastante, Bottero, Greco, & Lami, 2014).

4. Proposed approach
This study proposes a data mining framework based on RST to derive location selection rules by exploring the relationships among location factors and sales performance of stores. As shown in Fig. 1, the proposed framework consists of four stages: (1) problem definition and data collection; (2) RST analysis; (3) rule validation; and (4) knowledge extraction and usage. The detailed steps required by each stage are described in the following sections.

4.1. Stage 1. Problem definition and data collection
[Fig. 1: flowchart of the four stages. Stage 1, problem definition and data collection: problem definition, study of location selection factors, data collection, data preprocessing. Stage 2, RST analysis: decision table construction, reducts/rules generation, support evaluation, candidate rules. Stage 3, rule validation: confidence evaluation, lift evaluation, qualified rules. Stage 4, knowledge extraction and usage: domain experts' judgement, decision rules, knowledge usage. Rules that fail an evaluation step are discarded.]
Fig. 1. Research framework of the data mining approach for location selection.

Understanding a problem and setting objectives is the most critical step required when extracting useful knowledge. Location factors should be carefully studied with the help of domain experts. In addition, crucial location selection factors that might affect store
sales performance suggested by the management team should be considered to derive practical rules. After defining the relevant location attributes, a sampling plan can be developed and data from various stores can be collected accordingly. To ensure a high-quality subsequent analysis, the collected data must be properly processed. Data preprocessing involves various steps, such as examining the data distribution and outliers, managing empty or missing values, transforming data into appropriate formats, and data enrichment.

4.2. Stage 2. RST analysis
After the data are prepared, a decision table can be constructed to perform further RST analysis. In this study, location factors were used as condition attributes and store sales performance was used as the decision attribute. Recognized rough set toolkits, such as Rough Set Exploration System (RSES) and ROSETTA, can be applied to generate reducts and rules. The dataset should be randomly divided into a training dataset and a testing dataset to build the model and examine the validity of the derived rules, respectively. In particular, the threshold of support ($\theta_s$) must be set as a preliminary filter for the candidate rules.

4.3. Stage 3. Rule validation
The validity of the candidate rules derived from the training dataset must be checked using the testing dataset. In particular, confidence and lift indices are used to perform this check. Similarly, the confidence level ($\theta_c$) and lift ($\theta_l$) thresholds must be defined in advance. The confidence threshold is usually defined by users. There is no easy way to determine the best confidence threshold; it is usually set by trial and error (Chen, 2015; Kuo, Chao, & Chiu, 2011). Moreover, a lift value should be greater than one to ensure that the occurrence of the condition part of the rule has a positive effect on the occurrence of the decision part of the rule (Chien & Chen, 2007). The candidate rules that pass both the confidence and lift thresholds are the qualified rules that can be further reviewed by domain experts.
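The support, certainty (confidence), and lift indices of Eqs. (6)–(8) and the threshold check can be sketched as follows. This is a simplified illustration: it evaluates rules on the six stores of Table 1 rather than on separate training and testing datasets, and the function names and the threshold values (minimum support of 2, minimum confidence of 0.8) are hypothetical, not values reported in the study.

```python
# Data of Table 1; function names and thresholds are illustrative assumptions.
TABLE_1 = {
    "S1": {"C1": "no",  "C2": "high", "C3": "high", "D": "poor"},
    "S2": {"C1": "yes", "C2": "low",  "C3": "low",  "D": "poor"},
    "S3": {"C1": "yes", "C2": "high", "C3": "high", "D": "good"},
    "S4": {"C1": "no",  "C2": "high", "C3": "high", "D": "good"},
    "S5": {"C1": "yes", "C2": "low",  "C3": "high", "D": "good"},
    "S6": {"C1": "no",  "C2": "low",  "C3": "low",  "D": "poor"},
}


def rule_metrics(table, antecedent, consequent):
    """Support, certainty, and lift of the rule IF antecedent THEN consequent.

    antecedent and consequent are dicts mapping attribute names to values.
    """
    def matches(row, cond):
        return all(row[a] == v for a, v in cond.items())

    rows = list(table.values())
    n = len(rows)
    n_w = sum(matches(r, antecedent) for r in rows)   # card(||W||)
    n_z = sum(matches(r, consequent) for r in rows)   # card(||Z||)
    n_wz = sum(matches(r, antecedent) and matches(r, consequent) for r in rows)
    certainty = n_wz / n_w if n_w else 0.0            # Eq. (7)
    lift = certainty / (n_z / n) if n_z else 0.0      # Eq. (8)
    return {"support": n_wz, "certainty": certainty, "lift": lift}


def is_qualified(metrics, min_support, min_confidence):
    """Keep a rule only if it passes support, confidence, and lift > 1."""
    return (metrics["support"] >= min_support
            and metrics["certainty"] >= min_confidence
            and metrics["lift"] > 1.0)


# Rule derived from the reduct {C1, C3}.
rule_a = rule_metrics(TABLE_1, {"C1": "yes", "C3": "high"}, {"D": "good"})
# A weaker rule for comparison.
rule_b = rule_metrics(TABLE_1, {"C1": "no"}, {"D": "poor"})

print(rule_a)  # {'support': 2, 'certainty': 1.0, 'lift': 2.0}
print(is_qualified(rule_a, min_support=2, min_confidence=0.8))  # True
print(is_qualified(rule_b, min_support=2, min_confidence=0.8))  # False
```

In this toy example the second rule has a lift above 1 but a certainty of only 2/3, so it is discarded by the confidence threshold, which illustrates why both checks are applied.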
4.4. Stage 4. Knowledge extraction and usage
To justify the practicality and rationality of the analyzed results, the qualified rules must be reviewed and interpreted with the help of the experience and knowledge of domain experts. Unusual patterns or results that do not accord with common practice should be examined before they are implemented. These processes can be used to apply the discovered knowledge to develop location selection strategies and improve related management activities. Because the impact of different location factors on sales performance might shift over time, the rules derived from data mining techniques should be periodically reviewed to ensure the validity of the model.
5. Empirical study
5.1. Problem definition and data collection
The case study was conducted on a well-known restaurant chain that launched its first store in Taipei, Taiwan in 1993. The chain provides high-quality beverages and meals at an affordable price; more than 120 branches of the restaurant have opened in Taipei, and this number is increasing. The company increased the number of business locations by using direct management methods as well as providing franchise opportunities. It also expanded its business regions to China, Indonesia, and the Philippines. Expanding the number of stores into suitable locations is one of the decisions most critical to the success of the company. The management is eager to maximize revenue by selecting appropriate store locations.

To gain a better understanding of the relationship between business performance and store location factors, 33 directly-managed branches of the chain located in Taipei were surveyed by the top manager of the company in 2011. Based on various location theories, and with the help of domain experts and the management team, essential location factors, including demographics, market conditions, store expenses, store conditions, and accessibility conditions, were studied and 20 location variables were investigated. In addition, business performance was evaluated based on average monthly turnover.

The location factors considered in this study included (a) “population size”, “population density”, and “household number” (based on central place theory); (b) “household income”, “availability of parking area,” “closeness to MRT station,” “accessibility to bus stations,” “located on the main road”, and “convenience to make a U-turn on the crossroad” (based on central place theory and the gravity model); (c) “competition density” and “franchise chain density” (based on the spatial competition model and continuous location model); and (d) “land cost of the store” and “type of trade area” (based on the land use model).
In addition, “population growth rate” (Timor & Sipahi, 2005), “population age distribution” (Melaniphy, 1992), “store size” (Burnaz & Topcu, 2006; Ho et al., 2013), “visibility of the store” (Timor & Sipahi, 2005), and “business hours” (Burnaz & Topcu, 2006) were also considered. “Business growth rate” and “closeness to street corner” were suggested based on the management team's experience. Table 2 shows a list of the variables used to evaluate location selection and the related location theory, literature, or managerial experience.

Data preprocessing was conducted to improve the quality of the data. Some attributes needed to be transformed into appropriate formats to support meaningful analysis. For example, attribute L2, “population density”, was classified into three categories after a discussion with the domain experts. If the population density of the vicinity was lower than 10,000 people/km², the area was classified as “low population density”; if the population density of the vicinity was between 10,001 and 20,000 people/km², the area was classified as “medium population density”; and if the population density of the vicinity was higher than 20,000 people/km², the area was classified as “high population density”. Attribute L8, “competition density”, was also classified into three subclasses. If no competitor was located within 300 m of the store, it was assigned the category “zero”; if between 1 and 3 competitors were located within 300 m of the store, it was assigned the category “low density”; and if 4 or more competitors were located within 300 m of the store, it was assigned the category “high density”. After a discussion with the management team, the decision attribute, “sales performance” (D), was classified into three categories: if average monthly turnover was below US$180 per unit area, it was classified as “poor performance”; if average monthly turnover was between US$180 and US$250 per unit area, it was classified as “fair performance”; and if average monthly turnover was above US$250 per unit area, it was classified as “good performance”.

5.2. RST analysis and rule validation
The information from the 20 location variables relevant to the 33 directly-managed stores listed in Table 2 was then used as
Table 2
List of evaluation variables for location selection problem.

| Factors | Factor categories | Location variables | Description | Location theory, literature, or managerial experience |
|---|---|---|---|---|
| Environmental factors | Demographics | (L1) Population size | Population in the vicinity | Central place theory |
| | | (L2) Population density | Population per km² in the vicinity | Central place theory |
| | | (L3) Household number | Number of households in the vicinity | Central place theory |
| | | (L4) Population age distribution | Age distribution of population in the vicinity | Melaniphy (1992) |
| | | (L5) Population growth rate | Growth rate of population in the vicinity | Timor and Sipahi (2005) |
| | | (L6) Enterprise number change rate | Change rate of enterprise number in the vicinity | Managerial experience |
| | | (L7) Household income | Average annual income per household in the vicinity | Central place theory, gravity model |
| | Market conditions | (L8) Competition density | Number of competitors within 300 m around the store | Spatial competition model, continuous location model |
| | | (L9) Franchise chain density | Number of franchisees within 300 m around the store | Spatial competition model, continuous location model |
| | | (L10) Type of trade area | Commercial area, living area, commercial-living mix area, tourist area, industrial area | Spatial competition model, continuous location model |
| Store factors | Store expenses | (L11) Land cost of the store | Current assessed land value | Land use model |
| | Store conditions | (L12) Store size | The size of the store | Burnaz and Topcu (2006); Ho et al. (2013) |
| | | (L13) Store visibility | Visibility of store sign | Timor and Sipahi (2005) |
| | Accessibility conditions | (L14) Business hours | Total business hours in a week | Burnaz and Topcu (2006) |
| | | (L15) Availability of parking area | Whether a parking lot is available within 300 m around the store | Central place theory, gravity model |
| | | (L16) Distance to street corner | The distance between the store and the street corner | Managerial experience |
| | | (L17) Closeness to MRTs | Whether an MRT station exists within 300 m around the store | Central place theory, gravity model |
| | | (L18) Accessibility to bus stations | Number of bus stations within 300 m around the store | Central place theory, gravity model |
| | | (L19) Located on the main road | Whether the store is located on a main road | Central place theory, gravity model |
| | | (L20) Convenience to make a U-turn on the crossroads | Whether drivers can make a U-turn at the crossroads | Central place theory, gravity model |
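The category boundaries described in the data preprocessing step for attributes L2, L8, and D can be written as small mapping functions. The following Python sketch is illustrative only: the function names and the sample store record are hypothetical, and it assumes that a density of exactly 10,000 people/km² falls in the medium class, a boundary the text leaves open.

```python
def population_density_level(people_per_km2: float) -> str:
    """Discretize attribute L2 (population density, people/km^2)."""
    if people_per_km2 > 20_000:
        return "high"
    # Assumption: exactly 10,000 is treated as "medium"; the text only
    # specifies "lower than 10,000" and "between 10,001 and 20,000".
    if people_per_km2 >= 10_000:
        return "medium"
    return "low"


def competition_density_level(competitors_within_300m: int) -> str:
    """Discretize attribute L8 (number of competitors within 300 m)."""
    if competitors_within_300m == 0:
        return "zero"
    if competitors_within_300m <= 3:
        return "low"
    return "high"


def sales_performance_level(monthly_turnover_usd_per_unit_area: float) -> str:
    """Discretize the decision attribute D (average monthly turnover)."""
    if monthly_turnover_usd_per_unit_area < 180:
        return "poor"
    if monthly_turnover_usd_per_unit_area <= 250:
        return "fair"
    return "good"


# Hypothetical raw record for one store (values invented for illustration).
raw_store = {"L2": 15_400, "L8": 2, "D": 262.0}

discretized_store = {
    "L2": population_density_level(raw_store["L2"]),
    "L8": competition_density_level(raw_store["L8"]),
    "D": sales_performance_level(raw_store["D"]),
}
print(discretized_store)
```

Keeping each threshold in its own function makes it straightforward to adjust a boundary after further discussion with domain experts without touching the rest of the pipeline.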
condition attributes and sales performance was used as a decision attribute to construct the information table. Of the 33 stores, 70% were randomly selected for the training dataset (23 stores) and the remaining 30% formed the testing dataset (10 stores). The training dataset was used to generate the reducts and rules by using RSES software (Bazan & Szczuka, 2005). Ninety-four reducts were derived and 323 rules were generated.

The criteria for support were established by considering the class distribution of store sales performance, and these criteria were used as the preliminary evaluation methods in this study. The distributions of the sales performance of the stores varied. Among the 33 stores, 12% exhibited poor sales performance; 42% exhibited fair sales performance; and 45% exhibited good sales performance. With some trial and error, the analysis team decided that a rule could be chosen as a candidate rule if at least two objects supported the rule connected to the category “poor sales performance”, but at least three objects should support a rule connected to the “fair sales performance” and “good sales performance” categories. With the help of domain experts, the rule shortening process, and preliminary screening conducted by using the support criteria, 142 candidate rules were obtained.

All stores in the test dataset were then checked to see if they matched the condition portion or both the condition and decision portions of the candidate rules. The total matching numbers were counted to derive confidence and lift for each rule. In this study, a candidate rule with a confidence level (accuracy) of at least 75% and a lift value greater than one was identified as a qualified rule. Finally, 15 qualified rules were obtained and their corresponding support, confidence, and lift values are listed in Table 3. Table A1 provides these qualified rules in IF-THEN form.

Eight of the qualified rules were associated with good sales performance and the remaining seven were associated with fair performance. No rule associated with poor sales performance was qualified because of imbalanced data. In fact, the validity of 52 candidate rules could not be evaluated because no store in the test dataset satisfied the condition portion of the candidate rules.

The qualified rules can be expressed in the natural and understandable form of if-then statements. Rules QR1, QR6, and QR11 can be used as examples to illustrate this. Rule QR1 in Table 3 is a rule identified by two attributes, “availability of parking” (L15) and “located on a main road” (L19), which means that

“IF a parking area is available near the store,
AND the store is located on a main road,
THEN the store will have a good sales performance.”

The support, confidence, and lift values for Rule QR1 were 8, 0.75, and 1.65, respectively. Rule QR6 is a rule identified by two attributes, “competition density” (L8) and “store sign visibility” (L13), which means that

“IF there is no competitor near the store,
AND the visibility of the store sign is low,
THEN the store will have a good sales performance.”

The support, confidence, and lift values for Rule QR6 were 2, 1, and 2.2, respectively. Rule QR11 is an example of a rule with three attributes: “availability of parking” (L15), “close to an MRT station” (L17), and “convenient to make a U-turn” (L20).

Table 3
List of the qualified rules.

| Rule no. | Demographics: (L1) Population size | Demographics: (L2) Population density | Demographics: (L5) Population growth rate | Market conditions: (L8) Competition density | Store conditions: (L12) Store size | Store conditions: (L13) Store visibility | Accessibility conditions: (L15) Availability of parking area | Accessibility conditions: (L17) Closeness to MRT station | Accessibility conditions: (L19) Located on a main road | Accessibility conditions: (L20) Convenience to make a U-turn | Decision attribute: Sales performance | Support | Confidence | Lift |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| QR1 | | | | | | | Yes | | Yes | | Good | 8 | 0.75 | 1.65 |
| QR2 | | | | | Large | | | | | | Good | 6 | 1.00 | 2.20 |
| QR3 | | | Moderate | | Large | | | | | | Good | 4 | 1.00 | 2.20 |
| QR4 | | | High | | | Low | | | | | Good | 3 | 1.00 | 2.20 |
| QR5 | | | | | Large | Low | | | | | Good | 3 | 1.00 | 2.20 |
| QR6 | | | | No competitor | | Low | | | | | Good | 2 | 1.00 | 2.20 |
| QR7 | | | High | | | | | | | Yes | Good | 2 | 1.00 | 2.20 |
| QR8 | | | | | Large | | | No | | | Good | 2 | 1.00 | 2.20 |
| QR9 | | | | | Medium | | No | No | | | Fair | 3 | 1.00 | 2.36 |
| QR10 | | | | | | High | | | | | Fair | 2 | 1.00 | 2.36 |
| QR11 | | | | | | | No | No | | No | Fair | 2 | 1.00 | 2.36 |
| QR12 | Large | | | | | | No | | | No | Fair | 2 | 1.00 | 2.36 |
| QR13 | | | High | | Medium | | | | | No | Fair | 2 | 1.00 | 2.36 |
| QR14 | | | | | | Medium | | | | | Fair | 2 | 1.00 | 2.36 |
| QR15 | | Low | | | | High | | | | | Fair | 2 | 1.00 | 2.36 |
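The validation step, in which test stores are matched against the condition and decision portions of a rule to derive confidence and lift, can be sketched in Python as follows. This is a hedged illustration rather than the authors' RSES workflow: the `Rule` class, the function names, and the test records are hypothetical. It assumes that lift is confidence divided by the overall share of the predicted class among the 33 stores (with class counts 15, 14, and 4 inferred from the reported 45%, 42%, and 12%), which is consistent with the lift values in Table 3, and that the 75% confidence threshold is inclusive, since QR1 qualifies with a confidence of 0.75.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Store = Dict[str, str]  # attribute code -> discretized level; "D" holds the class


@dataclass
class Rule:
    name: str
    conditions: Dict[str, str]  # attribute code -> required level
    decision: str               # predicted sales performance class


# Assumed class shares among all 33 stores (counts inferred from 45%/42%/12%).
CLASS_PRIOR = {"good": 15 / 33, "fair": 14 / 33, "poor": 4 / 33}


def matches_conditions(rule: Rule, store: Store) -> bool:
    """True if the store satisfies every condition of the rule."""
    return all(store.get(attr) == level for attr, level in rule.conditions.items())


def evaluate(rule: Rule, stores: List[Store],
             prior: Dict[str, float]) -> Optional[Tuple[int, int, float, float]]:
    """Return (n_condition, n_both, confidence, lift), or None if no store matches."""
    matched = [s for s in stores if matches_conditions(rule, s)]
    if not matched:
        return None  # the rule cannot be evaluated on this test set
    n_both = sum(1 for s in matched if s["D"] == rule.decision)
    confidence = n_both / len(matched)
    lift = confidence / prior[rule.decision]
    return len(matched), n_both, confidence, lift


def is_qualified(confidence: float, lift: float) -> bool:
    """Qualification test: confidence of at least 0.75 and lift above 1."""
    return confidence >= 0.75 and lift > 1.0


# Hypothetical test records, invented so that the arithmetic is easy to follow.
QR1 = Rule("QR1", {"L15": "yes", "L19": "yes"}, "good")
HYPOTHETICAL_TEST_STORES = [
    {"L15": "yes", "L19": "yes", "D": "good"},
    {"L15": "yes", "L19": "yes", "D": "good"},
    {"L15": "yes", "L19": "yes", "D": "good"},
    {"L15": "yes", "L19": "yes", "D": "fair"},
    {"L15": "no", "L19": "yes", "D": "fair"},
]

result = evaluate(QR1, HYPOTHETICAL_TEST_STORES, CLASS_PRIOR)
if result is not None:
    n_cond, n_both, confidence, lift = result
    print(n_cond, n_both, f"{confidence:.2f}", f"{lift:.2f}",
          is_qualified(confidence, lift))
```

Returning `None` for a rule whose condition portion matches no test store mirrors the situation reported for the 52 candidate rules whose validity could not be evaluated.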
Table 4
Evaluation of the effect of location attribute levels on store sales performance.

| Factor categories | Factor subcategories | Attributes | Store sales performance: Good | Store sales performance: Fair | Hit frequency | Importance ranking |
|---|---|---|---|---|---|---|
| Location factors | Condition of demographic | (L1) Population size | – | Large (2) | 2 | 8 |
| | | (L2) Population density | – | Low (2) | 2 | 8 |
| | | (L5) Population growth rate | Moderate (4), High (5) | High (2) | 11 | 4 |
| | Condition of market | (L8) Competition density | No competitor (2) | – | 2 | 8 |
| | Condition of store | (L12) Store size | Large (18) | Medium (2) | 20 | 1 |
| | | (L13) Store visibility | Low (8) | Medium (2), High (4) | 14 | 3 |
| | Condition of accessibility | (L15) Availability of parking area | Yes (8) | No (7) | 15 | 2 |
| | | (L17) Closeness to MRT station | No (2) | No (5) | 7 | 7 |
| | | (L19) Located on a main road | Yes (8) | – | 8 | 5 |
| | | (L20) Convenience to make a U-turn | Yes (2) | No (6) | 8 | 5 |

Notes.
1. The numbers inside the parentheses represent the support (number of stores) that satisfied both the attribute level and the result of sales performance.
2. Hit frequency means the total support that correlated to the attribute.
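Note 2 of Table 4 defines hit frequency as the total support correlated to an attribute. A minimal Python sketch of that bookkeeping follows. The rule list is transcribed from Table A1 (condition attributes and support only); the function names and the tie handling in the ranking (tied attributes share the best available rank) are assumptions of this sketch, chosen because they reproduce the hit frequencies and importance rankings printed in Table 4.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (rule no., attributes in the condition part, support), from Table A1.
RULES: List[Tuple[str, Tuple[str, ...], int]] = [
    ("QR1", ("L15", "L19"), 8),
    ("QR2", ("L12",), 6),
    ("QR3", ("L5", "L12"), 4),
    ("QR4", ("L5", "L13"), 3),
    ("QR5", ("L12", "L13"), 3),
    ("QR6", ("L8", "L13"), 2),
    ("QR7", ("L5", "L20"), 2),
    ("QR8", ("L12", "L17"), 2),
    ("QR9", ("L12", "L15", "L17"), 3),
    ("QR10", ("L13",), 2),
    ("QR11", ("L15", "L17", "L20"), 2),
    ("QR12", ("L1", "L15", "L20"), 2),
    ("QR13", ("L5", "L12", "L20"), 2),
    ("QR14", ("L13",), 2),
    ("QR15", ("L2", "L13"), 2),
]


def hit_frequency(rules: List[Tuple[str, Tuple[str, ...], int]]) -> Counter:
    """Sum the support of every rule in which each attribute appears."""
    hits: Counter = Counter()
    for _name, attributes, support in rules:
        for attribute in attributes:
            hits[attribute] += support
    return hits


def importance_ranking(hits: Counter) -> Dict[str, int]:
    """Rank attributes by hit frequency; ties share the best rank (1, 2, 2, 4, ...)."""
    ordered = sorted(hits.values(), reverse=True)
    return {attribute: ordered.index(value) + 1 for attribute, value in hits.items()}


hits = hit_frequency(RULES)
ranks = importance_ranking(hits)
for attribute in sorted(hits, key=lambda a: ranks[a]):
    print(attribute, hits[attribute], ranks[attribute])
```

Only attribute totals are computed here; splitting the totals by attribute level and by sales performance class, as in the Good and Fair columns of Table 4, would require carrying the level and decision of each rule as well.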
“IF a parking area is not available,
AND the MRT station is not near the store,
AND it is inconvenient to make a U-turn,
THEN the store will have a fair sales performance.”

The support, confidence, and lift values for Rule QR11 were 2, 1, and 2.36, respectively.

5.3. Knowledge extraction and usage
The 15 extracted rules were reviewed and discussed with domain experts to interpret the rules and generate a strategy. RST analysis was used to identify the following ten attributes that significantly affected sales performance: “population size”, “population density”, “population growth rate”, “competition density”, “store size”, “store sign visibility”, “availability of parking”, “close to an MRT station”, “located on a main road”, and “convenient to make a U-turn”. To evaluate the effect of each attribute level on sales performance, the hit frequency was derived from the total support correlated to each attribute. These results are listed in Table 4. A high hit frequency implied that the attribute influenced the sales performance of more stores. The “store size”, “availability of parking area”, “store visibility”, and “population growth rate” attributes had the highest hit frequencies. Therefore, management teams should pay greater attention to these four attributes when making location selection decisions.

Store conditions are crucial. Size does matter, and a large store increases sales performance (QR2, QR3, QR5, and QR8). A large store could have good potential for high sales performance even if the population growth rate of the vicinity is moderate (QR3), the visibility of the store is low (QR5), or no MRT station is located nearby (QR8). The performance of a medium-sized store could easily be affected by accessibility conditions. For example, a medium-sized store that has no available parking area and is not close to an MRT station might exhibit only fair performance (QR9).
In addition, if it is inconvenient to make a U-turn to access a medium store, the store might only perform fairly, even if it is located in an area with a high population growth rate (QR13).

One observation associated with store sign visibility was of particular interest. Management teams usually think that a more visible store sign will increase sales performance. However, a more visible store sign did not guarantee good sales performance (QR10, QR14, and QR15) in this study. In addition, a store can still achieve good sales performance even if its store sign visibility is low, provided that one of the following conditions is satisfied: (a) the population growth rate in the vicinity is high (QR4); (b) the store is large (QR5); or (c) no competitors are located near the store (QR6).

The impact of accessibility conditions was not limited to medium stores. A store with a parking area that is located on a main road is in an excellent location (QR1). An area with a high population growth rate where it is convenient to make a U-turn could provide good sales performance (QR7). By contrast, it is difficult for a store to perform well if the accessibility conditions are poor, specifically if a parking area is unavailable, the store is not close to an MRT station, and it is inconvenient to make a U-turn (QR11). In particular, a store located in an area with a large population could achieve only a fair sales performance if a parking area is unavailable and it is inconvenient to make a U-turn (QR12).

A high population growth rate in the vicinity of a store increases sales performance. A store located in an area with a high population growth rate could achieve a good sales performance even if the visibility of the store sign is low (QR4). Furthermore, a store located in an area with a high population growth rate could achieve a good sales performance if it is convenient to make a U-turn (QR7). However, even in such an area, a medium-sized store could exhibit only a fair sales performance if it is inconvenient to make a U-turn (QR13).

The management team has recognized the contribution of this study in helping them make more suitable decisions when selecting appropriate store locations.

6. Conclusion
Location selection is a crucial decision that must be made in the restaurant industry. Conventional statistical approaches that address the location selection problem are associated with theoretical limitations that might lead to ineffective and misleading inferences.
This study proposed a data mining framework based on RST to extract potentially useful rules from location data. The proposed method was illustrated and its validity was demonstrated using a case study of a restaurant chain. RST was applied to predict store performance with location factors. Without the assumptions required by regression models, a number of simple and understandable rules were derived to support location selection decisions. From the analysis results, the most significant location factors that affect store performance are “store size”, “availability of parking area”, “store visibility”, and “population growth rate of the vicinity”. Therefore, management teams should pay more attention to these four factors when they survey locations for new restaurants. The management team found that some of the rules revealed certain patterns and trends that were helpful in identifying possible determinants of success, thereby enabling restaurant chain planners to better understand the planning, development, and site selection relevant to future restaurants.

Because of the limitation of control power, this study used only the data of directly-managed stores to perform RST analysis. The data of franchise stores were not included in the analysis. Future studies should incorporate more location factors, such as data from geographical information systems, in the model. In addition, more advanced RST, such as dominance rough set theory (Greco, Matarazzo, & Slowinski, 1999, 2001, 2002), and other data mining techniques can be applied to explore location selection decisions.

Acknowledgment
This work was partially supported by grants from the Ministry of Science and Technology, Taiwan, R.O.C., under grant number NSC 102-2410-H-030-071-MY3.

Appendix A. Supplementary data
Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.tourman.2015.10.001.

Appendix
Table A1
List of the qualified rules in “IF-THEN” form.

| Rule no. | Rule description | Support | Confidence | Lift |
|---|---|---|---|---|
| QR1 | IF a parking area is available near the store, AND the store is located on a main road, THEN the store will have a good sales performance. | 8 | 0.75 | 1.65 |
| QR2 | IF the store size is large, THEN the store will have a good sales performance. | 6 | 1.00 | 2.20 |
| QR3 | IF the population growth rate of the vicinity in which the store is located is moderate, AND the store size is large, THEN the store will have a good sales performance. | 4 | 1.00 | 2.20 |
| QR4 | IF the population growth rate of the vicinity in which the store is located is high, AND the visibility of the store sign is low, THEN the store will have a good sales performance. | 3 | 1.00 | 2.20 |
| QR5 | IF the store size is large, AND the visibility of the store sign is low, THEN the store will have a good sales performance. | 3 | 1.00 | 2.20 |
| QR6 | IF there is no competitor near the store, AND the visibility of the store sign is low, THEN the store will have a good sales performance. | 2 | 1.00 | 2.20 |
| QR7 | IF the population growth rate of the vicinity in which the store is located is high, AND it is convenient to make a U-turn, THEN the store will have a good sales performance. | 2 | 1.00 | 2.20 |
| QR8 | IF the store size is large, AND it is not close to an MRT station, THEN the store will have a good sales performance. | 2 | 1.00 | 2.20 |
| QR9 | IF the store size is medium, AND no parking area is available, AND it is not close to an MRT station, THEN the store will have a fair sales performance. | 3 | 1.00 | 2.36 |
| QR10 | IF the visibility of the store sign is high, THEN the store will have a fair sales performance. | 2 | 1.00 | 2.36 |
| QR11 | IF no parking area is available near the store, AND it is not close to an MRT station, AND it is not convenient to make a U-turn, THEN the store will have a fair sales performance. | 2 | 1.00 | 2.36 |
| QR12 | IF the population size of the vicinity in which the store is located is large, AND no parking area is available around the store, AND it is not convenient to make a U-turn, THEN the store will have a fair sales performance. | 2 | 1.00 | 2.36 |
| QR13 | IF the population growth rate of the vicinity in which the store is located is high, AND the store size is medium, AND it is not convenient to make a U-turn, THEN the store will have a fair sales performance. | 2 | 1.00 | 2.36 |
| QR14 | IF the visibility of the store is medium, THEN the store will have a fair sales performance. | 2 | 1.00 | 2.36 |
| QR15 | IF the population density of the vicinity in which the store is located is low, AND the visibility of the store is high, THEN the store will have a fair sales performance. | 2 | 1.00 | 2.36 |

References
Abastante, F., Bottero, M., Greco, S., & Lami, I. M. (2014). Addressing the location of undesirable facilities through the dominance-based rough set approach. Journal of Multi-Criteria Decision Analysis, Special Issue: Applying MCDA: Challenges and Case Studies, 21(1–2), 3–23.
Bazan, J., & Szczuka, M. (2005). Rough set exploration system, version 2.2, logic group, institute of mathematics. Poland: Warsaw University. http://logic.mimuw.edu.pl/~rses/.
Brown, S. (1993). Retail location theory: evolution & evaluation. International Review of Retail Distribution & Consumer Research, 3(2), 185–229.
Burnaz, S., & Topcu, Y. I. (2006). A multiple-criteria decision-making approach for the evaluation of retail location. Journal of Multi-Criteria Decision Analysis, 14, 67–76.
Cao, Y., Wan, G., & Wang, F. (2011). Predicting financial distress of Chinese listed companies using rough set theory and support vector machine. Asia-Pacific Journal of Operational Research, 28(1), 95–109.
Chen, L. F. (2014). A novel framework for customer-driven service strategies: a case study of a restaurant chain. Tourism Management, 41, 119–128.
Chen, L. F. (2015). Exploring asymmetric effects of attribute performance on customer satisfaction using association rule method. International Journal of Hospitality Management, 47, 54–64.
Chen, L. F., & Chien, C. F. (2011). Manufacturing intelligence for class prediction and rule generation to support human capital decisions for high-tech industries. Flexible Services and Manufacturing Journal, 23(3), 263–289.
Chien, C. F., & Chen, L. F. (2007). Using rough set theory to recruit and retain high-potential talents for semiconductor manufacturing. IEEE Transactions on Semiconductor Manufacturing, 20(4), 528–541.
Chou, T. S., Hsu, C. L., & Chen, M. C. (2008). A fuzzy multi-criteria decision model for international tourist hotels location selection. International Journal of Hospitality Management, 27, 293–301.
Christaller, W. (1933). Central places in Southern Germany. Translated by C. W. Baskin, 1966. Englewood Cliffs, NJ: Prentice-Hall.
Coates, D., Doherty, N., French, A., & Kirkup, M. (1995). Neural networks for store performance forecasting: an empirical comparison with regression techniques. International Review of Retail, Distribution & Consumer Research, 5(4), 415–432.
Craig, C. M., Ghosh, A., & McLafferty, S. (1984). Models of the retail store location process: a review. Journal of Retailing, 60(1), 5–36.
Davies, R. L. (1973). Evaluation of retail store attributes and sales performances. European Journal of Marketing, 7(2), 89–102.
Greco, S., Matarazzo, B., & Slowinski, R. (1999). Rough approximation of a preference relation by dominance relations. European Journal of Operational Research, 117, 63–83.
Greco, S., Matarazzo, B., & Slowinski, R. (2001). Rough sets theory for multicriteria decision analysis. European Journal of Operational Research, 129, 1–47.
Greco, S., Matarazzo, B., & Slowinski, R. (2002). Rough approximation by dominance relations. International Journal of Intelligent Systems, 17(2), 153–171.
Hassanien, A. E., & Ali, J. M. H. (2004). Rough set approach for generation of classification rules of breast cancer data. Informatica, 15(1), 23–38.
Hernández, T., & Bennison, D. (2000). The art and science of retail location decisions. International Journal of Retail & Distribution Management, 28(8), 357–367.
Ho, H.-P., Chang, C.-T., & Ku, C.-Y. (2013). On the location selection problem using analytic hierarchy process and multi-choice goal programming. International Journal of Systems Science, 44(1), 94–108.
Hotelling, H. (1929). Stability in competition. Economic Journal, 39(153), 41–57.
Ishizaka, A., Nemery, P., & Lidouh, K. (2013). Location selection for the construction of a casino in the greater London region: a triple multi-criteria approach. Tourism Management, 34, 211–220.
Jovanovic, D. M. (2003). Planning of optimal location and sizes of distribution transformers using integer programming. International Journal of Electrical Power and Energy Systems, 25, 717–723.
Kaya, Y., & Uyar, M. (2013). A hybrid decision support system based on rough set and extreme learning machine for diagnosis of hepatitis disease. Applied Soft Computing, 13, 3429–3438.
Kolli, S., & Evans, G. W. (1999). A multiple objective integer programming approach for planning franchise expansion. Computers and Industrial Engineering, 37, 543–561.
Kuo, R. J., Chao, C. M., & Chiu, Y. T. (2011). Application of particle swarm optimization to association rule mining. Applied Soft Computing, 11, 326–336.
Kuo, R. J., Chi, S. C., & Kao, S. S. (2002). A decision support system for selecting convenience store location through integration of fuzzy AHP and artificial neural network. Computers in Industry, 47(2), 199–214.
Litz, R. A., & Rajaguru, G. (2008). Does small store location matter? A test of three classic theories of retail location. Journal of Small Business & Entrepreneurship, 21(4), 477–492.
Melaniphy, J. C. (1992). Restaurant and fast food site selection. New Jersey: John Wiley & Sons, Inc.
Partovi, F. Y. (2001). An analytic model to quantify strategic service vision. International Journal of Service Industry Management, 12(5), 476–499.
Partovi, F. Y. (2007). An analytical model of process choice in the chemical industry. International Journal of Production Economics, 105(1), 213–227.
Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Sciences, 11(5), 341–356.
Pawlak, Z. (1997). Rough set approach to knowledge-based decision support. European Journal of Operational Research, 99, 48–57.
Pawlak, Z. (2002). Rough sets, decision algorithms and Bayes' theorem. European Journal of Operational Research, 136(1), 181–189.
Prayag, G., Landré, M., & Ryan, C. (2012). Restaurant location in Hamilton, New Zealand: clustering patterns from 1996 to 2008. International Journal of Contemporary Hospitality Management, 24(3), 430–450.
Ravi, V., Shankar, R., & Tiwari, M. K. (2005). Analyzing alternatives in reverse logistics for end-of-life computers: ANP and balanced scorecard approach. Computers & Industrial Engineering, 48, 327–356.
Reilly, W. J. (1929). Methods for the study of retail relationships. Austin: University of Texas, Bureau of Business Research. Bulletin No. 2944.
Reilly, W. J. (1931). The law of retail gravitation. New York: Knickerbocker Press.
Rogers, D. S., & Green, H. L. (1979). A new perspective on forecasting store sales: applying statistical models and techniques in the analogue approach. Geographical Review, 69(4), 449–458.
Saaty, T. (1980). The analytic hierarchy process. New York: McGraw-Hill.
Su, C. T., & Hsu, J. H. (2006). Precision parameter in the variable precision rough sets model: an application. Omega-The International Journal of Management Science, 34, 149–157.
Teller, C., & Reutterer, T. (2008). The evolving concept of retail attractiveness: what makes retail agglomerations attractive when consumers shop at them? Journal of Retailing & Consumer Services, 15, 127–143.
Timor, M., & Sipahi, S. (2005). Fast-food restaurant site selection factor evaluation by the analytic hierarchy process. The Business Review, 4(1), 161–167.
Tseng, T. L., Huang, C. C., Jiang, F., & Ho, J. C. (2006). Applying a hybrid data-mining approach to prediction problems: a case of preferred suppliers prediction. International Journal of Production Research, 44(14), 2935–2954.
Tuzkaya, G., Önüt, S., Tuzkaya, U. R., & Gülsün, B. (2008). An analytic network process approach for locating undesirable facilities: an example from Istanbul, Turkey. Journal of Environmental Management, 88, 970–983.
Tzeng, G. H., Teng, M. H., Chen, J. J., & Opricovic, S. (2002). Multicriteria selection for a restaurant location in Taipei. Hospitality Management, 21, 171–187.
Wang, F. F., Chen, L. F., & Su, C. T. (2015). Location selection using fuzzy-connective-based aggregation networks: a case study of the food and beverage chain industry in Taiwan. Neural Computing and Applications, 26(1), 161–170.
Yurdakul, M. (2003). Measuring long-term performance of a manufacturing firm using the analytic network process (ANP) approach. International Journal of Production Research, 41(11), 2501–2529.
Dr. Li-Fei Chen is an Associate Professor in the Department of Business Administration at Fu Jen University, New Taipei City, Taiwan. Dr. Chen received her Ph.D. degree in industrial engineering and engineering management from the National Tsing Hua University, Hsinchu, Taiwan. Her current research interests include customer satisfaction, service quality, operation management, and data mining and its applications. The results of her research have been published in numerous academic journals including Omega - The International Journal of Management Science, Tourism Management, International Journal of Hospitality Management, IEEE Transactions on Semiconductor Manufacturing, IEEE Transactions on Electronics Packaging Manufacturing, Expert Systems with Applications, International Journal of Production Research, Intelligent Data Analysis, Neural Computing and Applications, and Total Quality Management & Business Excellence.

Mr. Chih-Tsung Tsai is a sales director for Jie Kune Precision Technologies Co., Ltd. in Taoyuan City, Taiwan. He received his master's degree from the MBA Program in Business Management at Fu Jen University, New Taipei City, Taiwan. His research interests include big data analysis and applied automatic optical inspection.