The gamma CUSUM chart method for online customer churn prediction


Electronic Commerce Research and Applications 17 (2016) 99–111


Ssu-Han Chen
Department of Industrial Engineering and Management, Ming Chi University of Technology, 84 Gungjuan Rd., Taishan Dist., New Taipei City 24301, Taiwan, ROC

Article history: Received 24 August 2015; Received in revised form 3 April 2016; Accepted 3 April 2016; Available online 4 April 2016.

Keywords: Customer churn; Gamma CUSUM chart; Hierarchical Bayesian mixture model; Online customers; Inter-arrival time; Recency

Abstract: Customer churn in online firms is difficult to manage because customers are so fickle. The ability to detect churn at an early stage is something every online firm would wish to achieve: it represents both a potential revenue source and a cost-saving benefit. Churn prediction models organize customer behaviors, transactions and demographics to reduce the possibility of churn within a given time. However, most current methods depend on static, high-dimensional data analysis, and the model parameters are estimated from the customer base as a whole, so a dynamic, customized prediction model at the individual level cannot be achieved. This study proposes a novel mechanism based on the gamma CUSUM chart in which only inter-arrival time (IAT) and recency need to be collected, so that customized parameters can be estimated for individual monitoring. The data in this study are from an online dating website in Taiwan. The gamma CUSUM chart is compared with the exponential CUSUM chart of Gan (1994), the CQC-v of Xie et al. (2002) and the CQC of Chan et al. (2000). The results show that the accuracy rate (ACC) of the gamma CUSUM chart is 5.2% higher and the average time to signal (ATS) is about two days longer than that of the best CQC-v. © 2016 Elsevier B.V. All rights reserved.

1. Introduction Customer relationship management (CRM) has gained increasing attention in modern business management. It helps companies find target customers, retain customers and explore customer values (Berson and Smith, 1999), thereby improving their competitive advantage. Maintaining existing customers is one of the important strategies to enhance corporate profitability. Long-term customers have high and stable spending power (O’Brien and Jones, 1995) and can produce a word-of-mouth effect (Reichheld and Teal, 1996). They can even spontaneously recruit new customers (Oliver, 2010). So they are considered to be the most precious asset of companies. In a mature market, business mainly comes from existing customers, and retaining existing customers can create more value than developing new ones (Coussement and Van den Poel, 2008a; Zorn et al., 2010). Previous studies show the importance of keeping existing customers. New customers do not immediately bring benefits to revenue (Zeithaml et al., 1996). The success rate for retaining existing customers is 60%, which is double the success rate for developing new ones (Kotler, 2001). The cost of acquiring new customers can be as much as twelve times higher than keeping existing customers (Torkzadeh et al., 2006). Reducing the customer churn rate by 5% can produce an improvement of more than 25% in profits

E-mail address: [email protected]
http://dx.doi.org/10.1016/j.elerap.2016.04.003
1567-4223/© 2016 Elsevier B.V. All rights reserved.

(Reichheld and Sasser, 1990; Peppers and Rogers, 2000). Coyles and Gokey (2005) have also shown that businesses can generate ten times the value by halting customer defections if they respond to small changes in consumer behavior. Customer churn is a marketing-related term which means customers defect to another supplier or purchase less (Coussement and Van den Poel, 2008b). As existing customers are an important source of business profits, being able to identify customers who show signs that they are about to leave can create more profits for businesses. This is especially important for online customers, as the phenomenon of customer churn appears very rapidly and is difficult to grasp (Peng et al., 2013). If companies cannot take measures to retain customers before their status deteriorates, the customers may never come back, resulting in wasted investment and loss of future earnings. A timely retention strategy is therefore the best way to keep customers. Constructing a prediction mechanism to monitor customer churn is an important step for business development, and it has become a popular topic in the last ten years (Coussement and Van den Poel, 2008a; Tsai and Lu, 2009; Huang et al., 2010; Verbeke et al., 2012; Coussement and De Bock, 2013; Faris, 2014). In general, researchers use the information on customers' previous behaviors from a database, such as background demographics, transaction records and interactions. All this information is quantified into variables and used to create prediction models to predict the likelihood that customers might be lost in the future.


However, previous research has conducted static data analysis of customer churn at some particular cut-off time. This makes it necessary to convert longitudinal data into static data through aggregation or rectangularization (Chen et al., 2012). The most significant drawback of static data analysis is that it cannot provide dynamic monitoring of customer churn. Given the extremely volatile nature of online markets, a more dynamic approach seems desirable. In addition, the results of repeated static data analysis are displayed in massive tables, and engineers must make considerable efforts to analyze the predictive differences at every cut-off time before compiling reports for administrators. Most importantly, the parameters or weights in the prediction models are based on the average behavior of customers, so strictly speaking they cannot be customized to generate prediction models that function effectively at the individual level. A gamma cumulative sum (CUSUM) control chart differs from previous churn prediction mechanisms in that it can simultaneously achieve longitudinal analysis and visualization for monitoring purposes. In the past, a few studies have used control charts to predict customer churn (Pettersson, 2004; Jiang et al., 2007; Samimi and Aghaie, 2008), and the present author believes that the under-use of this approach is mainly due to the difficulty of selecting analytical fields. Many previous studies analyze the newspaper, cable TV and financial industries, in which customers usually sign contracts for a number of years. After signing their contracts, customers seldom contact the companies unless they are complaining or asking for service changes. In addition, some studies analyze the customer churn of industries selling tangible products.
However, customers may not return to make another purchase for a long time after purchasing a tangible product, resulting in a lack of personal shopping history and difficulty in building customized parameters and monitoring mechanisms. This study analyzes Taiwan's "Internet industry", in which companies and customers have no signed contracts, trading of tangible products or face-to-face transactions. This is related to the virtual features of the Internet. Customer visits are produced by the website content, and if the service has a persistent appeal, customers will keep coming back and browsing. Taiwanese people spend four hours online per day on average, and the amount of data generated by online activities is sufficient to build a customized prediction model for customer churn. The model proposed in this study has the following characteristics: (1) Visualized management by exception. This study employs control charts, as they can additionally provide dynamic monitoring of customer behaviors and present the monitoring process in a friendly graphical interface rather than just generate static reports. Control charts can immediately issue warnings when customer behaviors deviate from the previously active status. This is management by exception, in which attention is given only when necessary, further reducing the burden on administrators. (2) Longitudinal data analysis. Unlike the previous models in which churn prediction is based on tens of characteristics of customer behavior, this study performs modeling based only on the inter-arrival time (IAT) and the longitudinal data of recency. IAT is the time difference between two consecutive events, while recency is the time difference between the cut-off time and the last event. The two variables are complementary in determining customer status. IAT can usually be used to show the trail of historical behavior, especially when the last event and the cut-off time are far apart.
In contrast, recency only shows the behavior at the present, and cannot detect past behavior. As the two variables both measure a time interval, it is feasible to combine them. The time-interval data of the combined variables are converted into time-series data which can be used in longitudinal analysis of customer logins. (3) Customized monitoring mechanism. Another feature of the method developed in this study is the combination of control charts and a hierarchical Bayesian mixture model. This model uses gamma distributions to estimate the active and inactive login behaviors of an individual customer and then uses inverse gamma distributions to capture heterogeneity, thereby defining the gamma CUSUM parameters for each customer and implementing model customization and automation of parameter estimates. The rest of this paper is organized as follows. In Section 2, related research on customer churn, CUSUM charts and hierarchical Bayesian mixture models is surveyed. In Section 3, our research methodology is described and explained. In Section 4, a sensitivity analysis is conducted to find the best parameters of the method, and experimental results are presented. Concluding remarks and further suggestions are discussed in Section 5.

2. Literature review

In this section, related research is surveyed to build the basis of the proposed method. The survey of customer churn techniques makes it clear that the area of longitudinal prediction has not yet been adequately investigated. The review of CUSUM charts helps in the selection of a suitable chart which offers a good fit for time-interval datasets. Finally, the idea of handling customer heterogeneity is inspired by several hierarchical Bayesian mixture models.

2.1. Prediction models for customer churn

Customer development is difficult and costly. Attracting new customers requires advertising and promotion expenses, in addition to identity authentication and credit checks.
There are reactive and proactive approaches to managing customer churn (Burez and Van den Poel, 2007). The former passively creates incentives to retain customers as they terminate contracts or service relationships. The latter constantly monitors customer status and prevents potential losses by giving immediate incentives. In general, a proactive approach has more potential benefits, because analysis after the event produces delays in responding, and it is often more expensive to retain customers who have already decided to leave (Chen et al., 2012). In the longer term, advances in information technology mean that all business will gradually shift from being reactive, on the basis of data management, to being proactive, focusing on exploring data. By being proactive, reminders can be sent out before the deterioration of customer behavior so that retention measures are taken as early as possible. Table 1 summarizes customer churn prediction models reported in the literature in recent years. The distinctive characteristics of each study in terms of references, fields, variables, data types, methods, and churn definitions are provided. In the Variables column, the main categories of variables are listed, with the corresponding number of items in brackets. In the Data column, C denotes cross-sectional and L longitudinal analysis. Table 1 shows that the issue of predicting customer churn has been widely discussed in various fields. These include

Table 1. Customer churn prediction models.

| Refs | Fields | Variables | Data | Methods | Churn definition |
| --- | --- | --- | --- | --- | --- |
| Pettersson (2004) | Telecom | Outgoing traffic flows (5) | L | Control charts | Not mentioned |
| Van den Poel and Lariviere (2004) | Banking and insurance services | Behavior (8), demographic (5), macro-environment (2) | C | PHM | Customer closes all accounts |
| Buckinx and Van den Poel (2005) | Retailing | Behavior (29), demographics (6) | C | RF, LR, ANN | Customer shifts transaction pattern |
| Lemmens and Croux (2006) | Wireless telecom | 44 variables adopted | C | Ensemble learners | User until sampling date |
| Burez and Van den Poel (2007) | Pay-TV subscription | Subscription info (6, 10), demographics (6), financial (4), contacts (3) | C | RF, LR, and LR with MC | Subscriber does not renew or pay |
| Jiang et al. (2007) | Telecom | Monthly telephone usage | L | Control charts | Customer stops doing business |
| Kumar and Ravi (2008) | Credit cards | Behavior (4), demographics (5) | C | Majority voting | Not mentioned |
| Coussement and Van den Poel (2008a, b) | Newspaper subscription | Client interaction (14), renewal (6), demographics (5), subscriptions (7) | C | RF, LR, SVM | Subscriber not renewed after a date |
| Samimi and Aghaie (2008) | Telecom | Multiple subscription services (13) | L | Control charts | Not mentioned |
| Glady et al. (2009) | Retail banking | Not clear; current transactions used | C | LR, DT, ANN, cost-sensitive classifiers | Customer lifetime value for retention action profitable |
| Tsai and Lu (2009) | Wireless telecom | Not mentioned | C | ANN | User no longer exists after sampling date |
| Coussement et al. (2010) | Newspapers | Client/interaction (12), renewal (5), subscriber info (2), subscription info (5) | C | LR and GAM | Subscriber does not renew after a date |
| Huang et al. (2010) | Land-line telecom | Demographics (4), grants (1), accounts (6), calls (6), orders (2), segments (1), payment, bills (3), and phone info (2) | C | DT, ANN, and SVM | User shifts to a competitor |
| Orsenigo and Vercellis (2010) | Telecom cross-selling | Not mentioned | L | Discrete SVM | Not mentioned |
| Yu et al. (2011) | E-commerce | Customer register (9), login (4), transaction (9), and web log (5) | C | DT, ANN, and SVM | Customer stops doing business |
| Coussement and De Bock (2013) | Online gambling | Behavior (27), demographics (3) | C | DT, GAM, ensemble learners | Gambler does not play during a period |
| Faris (2014) | Wireless telecom | Calls (10), orders (1) | C | DT, ANN, and PSO | User leaves company |
| Hadiji et al. (2014) | Online games | Login (7), transactions (4) | C | DT | Player with few sessions after cutoff date |
| Runge et al. (2014) | Online games | Login (4), transactions (2) | C/L | DT, LR, ANN, SVM, HMM | Player with consecutive days of inactivity |


telecommunication (Pettersson, 2004; Lemmens and Croux, 2006; Jiang et al., 2007; Samimi and Aghaie, 2008; Tsai and Lu, 2009; Huang et al., 2010; Orsenigo and Vercellis, 2010; Faris, 2014), newspaper subscriptions (Coussement and Van den Poel, 2008a,b; Coussement et al., 2010), pay-TV subscription (Burez and Van den Poel, 2007), and online gambling (Coussement and De Bock, 2013). They further include credit cards (Kumar and Ravi, 2008), banking and insurance services (Van den Poel and Lariviere, 2004), retail financial services (Glady et al., 2009), e-commerce (Yu et al., 2011), retailing (Buckinx and Van den Poel, 2005), cross-selling applications (Orsenigo and Vercellis, 2010), and online games (Hadiji et al., 2014; Runge et al., 2014). However, little attention has been paid to customer churn on online dating websites. A variety of binary classification techniques have been successfully applied to customer churn prediction in a static manner using a multivariate dataset. These techniques include decision trees (DT) (Glady et al., 2009; Huang et al., 2010; Yu et al., 2011; Coussement and De Bock, 2013; Faris, 2014; Hadiji et al., 2014; Runge et al., 2014), random forests (RF) (Buckinx and Van den Poel, 2005; Burez and Van den Poel, 2007; Coussement and Van den Poel, 2008a), logistic regression (LR) (Buckinx and Van den Poel, 2005; Burez and Van den Poel, 2007; Coussement and Van den Poel, 2008a,b; Glady et al., 2009; Coussement et al., 2010; Runge et al., 2014), LR with Markov chains (MC) (Burez and Van den Poel, 2007), and proportional hazard modeling (PHM) (Van den Poel and Lariviere, 2004).
They also include generalized additive models (GAM) (Coussement et al., 2010; Coussement and De Bock, 2013), artificial neural networks (ANN) (Buckinx and Van den Poel, 2005; Glady et al., 2009; Tsai and Lu, 2009; Huang et al., 2010; Yu et al., 2011; Faris, 2014; Runge et al., 2014), support vector machines (SVM) (Coussement and Van den Poel, 2008a; Huang et al., 2010; Yu et al., 2011; Runge et al., 2014), ensemble learners (Lemmens and Croux, 2006; Coussement and De Bock, 2013), majority voting (Kumar and Ravi, 2008), cost-sensitive classifiers (Glady et al., 2009), and particle swarm optimization (PSO) (Faris, 2014). The above methods were used for customer churn prediction with static variables only. Time-series churn prediction methods using longitudinal variables are comparatively rare but still attractive. To our knowledge, these techniques include control charts (Pettersson, 2004; Jiang et al., 2007; Samimi and Aghaie, 2008), discrete SVM (Orsenigo and Vercellis, 2010), and hidden Markov models (HMM) (Runge et al., 2014). In addition to applications and prediction models, the definition of a churner also varies. A customer is considered a churner if she, for example, does not renew her subscription after an expiry date (Burez and Van den Poel, 2007; Coussement and Van den Poel, 2008a,b; Coussement et al., 2010), stops paying (Burez and Van den Poel, 2007), shifts to a competitor's service (Huang et al., 2010), leaves the company (Faris, 2014), or closes her account (Van den Poel and Lariviere, 2004). These are absolute criteria for defining churners. Relative criteria are also commonly used: these define a churner based on her purchase frequency, visit times or some usage threshold (Buckinx and Van den Poel, 2005; Lemmens and Croux, 2006; Jiang et al., 2007; Glady et al., 2009; Tsai and Lu, 2009; Yu et al., 2011; Coussement and De Bock, 2013; Hadiji et al., 2014; Runge et al., 2014).
Churn prediction has thus attracted considerable business attention. Every industry comes up with a variety of variables to quantify customer behavior based on its interaction experience with customers, develops standards for churn according to the termination of interaction, and carries out nonlinear input–output mapping of the relationship between customer behavior and churn via binary classifiers.

2.2. Cumulative sum (CUSUM) control chart

Page (1954) introduced the concept of a CUSUM chart into statistical process control (SPC), which produced a tool for process and measurement control. A CUSUM chart is known to be more sensitive to small shifts in the process mean than other types of control charts; it takes the past history into account and weights recent observations. Most discussions of the advancement and theory of the CUSUM are limited to the assumption of a symmetric, bell-shaped normal distribution. However, there are studies of CUSUM schemes for long-tailed, skewed distributions and the nonparametric case. The sequential probability ratio test (SPRT) is a fundamental CUSUM procedure that allows the optimal parameters to be developed, and extensive numerical studies have investigated how SPRT yields the optimal parameters of CUSUM procedures under different distributions. Other types of non-normal CUSUM charts are available too. McGilchrist and Woodyer (1975) considered a distribution-free CUSUM that does not rely on the underlying distribution of observations. Lucas (1985) described the Poisson CUSUM for monitoring the mean number of counts per sampling interval; some enhancements, including fast initial response (FIR) and robustification, have been discussed too. Gan (1993) presented optimal designs of CUSUM charts for binomial counts to detect fractional shifts; graphs of chart parameters were created and could be used to obtain some of the key parameters, and the choice of sample size has also been investigated. Gan (1994) derived an exponential CUSUM chart to monitor the rate of occurrence of events. Hawkins and Olwell (1998) introduced a negative binomial CUSUM chart in which the number of successes is fixed, to set up a scheme for detecting fractional changes, though optimal parameters, such as the decision interval (h) and sample size, were not presented in that work.
For dispersion, the sample variance is a useful CUSUM statistic, and it follows a chi-square distribution. Chang and Gan (2001) provided tabulation procedures for the optimal design of geometric and Bernoulli CUSUM charts; methods for sensitivity analysis of CUSUM charts based on the geometric, Bernoulli and binomial distributions were also provided. Huang et al. (2013) handled the task of monitoring changes in scale, assuming the shape to be fixed, and gave graphs of chart parameters for various shapes. These CUSUM charts are easy to design and implement because tables or graphs show the required decision interval for the selected reference value (k) and the average run length (ARL) under different distributions. The general rule is: given k and the ARL, you can find h. Among the various CUSUM chart types, the exponential and gamma distributions are naturally suitable for monitoring time-between-event (TBE) variables, especially IAT and recency. We chose the gamma CUSUM chart because the gamma distribution is a flexible probability function that may offer a good fit for IAT datasets.

2.3. A hierarchical Bayesian mixture model for time-between-event variables

The growth of personal computing led to faster, cheaper, consumer-level customer base analysis. Stochastic models of consumer behavior consider different probability distributions. The most common pattern uses the Poisson or exponential distribution to model customer login frequency, IAT or inter-purchase time (IPT), coupled with a distribution from the gamma family to capture customer heterogeneity. For example, Hardie et al. (1998) used an exponential distribution to model IPT. Allenby et al. (1999) also used IPT to divide consumer behavior into super-active, active and inactive states via a mixture model with


a generalized gamma distribution. Venkatesan et al. (2007) proposed a joint model for purchase timing and quantity in which the purchase timing uses the Allenby et al. (1999) model to process IPT; the purpose was to separate customers into heavy and light user classes. Lo (2008) segmented customers into four groups (super-active, active, inactive and churned) by modeling IAT as a gamma-inverse gamma hierarchical Bayesian mixture model. Finally, Guo (2009) built a one-to-one multi-category IPT model using the generalized gamma distribution and multiplicative model formulations. A finite mixture model is a multi-modal probability function consisting of a finite number of homogeneous or non-homogeneous probability distributions. This flexibility supports density-based fitting and segmentation for heterogeneous customer data. Further structuring and estimating the parameters of the finite mixture model using hierarchical Bayesian analysis can handle customer heterogeneity even more flexibly. The hierarchical Bayesian model treats parameters as random variables subject to another probability distribution, called the prior distribution, and thus builds a hierarchy. The most common prior distributions are the vague prior, Jeffrey's prior, Laplace's prior and the natural conjugate prior, the last of which is the most widely used. Interested readers should refer to the pairings of conjugate distributions summarized by Spiegelhalter et al. (1996). The estimation of multiple parameters in the hierarchical Bayesian model adopts the Gibbs sampler, a Markov chain Monte Carlo (MCMC) algorithm. The method estimates the unknown parameters by simulating the posterior distributions of the parameters. The principle is that, to repeatedly sample a single parameter, the other parameters and the observed data must be given in advance. (See Rossi and Allenby, 2003 for additional details.)
The process of Gibbs sampling can be carried out using the freeware WinBUGS (Windows Bayesian inference Using Gibbs Sampling). This study continues the trend of applying stochastic models to IAT or IPT. We propose a hierarchical Bayesian mixture model that fits the IAT of each customer using active and inactive gamma distributions, and then apply an inverse gamma distribution to capture customer heterogeneity. This enables us to assess a customer's past active and inactive login behavior in a way that ties in with our use of a gamma CUSUM chart.
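As a minimal illustration (not the paper's code) of why the inverse gamma is the natural conjugate here: with a gamma likelihood of known shape ν and an IG(s, k) prior on the scale θ, the posterior is again inverse gamma, IG(s + nν, k + Σt), which is what makes each Gibbs step tractable. A sketch, with all numeric values illustrative:

```python
import numpy as np

# Sketch: conjugate update for the scale of a gamma likelihood.
# Assumptions (illustrative, not from the paper): known shape nu,
# prior theta ~ IG(s, k); with data t_1..t_n ~ G(nu, theta), the
# posterior is theta | t ~ IG(s + n*nu, k + sum(t)).
rng = np.random.default_rng(7)

nu, theta_true = 2.0, 5.0                 # gamma shape and true scale
t = rng.gamma(nu, theta_true, size=2000)  # simulated IATs

s, k = 5.0, 5.0                    # IG prior hyperparameters
s_post = s + t.size * nu           # posterior shape
k_post = k + t.sum()               # posterior scale

posterior_mean = k_post / (s_post - 1)   # mean of IG(s_post, k_post)
print(round(posterior_mean, 2))          # close to theta_true = 5
```

With enough data, the posterior mean is dominated by the data term k + Σt, mirroring the remark above that the hyperparameters matter little once the sample is large.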

3. Proposed methodology

In this section, the development of the proposed methodology is described step by step. The fundamental base model is presented first, followed by the parameter estimation process. The rationale for introducing the recency variable into the proposed model is then discussed. Finally, the proposed method is summarized. As mentioned earlier, the state of the art, as described in the literature, consists mostly of static data analyses. The results can only be presented in tables, and their design involves too many variables, which limits their practicality. In our experience, feature extraction is an interesting but computationally expensive process. Too many variables tend to cause problems such as intercorrelation, multicollinearity, over-fitting, over-parameterization, or even spurious regression effects. Furthermore, the model parameters in previous research are estimated from a group of customers, ignoring customer heterogeneity. Unlike the prior research studies, this study conducts a longitudinal analysis to handle churn prediction at the individual level. A longitudinal analysis has the advantage over a static analysis when the data collection process provides more detailed insight into behaviors at the individual level (Yee and Niemeier, 1996). The control chart is different from


the previous static churn prediction methods because it achieves the goals of longitudinal analysis and visual monitoring for each customer. The gamma CUSUM chart is chosen for the following reasons. First, most time-series methods are derived under the assumption of normally distributed residuals, which is obviously not suitable for time-interval data. Second, the shape of the gamma distribution is very flexible and can provide a good fit for the time-interval data of individuals. Even though the proposed method may not improve on the predictive accuracy achieved by earlier models, it can visualize longitudinal data and achieve a certain level of predictive accuracy with just a small number of variables.

3.1. The base model

The IAT and IPT methods have often been used to describe customer behavior (Allenby et al., 1999; Boatwright et al., 2003; Venkatesan et al., 2007; Fok et al., 2012). Analyzing customer IAT or IPT sequences can help companies obtain insight into their behavior. For example, variation in login time intervals is related to the value of customers (Blattberg et al., 2001). In general, customers with more frequent logins create more cash flow and have a longer life cycle than less active customers. On entertainment dating websites, for example, login frequency is highly related to website business and is a key factor contributing to revenue from member spending and advertising; these indirectly affect website visibility. We focus on monitoring the timing of changes in customers' IAT, the time interval between two logins. We assume that IAT follows a gamma distribution:

$t \sim G(\nu, \theta)$  (1)

in which t is the IAT, ν is the shape parameter, and θ is the scale parameter; together these parameters make the probability density function very flexible. The values of IAT are positive, which meets the requirement that gamma variables be larger than 0. Smaller values of IAT indicate that the customer visits the site more frequently, and larger values mean the visits are less frequent. The unit of time used in the IAT process is called the time frame (tf): a smaller value indicates higher resolution, and vice versa. The choice of tf depends on sensitivity analysis. Because the gamma variable must be larger than 0, we treat multiple logins in the same time frame as a single login, so that IAT is never equal to zero. To achieve visual monitoring of customer logins, we use a gamma CUSUM chart, which can monitor TBE-type variables. Since we monitor upward shifts of IAT, we build a one-sided gamma CUSUM chart with an upper control limit:

$z_i = \max\{0, (t_i - k) + z_{i-1}\}$  (2)

In Eq. (2), $t_i$ represents the IAT with accumulated ν at time i, $z_i$ is the corresponding gamma CUSUM score, k is the reference value, and $z_0$ is the non-negative head-start. When $z_i > h$, where h is the decision interval, the system issues an out-of-control warning, indicating that the customer's IAT has clearly exceeded the formerly active behavior. However, Eq. (2) cannot be applied directly yet, as there are problems with the parameter estimation and the properties of the variables that still need to be resolved.

3.2. How to estimate the parameters k and h?

As shown in Eq. (2), the choice of k and h is the fundamental step in designing a gamma CUSUM chart. Fixing ν and comparing the distribution under the alternative hypothesis $H_1: \theta = \theta_1$ with that under the null hypothesis $H_0: \theta = \theta_0$, k can be derived from the log-likelihood ratio:

$k = \dfrac{\nu(\ln \theta_1 - \ln \theta_0)}{\theta_0^{-1} - \theta_1^{-1}}$  (3)
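To make Eqs. (2) and (3) concrete, here is a small sketch. The shape and scale values, the decision interval h, and the sample IAT sequence are all illustrative; in the paper h is obtained from the ARL-based design of Huang et al. (2013) rather than chosen by hand:

```python
import math

def reference_value(nu, theta0, theta1):
    # Eq. (3): log-likelihood-ratio crossover between G(nu, theta0) and G(nu, theta1)
    return nu * (math.log(theta1) - math.log(theta0)) / (1 / theta0 - 1 / theta1)

def gamma_cusum(iats, k, h, z0=0.0):
    """Eq. (2): one-sided upper CUSUM over an IAT sequence.
    Returns the scores and the index of the first out-of-control signal (or None)."""
    z, scores, signal = z0, [], None
    for i, t in enumerate(iats):
        z = max(0.0, (t - k) + z)
        scores.append(z)
        if signal is None and z > h:
            signal = i
    return scores, signal

# Illustrative design: active scale theta0 = 1, inactive scale theta1 = 3, shape nu = 2.
k = reference_value(2.0, 1.0, 3.0)
# Scale invariance: k under G(nu, theta0) equals theta0 times k under G(nu, 1).
assert abs(reference_value(2.0, 2.0, 6.0) / 2.0 - k) < 1e-12

# h is a placeholder here; a drift from IATs near 2 to IATs near 9 triggers a signal.
scores, signal = gamma_cusum([2, 1, 2, 8, 9, 10, 9], k=k, h=6.0)
print(signal)  # -> 4
```

The assertion demonstrates the normalization property used by Huang et al. (2013): dividing the gamma variable by θ0 lets the chart be designed once under G(ν, 1).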

Eq. (3) shows that k is a function of a fixed shape parameter ν, an in-control scale parameter $\theta_0$, and an out-of-control scale parameter $\theta_1$. Also, h is a function of ν, $\theta_1/\theta_0$ and the ARL, and it can be solved by a piecewise collocation method. Of these quantities, ν and $\theta_1/\theta_0$ come from Eq. (3), whereas the ARL is specified as the reciprocal of the significance level. Given ν, $\theta_1/\theta_0$ and the ARL, h can be computed as in Huang et al. (2013). They set the scale parameter to 1 throughout, dividing the gamma random variable by $\theta_0$ to facilitate the design of the gamma CUSUM chart. In this manner, a gamma CUSUM chart with (k, h) under $G(\nu, \theta_0)$ is equivalent to one designed with $(k/\theta_0, h/\theta_0)$ under $G(\nu, 1)$.

3.3. How to estimate ν, θ0 and θ1?

Based on the stochastic model for IAT, we use the gamma distribution for customer login behavior and the inverse gamma distribution to capture customer heterogeneity. Given ν, a hierarchical Bayesian two-component mixture model is used to estimate the scale parameters $\theta_0$ and $\theta_1$ for active and inactive customer login behavior, as follows:

g(t) = Σ_{d=0}^{1} qd · G(v, hd)    (4)

Here, g(·) is the finite mixture density function, qd is the mixing proportion with q0 + q1 = 1, and G(v, hd) is the component density formed by the dth gamma distribution, with h1 > h0 > 0. We set v for the two gamma distributions of every customer based on sensitivity analysis. In practical terms, v specifies how many logins are accumulated before each IAT is calculated. If each customer had a unique v based on past behavior, the model could be more flexible and closer to a customized churn prediction. However, at the individual level, dynamic values of v would make the two gamma distributions used to calculate IAT inconsistent, and such values would also make it difficult to compare the performance differences between TBE control charts. Our experimental analysis will show that model performance with dynamic values of v is worse than with static values, besides adding complexity and slowing computation. A finite mixture model provides a flexible multi-modal density function but cannot cope with changes in individual customer behavior. To expand the model's estimation capacity, this study uses the hierarchical Bayesian model proposed by Allenby et al. (1999) and employs the prior distribution settings of Spiegelhalter et al. (1996) to estimate the parameters of Eq. (4):

hd ~ IG(sd, kd)
qd ~ Dirichlet(pd)
sd ~ U(0, 1)
kd ~ IG(ad, bd)    (5)

At the first level, hd, which appears in Eq. (4), is given an inverse gamma prior with parameters sd and kd, and qd is given a Dirichlet prior that determines the proportion, pd, of each component density. As suggested by Carlin and Chib (1995), the initial value of pd is set to 0.5. At the second level, sd uses a uniform prior over positive real numbers within the same range, and kd uses an inverse gamma prior with parameters ad and bd. The third-level parameters, ad and bd, have default values of 5. As long as there is enough input data, the impact of these hyperparameters on the posterior estimates will be very small (Allenby et al., 1999).
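To make Eq. (4) concrete, the two-component gamma mixture density can be evaluated directly. The sketch below uses illustrative parameter values taken from the Appendix A estimates (v = 2, h0 = 1.52, h1 = 6.26, q0 = 0.537) and checks numerically that the mixture density integrates to about one.

```python
import math

# Standard gamma pdf: f(t; v, h) = t^(v-1) * exp(-t/h) / (Gamma(v) * h^v)
def gamma_pdf(t, v, h):
    return t ** (v - 1) * math.exp(-t / h) / (math.gamma(v) * h ** v)

# Two-component gamma mixture density g(t) of Eq. (4), with q0 + q1 = 1
# and h1 > h0 > 0 (active vs. inactive component).
def mixture_density(t, v=2, h0=1.52, h1=6.26, q0=0.537):
    return q0 * gamma_pdf(t, v, h0) + (1 - q0) * gamma_pdf(t, v, h1)

# Riemann-sum check over (0, 200] that the density integrates to ~1
area = sum(mixture_density(0.01 + i * 0.01) for i in range(20000)) * 0.01
```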

The hierarchical Bayesian mixture model in Eqs. (4) and (5) can be used to divide a customer's IATs into two gamma-distributed clusters. All the parameters in these equations can be estimated in a Bayesian manner using the Gibbs sampler in WinBUGS. Once the parameter estimates are complete, the component density with the smaller scale parameter, h0, can be used to profile active behavior, since smaller IATs in a cluster indicate that the customer visits the site more frequently, while h1 characterizes inactive customers.

3.4. Improving the model – IAT is not enough

Typical control charts use a time series as the horizontal axis, whereas the horizontal axis of a TBE control chart is a series of event occurrences. In this study, a new arrival occurs and the gamma CUSUM chart draws a new CUSUM score only when the customer has logged in to the website a total of v times. With this restriction, relying on IAT alone to monitor whether customers are active can leave changes in behavior undetected, especially when a customer suddenly stops visiting: the control chart receives no new login records, so there is no way to determine the churn status. To compensate for this drawback, we introduce the recency-of-last-arrival rule proposed by Wübben and Wangenheim (2008). Once recency is added, every cut-off time has at least a recency value, even if there are no new login records. Recency in the time series can be integrated into the event counts of a TBE-type control chart, so Eq. (2) is modified:

z(i, j) = max{0, (t(i, j) - k) + z(i-1, n_{i-1})}    (6)
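A minimal sketch of the recursion in Eq. (6) under the substitution rules explained below: within an arrival number, recency values only produce provisional scores that are overwritten at the next cut-off time, while an IAT commits the score and starts a new arrival number. The k and h values (5.69 and 9.38) are the customized ones derived for the anonymous customer in Appendix A.

```python
# Sketch of the modified recursion in Eq. (6). Each observation is a pair
# (t, is_iat): a recency value gives a provisional score, while an IAT
# commits the score as z(i-1, n_{i-1}) for the next arrival number.
def gamma_cusum(observations, k=5.69, h=9.38, z0=0.0):
    z_prev = z0                          # committed score z(i-1, n_{i-1})
    scores, warnings = [], []
    for t, is_iat in observations:
        z = max(0.0, (t - k) + z_prev)   # Eq. (6)
        scores.append(z)
        warnings.append(z > h)           # out-of-control signal when z > h
        if is_iat:                       # IAT-substitution rule: commit
            z_prev = z
    return scores, warnings

# A short run: one IAT, then two recency updates at later cut-off times
scores, warnings = gamma_cusum([(10.0, True), (3.0, False), (16.0, False)])
```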

Here, t(i, j) represents the time interval for item j with arrival number i, where an arrival number means that v logins have accumulated. z(i, j) is the gamma CUSUM score corresponding to t(i, j), with i = 1, 2, . . ., j = 1, 2, . . ., ni, where ni is the number of items for the ith arrival number. Within an arrival number there may be only a single IAT (ni = 1), or several recency values paired with an IAT (ni > 1). If j < ni, the customer has not accumulated the minimum number of visits, v, by the cut-off time, and t(i, j) at that point is the recency value. If j = ni, the customer has accumulated v visits by the cut-off time, and t(i, j) at that point is the IAT. Also, z(0, 1) is a non-negative head-start, usually set to zero. t(1, 1) is the initial observed value, obtained as the time difference between the cut-off time at which v logins have accumulated in the monitoring data and the last login in the modeling data. A customer cannot visit a website all the time, so most CUSUM scores on the control chart are supplemented by recency. With updates at each cut-off time, t(i, j) in Eq. (6) operates in two ways:

• Recency-substitution rule. If a customer has not visited the site v times, the old recency value is replaced by the new one.
• IAT-substitution rule. If a customer accumulates v visits by the cut-off time, recency becomes ineffective and is replaced by the newest IAT value.

In the light of these rules, we can see that recency must be greater than or equal to v.

3.5. Summarizing the steps of the customized gamma CUSUM chart

Fig. 1 shows the detailed process of our research methodology. Initially, mature customers are divided into training and testing datasets. Next, the best parameters are determined through a


Fig. 1. Flowchart of the research methodology.

sensitivity analysis during the training process. Subsequently, the best parameters are used in the actual evaluations on the testing dataset. The above process is carried out one hundred times (iter = 100) to ensure robustness against the choice of parameters. The execution procedure is as follows:

• Step 1. Screen the customers. Screen out customers whose records have not been in the database for more than six months.
• Step 2. Segment the data. Divide the mature customers into two parts: training data, used to perform sensitivity analysis to obtain the best model parameters, and testing data, used to test prediction accuracy and model efficiency.
• Step 3. Set the parameters. In the training stage, the model searches for a suitable f and the best combination of tf and v. In the testing stage, it uses these best parameters as input. The ARL is set to 400, which corresponds to a significance level of 0.25%.
• Step 4. Subdivide the dataset. Further split the training and testing datasets into three parts. The modeling data are used to calculate IAT in order to estimate h0 and h1. The monitoring data are used to calculate IAT and recency, so that the gamma CUSUM chart can longitudinally monitor customer status. The status data are used to determine whether customer churn actually occurs, and these outcomes are compared with the predictions of the gamma CUSUM chart.
• Step 5. Estimate h0 and h1. Enter the IAT of the modeling data, and estimate the parameters of the hierarchical Bayesian mixture model of Eqs. (4) and (5) using WinBUGS.
• Step 6. Calculate k and h. Given v, h0 and h1, use Eq. (3) to calculate k for each customer and obtain the corresponding unified reference value k/h0. Next, given v, h1/h0 and the ARL, find h/h0 as in Huang et al. (2013).
• Step 7. Draw the gamma CUSUM chart. Enter the IAT and recency of the monitoring data, let time roll, and draw the newest CUSUM scores on the chart at every cut-off time using Eq. (6). Following the recency-substitution and IAT-substitution rules, if the customer happens to have visited the site v times, draw a CUSUM score for the IAT; otherwise draw a CUSUM score for recency.
• Step 8. Evaluate the performance. Compare the churn status defined by the status data with the predicted result from the gamma CUSUM chart to verify the accuracy rate (ACC), true positive rate (TPR) and false positive rate (FPR) of the model. Then calculate the expected number of time periods needed to obtain the first signal from the gamma CUSUM chart to measure the average time to signal (ATS) of the model.

4. Experimental results

In this section, customer transaction data from an entertainment dating website in Taiwan are analyzed. In addition to describing the website, we describe how the data are preprocessed. Then several preliminary sensitivity analyses are conducted on training samples to evaluate the impact of different parameter values. Finally, testing samples are used to demonstrate the predictive power of the proposed method, which is also compared with other TBE-based methods to demonstrate its effectiveness.


4.1. Data description and preprocessing

On the entertainment dating website, customers buy clothes and accessories to dress up their virtual images, and they also interact with each other by sharing photos, texting and giving gifts. The site relies on revenue from virtual goods and advertising to keep operating. When customers visit the website, the system records their logins, logouts, postings, membership point purchases, shopping, gift-giving behavior, and so forth. There are more than 69,000 customers and 800,000 login records in the original dataset. After discarding customers whose visit duration lasted less than six months, 1150 effective customers and 240,000 login records are retained. These effective customers account for only 1.67 percent of the total but are regarded as the premier consumers because they provide 45 percent of the firm's revenue. We drew 100 random samples to divide the database into training and testing datasets. During the modeling process, a customized CUSUM mechanism is designed based on each customer's IAT from the previous six months, which form the modeling data; the IAT and recency for the subsequent two months, the monitoring data, are used to predict churn. The criteria defining customer churn differ across domains. Following a suggestion by Hadiji et al. (2014), we label customers with fewer than f logins during the 8–12 months after registration as "liable to churn". This is a relative criterion for defining churn. If f is set to 0, a customer without a single login during this period is considered a churner. Hadiji et al. (2014) argued that this setting supports only harsh decisions on churners and is not useful for real-world applications: without available login records, the chances of reactivating a customer are low. To relax the churn definition, the value of f should be set larger than 0.
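Under this relaxed definition, labeling reduces to a simple threshold test. A sketch (the login counts are synthetic; f = 4 is the value later chosen by the sensitivity analysis in Section 4.2):

```python
# Soft-churn labeling: a customer is "liable to churn" when the number of
# logins during months 8-12 after registration is fewer than f.
def label_churners(login_counts, f):
    return [count < f for count in login_counts]

# Synthetic window login counts for five customers, with f = 4
labels = label_churners([0, 2, 4, 7, 3], f=4)   # [True, True, False, False, True]
```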
We can label customers with a low number of logins during the period as churners, which represents soft churn. The value of f can be determined by sensitivity analysis as in Section 4.2, allowing for a more practical model. (To make it easier to understand how to use the customized control chart, we use one anonymous customer as an example and record the process in detail in Appendix A.)

4.2. Sensitivity analysis

The proposed model has a few parameters to be set: the frequency threshold (f), the time frame (tf) and the shape parameter (v). There is no rule of thumb for setting these parameters, so we conducted a sensitivity analysis on 575 training data observations to determine them. The f value considers customers who log in fewer than f times during the 8–12 months after registration to be churners. In general, the lower f is, the stricter the criterion for determining whether a customer is a churner. Logging in is easy, and customers tend to visit a website a few times before they churn. If f is set too low, the churn proportion

Fig. 2. Relationships between avg. customer churn rate, avg. change rate and f.

may be too low. We will use f = {0, 1, . . ., 50}. Also, tf represents the IAT resolution. The time frame is set between one hour and one day (24 h), so that if tf is measured in hours, tf = {1, 2, 3, 4, 6, 8, 12, 24}; an hour is the lower limit and a day is the upper limit for calculating IAT. Also, v in Eq. (2) specifies that the gamma CUSUM chart monitors the time interval over which the customer logs in v times. We set v = {1, 2, . . ., 8}. When v = 1, the gamma distribution degenerates into an exponential distribution; since we restrict v to integers, the distribution is in fact an Erlang distribution.

4.2.1. Determining the f value

The value of the frequency threshold, f, affects whether customers are judged to be churning or not. This in turn affects the prediction performance, so f needs to be determined before the other parameters. We set f = {0, 1, . . ., 50}. As shown by the blue dotted line in Fig. 2, the overall average proportion of customers determined to churn rises with f, from 16.33% to 78.11%. The red solid line shows that the change in the average churn rate decreases exponentially, indicating a gradual slowing of the change magnitude. To determine the value of f at which it slows down, we need a data-driven rule: we introduce Rosin thresholding, which was first used for the binary processing of images (Rosin, 2001) and automatically finds the corner point where the magnitude of an exponentially shaped curve slows down (Perng and Chen, 2011). The corner point found by Rosin thresholding is 4, the red dot in Fig. 2. It indicates that customers who have fewer than 4 logins during the 8 to 12 months after registration are considered churners.

4.2.2. Determining tf and v

Next, we ran the Gibbs sampler 1000 times in WinBUGS using the first 6 months of customer IAT values. The first 100 iterations are burn-in and the subsequent 900 are used to estimate each customer's h0 and h1.
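The Rosin thresholding step in Section 4.2.1 can be sketched as follows: the corner of a decaying curve is the point farthest from the chord joining the curve's first and last points. The change-rate curve below is synthetic.

```python
import math

# Rosin (2001) corner detection: return the index of the point on a decaying
# curve with the maximum perpendicular distance to the chord joining the
# first and last points.
def rosin_corner(ys):
    x0, y0 = 0, ys[0]
    x1, y1 = len(ys) - 1, ys[-1]
    chord_len = math.hypot(x1 - x0, y1 - y0)
    distances = [
        abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0) / chord_len
        for x, y in enumerate(ys)
    ]
    return max(range(len(ys)), key=distances.__getitem__)

# Synthetic exponentially decaying change-rate curve
corner = rosin_corner([math.exp(-x / 3) for x in range(20)])
```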
When ARL = 400, we can determine k and h for every customer's gamma CUSUM chart. Setting h for each customer is a manual process (Huang et al., 2013): in our implementation, v, h0 and h1 are the inputs, h is the output, and a curve-fitting program is used to obtain it. We fixed f = 4 and observed the resulting contour maps of the average ACC, TPR and FPR values for various combinations of tf and v in Fig. 3. As tf or v increases, three scenarios arise. (1) In some parts of tf where v = 1 or 2, the average TPR and FPR appear to be decreasing. This means it is worthwhile to take the further step of comparing the prediction performance of the exponential and gamma CUSUM charts; the comparison result is discussed in Section 4.3. (2) Apart from the combinations mentioned in the first scenario, the average TPR and FPR show a decreasing trend. This indicates that when the resolution for calculating IAT is higher or fewer IATs are monitored, the gamma CUSUM chart becomes too strict and most customers are considered churners; conversely, if the chart is not strict enough, customers are more likely to be viewed as non-churners. (3) The maximum average ACC is 79.29%, at (tf, v) = (8, 2). We also report the receiver operating characteristic (ROC) space in Fig. 4, whose axes are the average FPR and TPR values. The perfect prediction is the point at the top left with coordinates (0, 1), meaning no false positives and all positives detected. Random guessing lies on the diagonal line between (0, 0) and (1, 1); points to the left or right of the diagonal mean the predicted results are better or worse than random guessing, respectively. As an indicator of prediction quality we use the distance between a point and the perfect prediction, where the closer the distance, the higher the accuracy, as expressed in the equation:


Fig. 3. Contour maps showing different combinations of tf and v: (a) average ACC, (b) average TPR and (c) average FPR.

Fig. 5. Scatter plot of estimated h0 and h1 for customer.

Fig. 4. ROC space under different combinations of tf and v.

dist = sqrt((FPR - 0)^2 + (TPR - 1)^2)    (7)
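Eq. (7) is the Euclidean distance from an ROC point to the perfect-prediction corner (0, 1). A quick sketch, evaluated at the best average point reported below, (FPR, TPR) = (20.51%, 79.16%):

```python
import math

# Eq. (7): distance from an ROC point (FPR, TPR) to the perfect prediction (0, 1)
def roc_distance(fpr, tpr):
    return math.hypot(fpr - 0.0, tpr - 1.0)

d = roc_distance(0.2051, 0.7916)   # distance of the best (tf, v) = (8, 2) point
```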

Fig. 4 shows the point with the smallest distance: (20.51%, 79.16%). The corresponding (tf, v) is (8, 2). This suggests calculating IAT with 8 h as the time frame, and that it is appropriate for the gamma CUSUM chart to monitor the IAT accumulated over two visits.

4.3. Predictive power and comparative analysis of models

Sensitivity analysis of the training dataset determined that (f, tf, v) = (4, 8, 2). We tested the model's performance on 100 test datasets based on this parameter combination. The experiment performed 1000 Gibbs sampling rounds to estimate h0 and h1 and then determined k and h given ARL = 400. As shown in Fig. 5, different customers have different past behaviors. This heterogeneity gives some customers a higher tolerance k, so that the CUSUM becomes active only when the IAT or recency in some period is very large. For other customers the tolerance k is low but h is large, so the accumulation interval is longer. These kinds of phenomena are due to differences in past customer logins.

We used a real-world database to test our proposed method, and performed a comparative analysis. In the second column of Table 2, we see that the average ACC = 80.52% from the gamma CUSUM chart has an acceptable level of predictive power when the recency variable is considered. The average value of TPR = 88.11% suggests the average percentage of missed churn detection is about 11.89%. The reason is usually that h0 and h1 are too close, resulting in a poor value of h, so the prediction is insensitive. The average FPR = 23.08% indicates the portion of false alarms, meaning that non-churners are treated as churners. The reason is usually that customers may suddenly visit again after disappearing for a while during the monitoring period. They are not totally gone, but for some reason they have not logged onto the website of late. The average ATS = 30.45 indicates that the mechanism spends about one month on average to capture and make sense of the deterioration in a customer’s behavior. In contrast, if we use the same samples to conduct the test without introducing recency, we can demonstrate the influence of that variable. The average ACC of the gamma CUSUM chart falls by about 6% and the average ATS increases by about 9 days. So when there is not enough recency, accuracy and efficiency become


Table 2
Summary of the comparisons between the gamma CUSUM chart, exponential CUSUM chart, CQC-v and CQC results with and without recency.

Items                     Gamma CUSUM chart   Exponential CUSUM chart   CQC-v       CQC
Parameters (tf, v, f)     (8, 2, 4)           (12, 1, 4)                (8, 2, 4)   (2, 1, 4)
Avg. ACC (%)
  With recency            80.52               71.30                     73.74       75.30
  No recency              74.26               66.43                     67.13       69.04
Avg. ATS (days)
  With recency            30.45               36.31                     28.05       29.59
  No recency              39.38               45.48                     29.61       32.58
Avg. TPR (%)
  With recency            88.11               67.57                     73.51       52.97
  No recency              50.81               43.24                     22.16       15.68
Avg. FPR (%)
  With recency            23.08               26.74                     26.34       14.32
  No recency              14.62               22.37                     11.76       4.35

worse. Also, the average TPR falls by about 37%. This shows that some customers suddenly stop visiting; in this circumstance, with insufficient recency, the method cannot produce a new score and mistakenly assumes that these customers are still active although they may no longer be. Finally, we observe that the average FPR rose by about 9%. This is the negative influence of adding recency: it makes detection more sensitive, but results in more false alarms. We note that it is worth trading off FPR for TPR because, after all, the company pays a higher price if it loses customers by failing to identify churners (false negatives), while little is lost by treating a good customer as a possible churner (false positives).

We also compared the performance of the exponential CUSUM (Gan, 1994), CQC-v (Xie et al., 2002), CQC (Chan et al., 2000) and gamma CUSUM approaches. The dataset, ARL, and the estimates of h0 and h1 for the experiment are the same, and f is fixed at 4. We also performed sensitivity analysis so that each TBE control mechanism performed at its best; the selected parameter combinations are in the second row of Table 2. k and h for the exponential CUSUM chart were obtained in a similar way to those for the gamma CUSUM chart, starting from the estimates of h0 and h1. The only difference is that v = 1 in the former, so the sensitivity analysis for the exponential CUSUM chart only needs to cover tf. The CQC-v and CQC are cumulative quantity control charts based on the gamma and exponential distributions, respectively. For these we only need to obtain h, where for the former the scale parameter equals the weighted average of h0 and h1 and for the latter h is derived by solving the cumulative gamma distribution equation under ARL = 400. Again, the CQC is the special case of the CQC-v with v = 1, so the sensitivity analysis covers tf and v for the CQC-v but only tf for the CQC.

Based on the results of the comparative analysis in Table 2, and irrespective of whether recency is included, the results for the proposed method are favorable. For average ACC, the gamma CUSUM chart offers the highest value; without recency, the average ACC of each method falls by more than 6%. For average ATS, the churn warnings detected by the CQC charts occur sooner than those of the CUSUM charts. Although the efficiencies of the gamma CUSUM chart, CQC-v and CQC are not very different, the CQC-v gives the warning more than two days earlier on average. When recency is not considered, the detection speed of the CUSUM charts slows down by more than one week. For average TPR, the gamma CUSUM chart has the lowest number of false negatives; the average TPR of all the methods benefits from recency, increasing by 33%. For average FPR, the CQC offers the fewest false alarms, followed by the gamma CUSUM chart; the average FPR of all the methods is affected by recency, falling by 6% without it.

The above comparisons suggest that, for the data used in this experimental trial, the gamma CUSUM chart method performs better than the other control charts, and when recency is taken into account it performs even better. We also learn that it is appropriate to trade off between ACC and ATS when selecting results for a TBE chart method.

5. Conclusion

Virtual shops are booming today because of the diversification and convenience of their products and services, resulting in generally higher customer churn rates for online firms than for physical stores; it is also more difficult to maintain stable relationships with online customers. Most existing studies on customer churn prediction are static analyses and are not well suited to individual, dynamic monitoring. This study uses the gamma CUSUM chart to build a very different prediction mechanism. The gamma CUSUM chart monitors an individual customer's IAT, introducing a finite mixture model to design the reference value and decision interval of the chart and using a hierarchical Bayesian model to capture the heterogeneity of customers. Recency, another time interval variable complementary to IAT, is integrated into the model and tracks the recent status of login behavior. In addition, benefiting from the basic nature of control charts, the graphical interface for each customer is a further advantage of the proposed method. A real-world database of an entertainment dating website in Taiwan is used to test the proposed method. The average ACC of the gamma CUSUM chart is about 80.52% when recency is considered, and the corresponding average TPR can be as high as 88.11%, an impressive achievement; the implied missed detection rate of 11.89% is acceptable because only a small portion of churners go unidentified even though we employ a univariate mechanism. A comparative analysis of different TBE control charts is also presented, in which each chart uses its own optimal parameters determined by sensitivity analysis. The results show that the average ACC of the gamma CUSUM chart is 5.2% higher, and its ATS about two days longer, than that of the best CQC-v.
The comparison also indicates that although adding recency can increase false positives, the lower missed detection rate improves ACC, and the added early warning capability also reduces ATS. This phenomenon occurs not only in the gamma CUSUM chart; the exponential CUSUM chart, CQC-v and CQC also benefit. As recency is just a special type of IAT, the proposed mechanism is essentially a univariate analysis technique, which may limit prediction accuracy to some extent. We encourage future studies to add other consumer behavior variables as the modeling basis for a multivariate control chart, possibly improving the prediction accuracy for customer churn. It would also be possible to introduce a one-sided gamma CUSUM chart with a lower control limit to detect downward shifts of IAT; finding potential improvements in login behavior may help administrators investigate ways to encourage more active behavior.

Acknowledgement

The author would particularly like to thank Professor Rob Kauffman, the Editor, Professor Chris Yang, the Co-editor, and the anonymous reviewers for their constructive and helpful comments.


Appendix A

Given the best parameters (tf, v, f) = (8, 2, 4) and ARL = 400, we use an anonymous customer to show the process of the proposed method. The customer's first visit to the website was on February 4th, and the number of visits between October 4th of the same year and February of the next year was fewer than 4, so the customer is classified as a churner. The IATs between February 4th and August 4th are entered as modeling data into WinBUGS to perform Gibbs sampling, and the individual Bayesian estimates for the customer are shown in Table A1. The iteration records of

Table A1
Summary of the two-component gamma mixture model for IAT.

Parameter   Mean    Standard deviation   2.50%   Median   97.50%
h0          1.520   0.258                1.130   1.558    2.128
h1          6.260   0.830                5.195   6.459    8.295
q0          0.537   0.082                0.377   0.539    0.703
q1          0.463   0.082                0.230   0.463    0.623
s0          3.955   2.976                0.213   3.405    9.610
s1          3.087   2.404                0.261   2.363    8.506
k0          2.637   2.556                0.004   1.900    8.719
k1          7.937   8.249                0.230   4.708    29.60
a0          3.336   2.700                0.197   2.407    9.576
a1          2.201   2.119                0.118   1.503    8.250
b0          3.306   2.992                0.005   2.715    9.755
b1          4.063   3.121                0.008   3.572    9.387

Fig. A1. Estimation process for Gibbs sampling on h0 and h1.

h0 and h1 are shown in Fig. A1, in which the two parameters fluctuate between 1 and 2.5 and between 4 and 12, respectively. The fitting results for the modeling data from the finite mixture model are shown in Fig. A2. We then set the hypotheses H0: h = 1.52 vs. H1: h = 6.26 and follow Eq. (3) to obtain k = 5.69. We also derive h/h0 = 6.17 and h = 9.38, given v = 2, h1/h0 = 4.12 and ARL = 400. So the customized control mechanism is:

z(i, j) = max{0, (t(i, j) - 5.69) + z(i-1, n_{i-1})}    (8)

Here, i = 1, 2, . . ., 25, j = 1, 2, . . ., ni, z(0, 1) = 0 and t(1, 1) = 10. When z(i, j) > 9.38, the system generates warnings. The customer was very keen on the website at the beginning, so the average IAT in the modeling data was about 1.27 days. The values of h0 and h1 estimated by Gibbs sampling were small and not very different; k = 5.69 means that the mechanism can tolerate about two days of absence. Next, we enter the testing data from August 4th to October 3rd and let time roll. The gamma CUSUM chart of the customer is shown in Fig. A3. The horizontal axis is the arrival number for accumulated visits of two, while the vertical axis is the gamma CUSUM score. The dotted line represents h. The solid points are the gamma CUSUM scores formed by IAT, indicating cut-off times at which the customer has accumulated two visits; the hollow points are the scores formed by recency, indicating that the customer has not accumulated two visits by the cut-off time. As tf is 8 h, we use the labels (1), (2) and (3) in Fig. A3 to represent the periods 00:00–07:59, 08:00–15:59 and 16:00–23:59 of a day. As raised in Section 3.4, recency has to comply with the recency-substitution rule and the IAT-substitution rule; an example of how this operates can be seen at arrival numbers four and five in Fig. A3. After accumulating two visits at 8/12(2), the customer does not accumulate two more visits until 8/14(2). As time passed after 8/12(2), the recency of 8/13(1) replaced the IAT of 8/12(2), and then the recency of 8/13(2) replaced that of 8/13(1). The recency-substitution rule operated continuously at each new cut-off time until the customer accumulated two visits at 8/14(2). The IAT-substitution rule was then enabled: arrival number five was activated and the IAT of 8/14(2) replaced the recency of 8/14(1). (Please note that we did not draw any gamma CUSUM score at 8/12(3), as v is set to two and 8/12(3) was only one time interval away from the previous event at 8/12(2).) Solid or hollow points were not formed,

Fig. A2. Fitting result for the hierarchical bayesian mixture model.


Fig. A3. Gamma CUSUM chart of the anonymous customer.

irrespective of whether the customer visited or not. This is also why the recency or IAT proposed in Section 3.4 must be greater than or equal to v. In the design of this study, the system draws a gamma CUSUM score at every cut-off time. Before the customer accumulates two visits to the website, the gamma CUSUM score continuously accumulates at the same arrival number: old points are constantly replaced by new points, and the system issues a deterioration warning if a solid or hollow point exceeds h. The testing data show that the customer had active login behavior before the end of August but became relatively inactive from September, as shown in Fig. A3. Before 8/30(3), the gamma CUSUM scores fluctuate between 0 and 9 but never exceed the limit of h = 9.38. After 8/30(3), the gamma CUSUM score starts to accumulate, and the behavior warning appears at 9/5(1). To show the importance of recency, imagine the consequence of not having the hollow points in Fig. A3: we would no longer be able to continuously monitor the customer's behavior, and the deterioration would not be detected until 9/15(2) instead of 9/5(1), as it should be. This is also why the ATS of the mechanism accelerates when recency is used, as shown in Table 2. If we take a step further, ignore the login data after 9/15(2) and keep only the solid points before 9/9(2), the customer leaves no further history data within the control mechanism. In this circumstance, we would never be able to draw new information, as we lack recency, and would mistakenly assume that the customer is still active. This is also why the TPR of the mechanism increases when recency is used, as can be seen in Table 2.

References

Allenby, G.M., Leone, R.P., Jen, L., 1999. A dynamic model of purchase timing with application to direct marketing. J. Am. Stat. Assoc. 94 (446), 365–374. http://dx.doi.org/10.1080/01621459.1999.10474127.
Berson, A., Smith, S.J., 1999. Building Data Mining Applications for CRM. McGraw-Hill, New York, NY.
Blattberg, R.C., Getz, G., Thomas, J.S., 2001. Customer Equity: Building and Managing Relationships as Valuable Assets. Harvard Business Press, Boston, MA.
Boatwright, P., Borle, S., Kadane, J.B., 2003. A model of the joint distribution of purchase quantity and timing. J. Am. Stat. Assoc. 98 (463), 564–572.
Buckinx, W., Van den Poel, D., 2005. Customer base analysis: partial defection of behaviorally loyal clients in a non-contractual FMCG retail setting. Eur. J. Oper. Res. 164 (1), 252–268.
Burez, J., Van den Poel, D., 2007. CRM at a pay-TV company: using analytical models to reduce customer attrition by targeted marketing for subscription services. Expert Syst. Appl. 32 (2), 277–288.
Carlin, B.P., Chib, S., 1995. Bayesian model choice via Markov chain Monte Carlo methods. J. Roy. Stat. Soc.: Ser. B (Methodol.) 57 (3), 473–484.
Chan, L.Y., Xie, M., Goh, T.N., 2000. Cumulative quantity control charts for monitoring production processes. Int. J. Prod. Res. 38 (2), 397–408.
Chang, T.C., Gan, F.F., 2001. Cumulative sum charts for high yield processes. Stat. Sin. 11 (3), 791–806.
Chen, Z.Y., Fan, Z.P., Sun, M., 2012. A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. Eur. J. Oper. Res. 223 (2), 461–472.
Coussement, K., De Bock, K.W., 2013. Customer churn prediction in the online gambling industry: the beneficial effect of ensemble learning. J. Bus. Res. 66 (9), 1629–1636.
Coussement, K., Van den Poel, D., 2008a. Churn prediction in subscription services: an application of support vector machines while comparing two parameter-selection techniques. Expert Syst. Appl. 34 (1), 313–327.
Coussement, K., Van den Poel, D., 2008b. Integrating the voice of customers through call center emails into a decision support system for churn prediction. Inf. Manag. 45 (3), 164–174.
Coussement, K., Benoit, D.F., Van den Poel, D., 2010. Improved marketing decision making in a customer churn prediction context using generalized additive models. Expert Syst. Appl. 37 (3), 2132–2143.
Coyles, S., Gokey, T.C., 2005. Customer retention is not enough. J. Consum. Mark. 22 (2), 101–105.
Faris, H., 2014. Neighborhood cleaning rules and particle swarm optimization for predicting customer churn behavior in telecom industry. Int. J. Adv. Sci. Technol. 68, 11–22.
Fok, D., Paap, R., Franses, P.H., 2012. Modeling dynamic effects of promotion on interpurchase times. Comput. Stat. Data Anal. 56 (11), 3055–3069.
Gan, F.F., 1993. An optimal design of CUSUM control charts for binomial counts. J. Appl. Stat. 20 (4), 445–460.
Gan, F.F., 1994. Design of optimal exponential CUSUM control charts. J. Qual. Technol. 26 (2), 109–124.
Glady, N., Baesens, B., Croux, C., 2009. Modeling churn using customer lifetime value. Eur. J. Oper. Res. 197 (1), 402–411.
Guo, R.S., 2009. A multi-category inter-purchase time model based on hierarchical Bayesian theory. Expert Syst. Appl. 36 (3), 6301–6308.
Hadiji, F., Sifa, R., Drachen, A., Thurau, C., Kersting, K., Bauckhage, C., 2014. Predicting player churn in the wild. In: Proc. IEEE Conference on Computational Intelligence and Games. IEEE Comp. Soc. Press, Washington, DC.
Hardie, B.G., Fader, P.S., Wisniewski, M., 1998. An empirical comparison of new product trial forecasting models. J. Forecast. 17 (3–4), 209–229.
Hawkins, D.M., Olwell, D.H., 1998. Cumulative Sum Charts and Charting for Quality Improvement. Springer Science and Business Media, Berlin, Germany.
Huang, B.Q., Kechadi, T.M., Buckley, B., Kiernan, G., Keogh, E., Rashid, T., 2010. A new feature set with new window techniques for customer churn prediction in landline telecommunications. Expert Syst. Appl. 37 (5), 3657–3665.
Huang, W., Shu, L., Jiang, W., Tsui, K.L., 2013. Evaluation of run-length distribution for CUSUM charts under gamma distributions. IIE Trans. 45 (9), 981–994.
Jiang, W., Au, T., Tsui, K.L., 2007. A statistical process control approach to business activity monitoring. IIE Trans. 39 (3), 235–249.
Kotler, P., 2001. Marketing Management: Analysis, Planning, Implementation, and Control. Prentice Hall, Englewood Cliffs, NJ.
Kumar, A.D., Ravi, V., 2008. Predicting credit card customer churn in banks using data mining. Int. J. Data Anal. Tech. Strateg. 1 (1), 4–28.
Lemmens, A., Croux, C., 2006. Bagging and boosting classification trees to predict churn. J. Mark. Res. 43 (2), 276–286.
Lo, S.C., 2008. Online customer identification based on Bayesian model of interpurchase times and recency. Int. J. Syst. Sci. 39 (8), 853–863.
Lucas, J.M., 1985. Counted data CUSUM's. Technometrics 27 (2), 129–144.
McGilchrist, C.A., Woodyer, K.D., 1975. Note on a distribution-free CUSUM technique. Technometrics 17 (3), 321–325.
O'Brien, L., Jones, C., 1995. Do rewards really create loyalty? Long Range Plan. 28 (4), 130.
Oliver, R.L., 2010. Satisfaction: A Behavioral Perspective on the Consumer. ME Sharpe, Armonk, NY.
Orsenigo, C., Vercellis, C., 2010. Combining discrete SVM and fixed cardinality warping distances for multivariate time series classification. Pattern Recogn. 43 (11), 3787–3794.

Page, E.S., 1954. Continuous inspection schemes. Biometrika 41 (1/2), 100–115.
Peng, J., Quan, J., Zhang, S., 2013. Mobile phone customer retention strategies and Chinese e-commerce. Electron. Commer. Res. Appl. 12 (5), 321–327.
Peppers, D., Rogers, M., 2000. Enterprise One-to-One: Tools for Building Unbreakable Customer Relationships in the Interactive Age. Piatkus, London, UK.
Perng, D.B., Chen, S.H., 2011. Directional textures auto-inspection using discrete cosine transform. Int. J. Prod. Res. 49 (23), 7171–7187.
Pettersson, M., 2004. SPC with applications to churn management. Qual. Reliab. Eng. Int. 20 (5), 397–406.
Reichheld, F.F., Sasser, W.E., 1990. Zero defections: quality comes to service. Harv. Bus. Rev. 68 (5), 105–111.
Reichheld, F.F., Teal, T., 1996. The Loyalty Effect. Harvard Business School Press, Boston, MA.
Rosin, P.L., 2001. Unimodal thresholding. Pattern Recogn. 34 (11), 2083–2096.
Rossi, P.E., Allenby, G.M., 2003. Bayesian statistics and marketing. Mark. Sci. 22 (3), 304–328.
Runge, J., Gao, P., Garcin, F., Faltings, B., 2014. Churn prediction for high-value players in casual social games. In: Proc. 2014 IEEE Conference on Computational Intelligence and Games. IEEE Comp. Soc. Press, Washington, DC.
Samimi, Y., Aghaie, A., 2008. Monitoring usage behavior in subscription-based services using control charts for multivariate attribute characteristics. In: Proc. 2008 IEEE International Conference on Industrial Engineering and Engineering Management. IEEE Comp. Soc. Press, Los Alamitos, CA, pp. 1469–1474.
Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., 1996. BUGS 0.5: Bayesian Inference Using Gibbs Sampling Manual (Version ii). MRC Biostatistics Unit, Cambridge, UK.


Torkzadeh, G., Chang, J.C.J., Hansen, G.W., 2006. Identifying issues in customer relationship management at Merck-Medco. Decis. Support Syst. 42 (2), 1116–1130.
Tsai, C.F., Lu, Y.H., 2009. Customer churn prediction by hybrid neural networks. Expert Syst. Appl. 36 (10), 12547–12553.
Van den Poel, D., Lariviere, B., 2004. Customer attrition analysis for financial services using proportional hazard models. Eur. J. Oper. Res. 157 (1), 196–217.
Venkatesan, R., Kumar, V., Bohling, T., 2007. Optimal customer relationship management using Bayesian decision theory: an application for customer selection. J. Mark. Res. 44 (4), 579–594.
Verbeke, W., Dejaeger, K., Martens, D., Hur, J., Baesens, B., 2012. New insights into churn prediction in the telecommunication sector: a profit driven data mining approach. Eur. J. Oper. Res. 218 (1), 211–229.
Wübben, M., Wangenheim, F.V., 2008. Instant customer base analysis: managerial heuristics often "get it right". J. Mark. 72 (3), 82–93.
Xie, M., Goh, T.N., Ranjan, P., 2002. Some effective control chart procedures for reliability monitoring. Reliab. Eng. Syst. Saf. 77 (2), 143–150.
Yee, J.L., Niemeier, D., 1996. Advantages and disadvantages: longitudinal vs. repeated cross-section surveys. Proj. Battelle 94, 16.
Yu, X., Guo, S., Guo, J., Huang, X., 2011. An extended support vector machine forecasting framework for customer churn in e-commerce. Expert Syst. Appl. 38 (3), 1425–1430.
Zeithaml, V.A., Berry, L.L., Parasuraman, A., 1996. The behavioral consequences of service quality. J. Mark. 60 (2), 31–46.
Zorn, S., Jarvis, W., Bellman, S., 2010. Attitudinal perspectives for predicting churn. J. Res. Interact. Mark. 4 (2), 157–169.