Predicting customer absence for automobile 4S shops: A lifecycle perspective


Engineering Applications of Artificial Intelligence 89 (2020) 103405


Jiawe Wang a, Xinjun Lai a,∗, Sheng Zhang a, W.M. Wang b, Jianghang Chen c

a School of Electro-Mechanical Engineering, Guangdong University of Technology, Guangzhou, Guangdong, China
b Knowledge Management and Innovation Research Center, The Hong Kong Polytechnic University, Hong Kong, China
c International Business School Suzhou, Xi'an Jiaotong-Liverpool University, China

ARTICLE INFO

Keywords: 4S shop; Repair and maintenance; CRM; Recurrent neural network; Behavioral model

ABSTRACT

Repair and maintenance services are among the most lucrative aspects of the entire automobile business chain. However, in the context of fierce competition, customer churn has led to the bankruptcy of several 4S (sales, spare parts, services, and surveys) shops. In this regard, a six-year dataset is utilized to study customer behaviors and to help managers identify and retain valuable customers who are likely to churn through a customized retention solution. First, we define the absence and presence behaviors of customers and thereafter generate absence data according to customer habits; this makes it possible to treat the customer absence prediction problem as a classification problem. Second, the repeated absence and presence behaviors of customers are considered as a whole from a lifecycle perspective. A modified recurrent neural network (RNN-2L) is proposed; it is more efficient and more reasonable in structure than a traditional RNN. The time-invariant customer features and the sequential lifecycle features are handled separately; this provides a more sensible specification of the RNN structure from a behavioral interpretation perspective. Third, a customized retention solution is proposed. By comparing the proposed model with conventional ones, it is found that the former outperforms the latter in terms of area under the curve (AUC), confusion matrix, and computation time. The proposed customized retention solution can achieve a significant profit increase. This paper not only elucidates customer relationship management in the automobile aftermarket (where absence and presence behaviors are infrequently considered), but also presents an efficient way to increase the predictive power of conventional machine learning models by incorporating behavioral and business perspectives.

1. Introduction

1.1. Background and motivation

China's automobile market has continued to flourish for more than 20 years. By 2017, the total number of vehicles in China had reached 217 million, and the country's vehicle sales had ranked number one worldwide for eight years. In 2018, the estimated sales were 25.6 million passenger cars and 2 million commercial vehicles (Wouter et al., 2018). The large demand for vehicles not only yields financial gains in manufacturing and sales, but also brings profits to the vehicle aftermarket, such as repair and maintenance. According to experience gained in mature markets, such as those of the European Union and the United States, the aftermarket profit margin is 76% higher than that

of vehicle sales. For 70% of automobile companies, the net profit of spare part sales can reach 25%, and some can even reach 40%; this makes repair and maintenance services among the most lucrative enterprises in the entire automobile business chain. In 1998, the first 4S shop (sales, spare parts, services, and surveys) in China was established by a joint venture company, GAC-Honda,1 followed by Buick and Audi. Thereafter, China's fast-growing automobile market has attracted practically all brands in the world to establish businesses and 4S shops in the country. However, the repair and maintenance business has become increasingly competitive. Local repair shops have not only improved services and lowered prices to attract customers, but have also stimulated the internet and online-to-offline business in China (Wang et al., 2016); this has vastly facilitated the comparison of service quality

✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2019.103405. ∗ Corresponding author. E-mail address: [email protected] (X. Lai). 1 GAC: Guangzhou Automobile Corporation, China.

https://doi.org/10.1016/j.engappai.2019.103405 Received 3 June 2019; Received in revised form 15 November 2019; Accepted 21 November 2019 Available online xxxx 0952-1976/© 2019 Elsevier Ltd. All rights reserved.


and prices of several shops for customers. In the past five years, customer churn has become the main reason for the decline or even bankruptcy of several 4S shops in China.

Consider the 4S shop we surveyed as an example. The shop, established by a large American brand, sells passenger cars worth between 80,000 and 250,000 RMB (approximately US$11,500 to US$35,700). It is located in a city in the rich Yangtze River Delta in East China. According to our investigation, there are significant drawbacks in the customer relationship management (CRM) of this 4S shop that are also common to practically all 4S shops in China. First, 4S shops are mainly established by overseas automobile companies; because of distance and cultural differences, communication gaps exist between the shops and their headquarters. These gaps make it slow and considerably difficult for 4S shops to alter their business activities to cope with the fast-changing market. Second, the management style is qualitative rather than quantitative. Although most 4S shop managers use conventional and lucrative 4S business models, their attention to data mining is extremely limited; customer records are simply stored in the IT system without any thorough investigation of possible opportunities for profit growth. Third, and most interesting, customers generally show only three years of loyalty. This is because customers must return to their 4S shops for maintenance during the warranty period, which is usually three years according to the warranty policy of most brands; otherwise, the warranty becomes void. Consequently, more than 90% of customers leave 4S shops after the first three years of purchase. The aforementioned has motivated us to investigate the factors that affect customer churn in the automobile aftermarket. In particular, the objective is to fully exploit customer datasets and design personalized solutions to retain valuable clients who are potential churn customers.

1.2. Problems

Three problems are identified. First, the dataset contains records of customer visits to a shop; it is not known whether a particular customer has visited other shops that belong to other companies. Therefore, if we observe that a client does not return for car maintenance for a certain period of time, then the following are possible. (1) The customer has churned, will never come back, and has therefore been permanently lost. (2) The customer visited another shop for the recent maintenance and may come back for the next one; thus, only the profit from one maintenance is lost. (3) The customer did not go to any shop for maintenance and may come back soon; hence, the potential profit from maintenance remains. In other words, we merely have presence-only data; we do not know what transpired during the absence periods. To estimate potential customer churn, a practical data-preprocessing solution for the presence-only data is required to facilitate mathematical modeling.

Second, in numerous CRM research works and models, customer churn is easy to detect because it usually occurs at the end of a contract, e.g., a telephone contract (Huang et al., 2012) or banking (Mosavi and Afsar, 2018). The presence or end of a contract is critical for an analyst to determine whether or not a customer will churn. On the other hand, certain businesses do not have any contract, but the duration between purchases is brief, e.g., supermarket transactions; customer churn can be identified easily from purchase data. However, in 4S shops, there is no contract, and the duration between maintenance visits is relatively long (usually more than six months). Consequently, it is too late to implement any retention solution by the time managers realize a customer has definitely churned; adapting this problem to a CRM research framework is critical.

Third, although numerous econometric and machine learning models have been applied to the CRM, behavioral and business-related variables are only treated as input to conventional model frameworks; no further exploration of the model structure is conducted. Moreover, the behavioral variables are only treated once (e.g., whether something has been done) or in an aggregative style (e.g., more than twice in the past six months). Limited effort has been expended to consider the totality of lifecycle behaviors in model development. In our case, each presence and absence transaction cannot simply be treated as a single observation. The behavioral pattern of each customer is unique and continuous; thus, a customer's entire lifecycle (from the first to the last visit) should be considered as a whole and incorporated into the modeling. The full incorporation of behavior and business logic into the specification of conventional machine learning approaches is novel and crucial; it increases the predictive power and interpretation ability. The foregoing are the three problems presented in this paper, and their resolution is our contribution to the existing literature in terms of methodology.

1.3. Paper structure

The rest of the paper is organized as follows. Section 2 presents a review of related literature on the automobile industry, CRM, and possible solutions for mathematical modeling; the motivations for proposing a new recurrent neural network (RNN) are also discussed. Section 3 presents the cleaning of customer presence and absence data, the specification of a conventional RNN, and the proposed model. Section 4 presents the experiment to validate the new model, with comparisons. Based on the developed model, customized customer retention solutions for management in the surveyed 4S shop are discussed in Section 5. The last section summarizes the conclusions.

2. Literature review

2.1. Customer relationship management

Bose (2002) defines CRM as an integration of technologies and business processes that is used to satisfy the requisites of a customer during any given interaction. More specifically, CRM includes the acquisition of customer information and the evaluation of customer knowledge in order to achieve steady, long-term enterprise management. That is, the enterprise endeavors to identify the real requisites of the customer by integrating various processes and technologies; accordingly, improvements in internal products and services can be achieved in order to stimulate efforts for enhancing customer satisfaction and loyalty (Pai and Tu, 2011). For most companies, CRM has become an essential strategy to maximize customer value and company profit. The CRM system is composed of four dimensions: customer identification, customer attraction, customer retention, and customer development (Ngai et al., 2009). Among these dimensions, customer retention has an essential function in the whole system because, in most domains, retaining an old customer is more profitable than acquiring a new one (Larivière and Van den Poel, 2005). The accurate prediction of customer churn behavior and the development of targeted market strategies are crucial to customer retention. Krishna and Ravi (2016) surveyed 78 papers on the application of evolutionary computing (EC) techniques to analytical CRM tasks to optimize profitability. In these mathematical models, customer churn prediction is usually a classification problem that uses existing customer data; the customer's next behavior is then classified into churn and non-churn based on a priori knowledge. Table 1 summarizes several typical and related papers regarding CRM in various domains. In particular, it can be observed that machine learning approaches were widely used in CRM where IT systems and data warehouses were applied for big data collection.



Table 1. Summary of related research works.

| Related research | Domain | Dataset | Model | Prediction task (nature of dependent variable/task) | Findings |
|---|---|---|---|---|---|
| Larivière and Van den Poel, 2005 | Financial | 100,000 records of customer data | RF | Whether or not to buy another product (discrete and non-repeated); retention and profitability (continuous) | The most loyal customers are not the most profitable clients. Client–company interaction variables perform an important function. |
| Coussement and Van den Poel, 2008 | Newspaper subscription | 45,000 subscriptions from Belgium | SVM | Whether or not the subscription will be renewed (discrete and non-repeated) | — |
| Chan, 2008 | Automobile retail campaign | 4659 Nissan customers | GA | Effectively target valuable customers by maximizing the lifetime value (continuous) | Identifying customer behaviors is crucial, and a customized campaign should be considered. |
| Huang et al., 2012 | Landline telephone | 827,124 customers from Ireland | Multiple models | Whether or not a customer ends the current contract (discrete) | The LG, MLP, and NB are outperformed by SVM and C4.5, but the computational cost of SVM is expensive. |
| Chiang, 2012 | Online retail | 1,857 questionnaires | DT, APA | Establish customer markets and CRM rules (association rules) | Recency, frequency, monetary, and social demographics are applied as independent variables. |
| Lu et al., 2014 | Mobile telecommunication | 7190 records of customer data | LG | Churn prediction (discrete) and churn warning expert system (association rules) | Clustering before building prediction models is proposed. |
| Farquad et al., 2014 | Bank credit card | 14,814 customers from a Latin American bank | SVM, NBT | Whether or not to switch to another company (discrete) | Not only should a high-accuracy prediction model be developed; business suggestions are the most crucial in practice. |
| Ballings and Van den Poel, 2015 | Social media | 921 Facebook users | Multiple models | Prediction of increase in usage frequency (discrete and binary) | AdaBoost is the best model; behavioral predictors are important. |
| Bandaru et al., 2015 | Automobile service satisfaction | Unknown dataset size | GA | Customer satisfaction/dissatisfaction | #Visits, #days of repair, costs, #days between visits, miles between visits, and problem severity are considered. |
| Chiang, 2017 | Airlines | 413 questionnaires | APA | Find valuable travelers for airlines (association rules) | Social demographics, habits, and consumption behaviors are important. Customized marketing strategies are acquired. |
| Mosavi and Afsar, 2018 | Bank | 285 records of customer data in Iran | Multiple models | Customer value mining and market segmentation (clustering) | The method is practical for missing data. |
| This paper, 2019 | Automobile repair and maintenance | 145,823 customer records of more than six years | New RNN | Presence and absence status of customers (discrete and repeated) | Repeated behaviors are considered from a lifecycle perspective. A sophisticated neural network is proposed. Business insights are incorporated in the model structure. |

SVM: support-vector machine; RF: random forest; DT: decision tree; LG: logistic regression; GA: genetic algorithm; APA: apriori algorithm; RNN: recurrent neural network; ADAB: AdaBoost; NBT: naïve Bayes tree.


In finance and banking, Larivière and Van den Poel (2005) and Farquad et al. (2014) applied the random forest (the former), and the SVM and naïve Bayes tree (the latter), to predict whether or not a customer will switch to another bank. In the subscription services of conventional business, Coussement and Van den Poel (2008) employed 45,000 records of customer data from Belgium and the SVM to predict whether or not a contract will be renewed. Similarly, Huang et al. (2012) and Lu et al. (2014) studied landline and mobile telecommunications; the former focused more on comparisons of multiple machine learning models, whereas the latter predicted churn behavior and developed warning rules in advance to retain customers. Related studies are conducted worldwide as internet-based services flourish. Chiang (2012) analyzed the customer behaviors of three online shopping platforms and mined association rules for the CRM; the data were acquired through questionnaires. Social media platforms, where usage frequency is closely associated with advertisement revenue, also value user loyalty. Ballings and Van den Poel (2015) compared multiple machine learning models and discovered that AdaBoost is the best among them; moreover, behavioral predictors were found to be crucial in CRM models. Bandaru et al. (2015) seems to be one of the very few recent papers that analyzed the CRM in an automobile company with warranty data. They developed a customer satisfaction index (CSI), with which they derived rules to classify customers by satisfaction level; they also discussed how to identify critical field failures with the proposed index. In their paper, six variables are considered: number of visits, days of repair, costs, days between visits, miles between visits, and problem severity. The CSI is built on the assumption that the distribution of the CSI over all customers should be as narrow as possible; an optimization problem is therefore solved to obtain the coefficients of the six variables in the CSI.

Although the number of articles published on the automobile aftermarket is limited, two papers that studied the CRM in the transportation domain were found to be related. Chiang (2016) analyzed the CRM in airlines, where valuable travelers were selected by the airline company through the apriori algorithm. In the automobile business, Chan (2008) quantified the lifetime value of each customer in a retail campaign, and a genetic algorithm was employed for the selection of high-value customers. However, customer features largely differ between the automobile retail market and the aftermarket; a purposive consideration of data analysis and mathematical modeling is required.

Summary: (1) Customer churn modeling is usually regarded as a classification problem, where a binary status is to be predicted; for instance, whether a customer will churn, switch to another company, or increase usage. Note that these cases all have a clear indicator (e.g., a contract) to identify the binary status; this differs from the case in our study. (2) For independent variables, socio-demographics, customer habits, and customer behaviors are the most employed. However, they are merely regarded as simple inputs for conventional model frameworks. Few attempts have been made to further exploit these variables, for instance by developing a more behaviorally interpretable machine learning model. Moreover, the behaviors were used only once (e.g., whether something was done) or in an aggregative way (e.g., something was done more than twice in the past three weeks). This seems to be a waste, because the full lifecycles of customer behaviors are available. (3) Regarding the modeling approaches, the most employed and compared models include logistic regression, the support vector machine, naïve Bayes, the decision tree, the random forest, apriori, and the multilayer perceptron neural network; all of these are conventional machine learning approaches. Nevertheless, note first that the development of customized models is scant; apparently, these frameworks cannot satisfy the necessity of fully exploiting the lifecycle of customer behaviors. In

other words, all these models were somehow treated as black boxes, and few attempts were made to improve their interpretation ability. Second, with the development of deep learning, several model structures, such as the recurrent neural network and the long short-term memory model, were found to outperform these models (Choi et al., 2016; Fang et al., 2018). This aspect is further discussed in the next section. (4) Business insights and applications are the intent of CRM model development. However, the existing literature either focuses on model comparison and feature selection (Huang et al., 2012; Ballings and Van den Poel, 2015; Mosavi and Afsar, 2018) or on the independent generation of warning rules from the prediction model (Farquad et al., 2014). Further investigation is necessary to fully use the developed prediction model, not only for customer retention, but also for profitability.

2.2. Recurrent neural network

As discussed above, the RNN has been found to outperform several conventional machine learning models in prediction tasks (Morton et al., 2017), such as in natural language processing (Morchid, 2018) and computer vision (Pavel et al., 2017). There have been attempts to employ the RNN to provide more reliable and practical customer management strategies for managers. Salehinejad and Rahnamayan (2016) proposed a customer behavior prediction model using recurrent neural networks (RNNs) based on the client loyalty number and the recency, frequency, and monetary (RFM) variables; results show that RNNs have competitive performance for the RFM recommender system. The long short-term memory (LSTM) network (Hochreiter and Schmidhuber, 1997; Gers, 2001) is among the several variants of the RNN. The basic architecture of the LSTM is similar to that of the RNN; however, each unit in the hidden layer is associated with a cell, an input gate, a forget gate, and an output gate.
This gated architecture enables the model to selectively forget old information and admit new information when handling long sequences; consequently, the gradient vanishing and explosion problems are overcome. Moreover, the LSTM is highly adaptable and can evolve into more suitable forms under various practical situations; accordingly, variants of the LSTM have been developed for different studies. In the past three years, increasing interest has focused on utilizing the LSTM not only for conventional pattern recognition but also for the further investigation of human behaviors, such as in passenger demand (Ke et al., 2017), education (Zhou et al., 2018), and consumer behavior (Gloor et al., 2019) analyses. These applications all have a temporal dependency feature that significantly affects the endogenous variable; moreover, the sequential relationships in the data cannot be ignored. However, in most of these studies, the RNN and LSTM specifications are considered as ''plug-in'' tools, and the potential for greater interpretability is rarely investigated.

Although applications of the LSTM in marketing and CRM are scant, a recent practice provides interesting and insightful experience. To our knowledge, the work of Yang et al. (2018) seems to be the first and only study to apply the LSTM structure in the CRM, for Snapchat, with the intent of developing a joint model that simultaneously captures user types and churn. A parallel LSTM structure was applied; accordingly, an ensemble model style and an attention mechanism were employed. As reported, the interpretation ability of the LSTM model in behavior modeling was increased, mostly through user-type clustering. The paper of Yang et al. (2018) presents a possible solution to endow black-box-like machine learning models with more behavioral meaning; accordingly, it has motivated us to explore this field and direction further.

In the current study, automobile repair and maintenance behaviors have unique features, such as missing absence data, full lifecycle data, and three-year loyalty, as discussed above. A well-designed data-cleaning procedure, model selection, and specifications are required.
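To make the gated architecture described above concrete, a single LSTM cell step can be sketched in NumPy. This is a generic textbook LSTM, not the specification used later in this paper; all weight shapes and names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b are dicts keyed by gate: 'i', 'f', 'o', 'g'."""
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])  # input gate: admit new information
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])  # forget gate: discard old information
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])  # output gate: expose the cell state
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])  # candidate cell update
    c = f * c_prev + i * g      # additive cell update mitigates vanishing gradients
    h = o * np.tanh(c)          # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = {k: rng.normal(size=(n_hid, n_in)) for k in 'ifog'}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in 'ifog'}
b = {k: np.zeros(n_hid) for k in 'ifog'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(5):                          # unroll over a short sequence
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
print(h.shape)  # (3,)
```

The additive form of the cell update (c = f·c_prev + i·g) is what lets gradients flow over long sequences, which is why the LSTM suits the long inter-visit horizons of maintenance data.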


Based on the multi-dimensional time series feature of customer lifecycle absence and presence behaviors, an RNN structure is applied. The customized specification of a new LSTM model is proposed to fully exploit the longitudinal six-year dataset of customer behaviors not only to increase the predictive power of the model, but also to improve the interpretation ability of black-box-like machine learning methods.
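The core idea of separating time-invariant customer features from sequential lifecycle features can be illustrated with a minimal NumPy forward pass: the recurrence consumes only the warranty-record sequence, and the static features join at the output layer. This is a simplified sketch of the general idea, not the RNN-2L specification given in Section 3; all dimensions and weights are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(seq, static, params):
    """Plain RNN over the lifecycle sequence; static customer features
    bypass the recurrence and join only at the output layer."""
    Wx, Wh, bh, wo, wc, bo = params
    h = np.zeros(Wh.shape[0])
    for x_t in seq:                      # warranty/absence records in visit order
        h = np.tanh(Wx @ x_t + Wh @ h + bh)
    # presence probability from final hidden state plus time-invariant features
    return sigmoid(wo @ h + wc @ static + bo)

rng = np.random.default_rng(1)
n_seq_feat, n_static, n_hid = 6, 4, 8
params = (rng.normal(size=(n_hid, n_seq_feat)),
          rng.normal(size=(n_hid, n_hid)),
          np.zeros(n_hid),
          rng.normal(size=n_hid),
          rng.normal(size=n_static),
          0.0)
seq = [rng.normal(size=n_seq_feat) for _ in range(10)]   # 10 lifecycle records
static = rng.normal(size=n_static)                        # e.g., age, sex, model type
p = forward(seq, static, params)
print(0.0 < p < 1.0)  # True
```

Keeping the static features out of the recurrence shrinks the recurrent input dimension (and hence training time) while preserving a behavioral reading: habits evolve over the lifecycle, while socio-demographics do not.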

3. Methodology

The overall framework of the proposed methodology, shown in Fig. 1, has three phases: data cleaning, modeling, and targeted CRM. (1) In the data cleaning phase, the raw data are first preprocessed into a format suitable for further analysis; ''customer absence'' and ''customer presence'' are defined to mine absence records based on customer behaviors. Thereafter, the warranty and customer datasets are combined into a dataset for model training and testing. (2) In the modeling phase, a novel RNN structure, namely RNN-2L, is proposed for customer behavior modeling, where a substantial amount of computing time can be saved without sacrificing model performance. A 10-fold cross-validation experiment is conducted, and the confusion matrix, receiver operating characteristic (ROC) curve, area under the curve (AUC), F1-score, and accuracy are presented. (3) In the last phase, a targeted CRM strategy based on the novel RNN-2L is proposed to identify potentially churning but discount-sensitive customers, so that profitability is optimized. The details of the framework are described in the following sections.

3.1. Raw dataset and data cleansing

The raw dataset consists of two parts: the customer dataset and the original warranty dataset stored in the database of 4S shops. The customer dataset contains customer socio-demographic features (e.g., age, sex, income, and educational background) and vehicle features (e.g., model type, model year, color, and selling price). The original warranty dataset, which corresponds to the customer dataset, records each customer's warranty and maintenance history; maintenance details such as total cost, hourly rate, warranty period status, maintenance time, and maintenance date are recorded. Once a customer visits a 4S shop for vehicle maintenance, a maintenance record is added to the client's history in the original warranty dataset. In other words, if a customer has visited the 4S shop 10 times for vehicle maintenance, then the customer will have 10 warranty records in the original warranty dataset.

Data cleansing is conducted as follows. First, a simple feature selection method is implemented; a feature with more than 20% missing values and outliers is abandoned. Second, the missing values and outliers of a continuous feature are replaced by its mean (e.g., a negative value in the total cost is considered an outlier and replaced by the mean of the total cost), and the missing values of a categorical feature are treated as a separate category. Third, in preprocessing the categorical and continuous features, the direct use of the original features for model training may result in a biased model (Bandaru et al., 2015). Therefore, a normalization method that maps the continuous features to the range of [0, 1] using Eq. (1), and a one-hot representation method that encodes the categorical features by dummy variables, are used in this study. The one-hot technique creates k dummy variables, where k is the number of distinct values of the categorical feature (Coussement et al., 2017); a dummy is a binary variable that takes 1 or 0, indicating the presence or absence of a particular category, respectively. The normalization is

𝑧𝑗 = (𝑥𝑗 − 𝑚𝑖𝑛(𝑥𝑗)) / (𝑚𝑎𝑥(𝑥𝑗) − 𝑚𝑖𝑛(𝑥𝑗)), (1)

where 𝑥𝑗 is the 𝑗th feature, and 𝑧𝑗 is the normalized feature. It is necessary to emphasize that such a linear transformation does not affect the frequency distribution.

3.2. Customer absence and presence

In automobile maintenance, when a vehicle reaches the maintenance period (this term usually refers to either the time interval or the mileage interval between two consecutive maintenances), the 4S shop advises the owner to have the car undergo preventive maintenance or a check-up. However, for certain reasons (e.g., high cost or an expired warranty period), some customers will not be present in the 4S shop during the aforementioned maintenance period; that is, they will be absent from the 4S shop for the next maintenance. Accordingly, customer absence and customer presence in the automobile maintenance industry are defined as follows.

Definition 1. Customer absence means that a customer is absent from the 4S shop when his/her vehicle reaches the maintenance period.

Definition 2. Customer presence means that a customer is present in the 4S shop for maintenance when his/her vehicle reaches the maintenance period.

Having defined customer absence and presence, the following sections present the techniques for detecting these two customer behaviors.

3.3. Absence records

As indicated in Section 3.1, every warranty record in the original warranty dataset records the detailed information of a particular maintenance. In other words, only when a customer is present in the 4S shop is there a corresponding warranty record; there is no warranty record that corresponds to customer absence (named an absence record) in the original warranty dataset. However, customer absences are in fact hidden in the dataset: if the time interval between two consecutive presences is greater than two maintenance periods, then it can be concluded that a customer absence exists between the two presences. After locating the customer absences in the original warranty dataset, the absence records can be generated by particular rules; the rules used for generating absence records are listed in the fifth column of Table 5, where the following can be noted.

(1) When a customer is absent, maintenance cost and time consumption do not exist; thus, ''Total cost'', ''Actual repair time'', ''Number of orders'', and so on are set to 0. (2) The number of absence records does not affect ''Cumulative presence''; thus, the value of ''Cumulative presence'' in an absence record is equal to its value in the latest presence record. (3) The values of ''mileage'' and ''date'' in an absence record can be calculated by either adding a maintenance period to the value in the last record or using linear interpolation. (4) The warranty period provided by the surveyed 4S shops is three years/100,000 km. Moreover, the date of each vehicle invoice is recorded in the customer dataset. Thus, once the ''date'' and ''mileage'' are calculated, the value of ''warranty period status'' can be inferred for an absence record. (5) ''Repair type'' is a categorical feature. When a customer is absent, maintenance does not exist; thus, the value of ''Repair type'' in the absence record is treated either as a missing value or named ''unknown''. As mentioned in Section 3.1, a missing value in a categorical feature is treated as a new category.

Although rules (1)–(5) are defined specifically for the surveyed 4S shops, it is easy to apply their concepts to define specific rules for other 4S shops.

5
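Rules (1)–(5) can be sketched as a record-filling routine; the field names, units, and the linear-interpolation choice of rule (3) are illustrative assumptions, not the authors' implementation:

```python
# Sketch of rules (1)-(5) for filling one generated absence record.
# Field names and the interpolation choice are illustrative assumptions.

def make_absence_record(prev, date, mileage_rate,
                        warranty_end_date, warranty_end_km):
    """Build one absence record from the latest presence record.

    prev            -- the latest presence record (a dict)
    date            -- date of the implied absence (days, hypothetical unit)
    mileage_rate    -- km per day, v = (m2 - m1) / (t2 - t1)  (rule 3)
    warranty_end_*  -- warranty limits inferred from the invoice (rule 4)
    """
    mileage = prev["mileage"] + mileage_rate * (date - prev["date"])
    return {
        "date": date,
        "total_cost": 0,                        # rule 1: no maintenance cost
        "actual_repair_time": 0,                # rule 1
        "number_of_orders": 0,                  # rule 1
        "cumulative_presence": prev["cumulative_presence"],  # rule 2
        "mileage": mileage,                     # rule 3: linear interpolation
        "warranty_status": "Under warranty"     # rule 4
        if date <= warranty_end_date and mileage <= warranty_end_km
        else "Expired",
        "repair_type": "unknown",               # rule 5: new category
        "presence": "absence",
    }

prev = {"date": 100, "mileage": 20_000, "cumulative_presence": 3}
rec = make_absence_record(prev, date=280, mileage_rate=50.0,
                          warranty_end_date=1095, warranty_end_km=100_000)
assert rec["total_cost"] == 0 and rec["cumulative_presence"] == 3
assert rec["mileage"] == 20_000 + 50.0 * 180   # 29,000 km
assert rec["warranty_status"] == "Under warranty"
```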


Fig. 1. Framework.

Table 2
The notations used in this paper.

Notation           Description
n_c                total number of customers
t_m                time interval of the maintenance period
m_m                mileage interval of the maintenance period
t_L                latest date in the warranty dataset
np_i               total number of the ith customer's presence records
presence_t^(i)     tth presence record of the ith customer
dateP_t^(i)        date of presence_t^(i)
mileageP_t^(i)     mileage of presence_t^(i)
warranty record    general name for presence records and absence records
warr_t^(i)         tth warranty record of the ith customer
c^(i)              the ith customer's customer features
seq_t^(i)          a t-length sequence composed of the ith customer's first t warranty records, i.e., seq_t^(i) = [warr_1^(i), warr_2^(i), ..., warr_t^(i)]
seq_{t1,t2}^(i)    a truncated sequence, seq_{t1,t2}^(i) = [warr_{t1}^(i), ..., warr_{t2}^(i)]
x_t^(i)            combination of seq_t^(i) and c^(i), i.e., x_t^(i) = [seq_t^(i), c^(i)]
x_{t1,t2}^(i)      combination of seq_{t1,t2}^(i) and c^(i), i.e., x_{t1,t2}^(i) = [seq_{t1,t2}^(i), c^(i)]
y_t^(i)            label of x_t^(i); a binary variable taking 0 or 1, where 0 denotes ''customer absence'' and 1 denotes ''customer presence''

An example of generating absence records is shown in Table 3; for simplicity, the maintenance period here refers only to the time interval. In Panel a, customer i is present at times t1 and t2; thus, there are two presence records. Suppose the latest date in the original warranty dataset is t_L and the customer's habitual maintenance period is t_m. (1) If 2·t_m ≤ t2 − t1 < 3·t_m, then one customer absence exists at (t1 + t_m); if 3·t_m ≤ t2 − t1 < 4·t_m, then two customer absences exist at (t1 + t_m) and (t1 + 2·t_m), and so forth. (2) On the other hand, if the latest time t2 satisfies t2 + t_m < t_L, then a customer absence exists at (t2 + t_m).

3.4. Dataset for model training

To obtain the dataset for training, absence records are first added to the original warranty dataset to extend it. The detailed process of

Table 3
Example of the generation of absence records.

Panel a: original warranty data of customer i

Customer ID  Date         Total cost  Mileage         Warranty period status  Repair type  Cumulative presence  Presence/absence
i            t1           c1          m1              Under warranty          minor        1                    presence
i            t2           c2          m2              Expired                 major        2                    presence
......

Panel b: 2·t_m ≤ t2 − t1 < 3·t_m and t2 + t_m ≤ t_L; v = (m2 − m1)/(t2 − t1)

Customer ID  Date         Total cost  Mileage         Warranty period status  Repair type  Cumulative presence  Presence/absence
i            t1           c1          m1              Under warranty          minor        1                    presence
i            t1 + t_m     0           m1 + v·t_m      Under warranty          unknown      1                    absence
i            t2           c2          m2              Expired                 major        2                    presence
i            t2 + t_m     0           m2 + v·t_m      Expired                 unknown      2                    absence
......

Panel c: 3·t_m ≤ t2 − t1 < 4·t_m and t2 + t_m > t_L

Customer ID  Date         Total cost  Mileage         Warranty period status  Repair type  Cumulative presence  Presence/absence
i            t1           c1          m1              Under warranty          minor        1                    presence
i            t1 + t_m     0           m1 + v·t_m      Under warranty          unknown      1                    absence
i            t1 + 2·t_m   0           m1 + 2·v·t_m    Expired                 unknown      1                    absence
i            t2           c2          m2              Expired                 major        2                    presence
......

generating absence records is shown by the pseudocode in Algorithm 1; the notations used in this paper are summarized in Table 2.
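The date-based gap counting of Algorithm 1 can be sketched as follows (an illustrative Python sketch with hypothetical units; the mileage-based branch using k2 is analogous):

```python
# Illustrative sketch of Algorithm 1 (date-based branch only): between two
# consecutive presence records, k implied absences are generated when the
# date gap spans k + 1 maintenance periods; a trailing absence is added if
# the last presence is more than one period before the dataset's end t_L.

def count_gaps(prev_val, cur_val, interval):
    """k = floor((cur - prev) / interval) - 1 implied absences."""
    return max((cur_val - prev_val) // interval - 1, 0)

def absence_dates(presence_dates, t_m, t_L):
    """Return the dates of all implied absences for one customer.
    The mileage-based branch (k2) of Algorithm 1 is analogous."""
    dates = []
    for prev, cur in zip(presence_dates, presence_dates[1:]):
        k1 = count_gaps(prev, cur, t_m)
        dates += [prev + (j + 1) * t_m for j in range(k1)]
    if presence_dates[-1] + t_m < t_L:          # trailing absence
        dates.append(presence_dates[-1] + t_m)
    return dates

# Customer present on day 0 and day 400 (t_m = 180 days): the gap spans
# two periods, so one absence is implied at day 180; the dataset ends at
# day 700, so a trailing absence is implied at day 580.
assert absence_dates([0, 400], t_m=180, t_L=700) == [180, 580]
```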

In machine learning, several techniques have been proposed for classification problems. Deep learning is a type of machine learning developed to learn data representations with multiple levels of abstraction (Lecun et al., 2015). Among the various deep learning architectures, the RNN is designed for modeling sequential data, such as handwritten text (Graves et al., 2009), speech, or video (Graves, 2012); among the many variations of the RNN, the LSTM is one of the most widely utilized. The simple RNN (SRNN) is illustrated in the left panel of Fig. 2, and an LSTM unit is shown at the bottom-left of Fig. 3. As discussed above, after processing and constructing the absence and presence behavior data, the lifecycle behaviors of all customers are revealed. These data are naturally sequence data, for which deep learning models such as RNNs are well suited in our case.

Algorithm 1: Generating absence records
Input: t_m, m_m, t_L, raw warranty data
for i = 1 to n_c do
    for t = 2 to np_i do
        k1 = ⌊(dateP_t^(i) − dateP_{t−1}^(i)) / t_m⌋ − 1
        k2 = ⌊(mileageP_t^(i) − mileageP_{t−1}^(i)) / m_m⌋ − 1
        if k1 > 0 then
            generate k1 absence records between presence_{t−1}^(i) and presence_t^(i) using the rules of Table 3
        else if k2 > 0 then
            generate k2 absence records between presence_{t−1}^(i) and presence_t^(i) using the rules of Table 3
        end
    end
    if dateP_{np_i}^(i) + t_m < t_L then
        generate one absence record after presence_{np_i}^(i) using the rules of Table 3
    end
end
Output: warranty data with absence records

3.5. Prediction/classification of customer lifecycle behaviors

After data cleansing, the warranty dataset (consisting of presence records and absence records) and the customer dataset are combined into a dataset for model training and testing. Our goal is to predict the future behaviors (absence or presence) of customers based on their historical warranty records and customer features; one-step prediction is the most frequently used setting (Gers, 2001).

3.5.1. Modeling lifecycle sequential behaviors

The intuition is that the final churn behavior, which is itself an absence behavior, is explained by the habits and behaviors of the past. If a customer comes and goes rather freely and thus has several absence behaviors within the warranty period, this suggests that the customer does not have a strong preference for vehicle maintenance at 4S shops, and it should be least expected that the client will return for the next maintenance job. On the other hand, if a customer strictly follows the maintenance guidebook and is infrequently absent, it is quite possible that he will return for the next scheduled maintenance. Although certain user-clustering methods can provide similar outcomes, they tend to have less predictive power and hard cut-offs. In our approach, such customer habits are implicitly modeled in the RNN. Moreover, not only the final churn but also the next behaviors can be predicted; these are practical for managers' monthly and quarterly profit estimations. Mathematically, the sequential records of a customer from date = 1 to date = t−1 are utilized to predict the customer's behavior at date = t.


Fig. 2. Illustration of the two input layers for the SRNN-2L, and the time saving advantage of the novel model.

Fig. 3. Illustration of LSTM-2L and LSTM unit.

More specifically, we use x^(i)_{t−1} to predict y^(i)_t, where t ≥ 2 (it is unnecessary and impossible to predict y^(i)_1: it denotes the first time the customer visits the 4S shop, so y^(i)_1 identically equals 1, and x^(i)_0 does not exist). Therefore, in our model setting, a sample is denoted as (x^(i)_{t−1}, y^(i)_t), where x^(i)_{t−1} holds the features and y^(i)_t is the label. The features x^(i)_{t−1} should be reorganized into a format suitable for input to the RNN; specifically, x^(i)_{t−1} should be translated into a sequence. Because x^(i)_{t−1} = [seq^(i)_{t−1}, c^(i)], where seq^(i)_{t−1} is a sequence but c^(i) is not, this is remedied by adding c^(i) to each element of seq^(i)_{t−1}; consequently, x^(i)_{t−1} is transformed as in Eq. (2):

x^(i)_{t−1} = [s^(i)_1, s^(i)_2, ..., s^(i)_{t−1}],    (2)

where s^(i)_j = [warr^(i)_j, c^(i)] is the jth element of the sequence. Thus, at time step j, the input of the RNN is s^(i)_j.

Note that our dataset contains not only sequence data but also time-invariant data, such as customer socio-demographics. The features x^(i)_{t−1} of each sample therefore consist of a (t−1)-length sequence, seq^(i)_{t−1}, and the customer features, c^(i). If the conventional RNN framework is applied, all features are repeatedly computed in the recurrent loop and all treated as time-dependent, as shown in the left panel of Fig. 2. Unnecessary computing time is thereby wasted, because computing the time-invariant features only once is sufficient. If the original RNN model were applied and a long computation time were inevitable, the attractiveness of applying deep learning frameworks to large industrial datasets would decrease considerably. To address this issue, we propose a novel approach for the heterogeneous input.

3.6. Two layers for heterogeneous input

In the customer dataset, each customer feature is recorded at registration and rarely changes thereafter. Hence, the customer features c^(i) are considered time-invariant, i.e., time-independent features. However, the traditional SRNN and LSTM introduced above have only one input layer, which handles the sequential (time-variant) input, as shown in the left panel of Fig. 2. To deal with c^(i), the traditional RNN treats it as part of the sequential input, as in Eq. (2). The disadvantage of this remedy is input redundancy: the time-invariant c^(i) is input to the network many times, which makes training more time-consuming.
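To make the alternative concrete, the following toy sketch runs a simple RNN forward pass in which the sequence alone drives the recurrence and the time-invariant c enters only once, at the output layer. All dimensions and weights are hypothetical; this is an illustration of the idea, not the authors' MATLAB implementation:

```python
# Minimal sketch of a two-input-layer SRNN forward pass: h_t is driven by
# the time-variant inputs s_1..s_T only, and the time-invariant customer
# features c are used once at the output. Weights are hypothetical.
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def srnn_2l_forward(seq, c, U, W, b_h, V1, V2, b_y):
    """h_t = sigma(s_t U + h_{t-1} W + b_h); y = softmax(h_T V1 + c V2 + b_y)."""
    h = [0.0] * len(b_h)                       # h_0
    for s in seq:                              # only time-variant input recurs
        h = sigmoid(add(add(matvec(U, s), matvec(W, h)), b_h))
    logits = add(add(matvec(V1, h), matvec(V2, c)), b_y)
    return softmax(logits)                     # [P(absence), P(presence)]

# Toy example: 2-dim warranty features, 2-dim hidden state, 1-dim c.
U = [[0.5, -0.2], [0.1, 0.3]]; W = [[0.2, 0.0], [0.0, 0.2]]
V1 = [[1.0, -1.0], [-1.0, 1.0]]; V2 = [[0.3], [-0.3]]
b_h = [0.0, 0.0]; b_y = [0.0, 0.0]
y = srnn_2l_forward([[1.0, 0.0], [0.0, 1.0]], [1.0], U, W, b_h, V1, V2, b_y)
assert abs(sum(y) - 1.0) < 1e-9 and len(y) == 2
```

The time saving comes precisely from c being excluded from the loop: the recurrent matrix products never touch the static features.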

To remove the input redundancy of the traditional RNN, another input layer, used solely for the time-invariant features c^(i), is introduced into the RNN. Hence, the new RNN has two input layers: one for the sequence seq^(i)_{t−1} and the other for c^(i), as shown in the right panel of Fig. 2. To differentiate this RNN with two input layers from the traditional one, we name it RNN-2L. In RNN-2L, h_T is calculated from the sequence (s_1, s_2, ..., s_T), where c has been removed from every s_i; once h_T is computed, c is input to calculate ŷ, as given by Eq. (7):

h_0 = σ(o),    (3)
h_1 = σ(s_1 U + h_0 W + b_h),    (4)
...    (5)
h_T = σ(s_T U + h_{T−1} W + b_h),    (6)
ŷ = softmax(h_T V_1 + c V_2 + b_y),    (7)

where σ(·) is the sigmoid (logistic) function and softmax(·) is the softmax function; U, W, V_1, and V_2 are weights; b_h and b_y are bias terms. With this modification, considerable time can be expected to be saved, since the time-invariant features c are computed only once instead of repeatedly. This is in fact a simplification of the model, as well as a customization of the structure for the customer behaviors in our case.

Similarly, a new variant of the LSTM, named LSTM-2L, is developed with the same logic. As shown in Fig. 3, the time-invariant features c do not repeatedly enter all LSTM units; thus, a substantial amount of computational time can be saved. Inside an LSTM unit, only time-variant features are processed. Let a, i, f, o, and g represent the input node, input gate, forget gate, output gate, and cell, respectively; moreover, let s̄_t = [s_t, h_{t−1}]. At time step t, the calculation in an LSTM unit can be expressed by Eqs. (8)–(13); once h_T is calculated, the final ŷ is computed by Eq. (14):

f_t = σ(s̄_t W_f + b_f),    (8)
i_t = σ(s̄_t W_i + b_i),    (9)
a_t = tanh(s̄_t W_a + b_a),    (10)
g_t = f_t ⊙ g_{t−1} + i_t ⊙ a_t,    (11)
o_t = σ(s̄_t W_o + b_o),    (12)
h_t = o_t ⊙ tanh(g_t),    (13)
ŷ = softmax(h_T V_1 + c V_2 + b_y),    (14)

where W_f, W_i, W_a, W_o, b_f, b_i, b_a, and b_o are the weights and bias terms; tanh(·) is the hyperbolic tangent function; ⊙ denotes element-wise multiplication.

In this new variant, the RNN structure is in fact utilized more economically. From a behavioral point of view, the time-invariant features have rather stable effects on behaviors: the socio-demographics broadly define the characteristic behavior of each customer, but they do not critically affect the exact presence or absence at each maintenance. Repeatedly inputting these features into the model would be a computational burden. Thus, the proposed 2L variant is a practical and efficient way to jointly handle the time-variant lifecycle sequential behaviors and the time-invariant socio-demographic predictors in our case. Furthermore, similar scenarios that resemble our application can easily adopt the new model.

The proposed method is also easy to employ when more detailed modeling is considered, such as distinguishing different phases of a customer. Like customers' socio-demographic features, each phase has certain variables that are rather constant within the phase, such as repair types (a new car mostly needs maintenance rather than repair, whereas an aged car needs more major repairs and parts replacement). Such within-phase time-invariant variables do not need to be input at every step of the RNN computation, just as c is treated; therefore, a substantial amount of time can be expected to be saved. With similar logic, an RNN model can be ''divided'' into several parts based on managerial experience and insights. Alternatively, as a straightforward approach, different models can be built for different phases of a customer. Readers may wonder whether such simplification of the model substantially saves computational time, or whether it significantly decreases model performance. To answer these questions, the next section provides further tests and discussions on our large customer dataset.²

4. Experiment and results

4.1. Experimental setup

4.1.1. Dataset

The raw dataset provided by our surveyed 4S shops is composed of the customer dataset and the original warranty dataset. The customer dataset contains 156,363 records of customer socio-demographic information. Corresponding to the customer dataset, the original warranty dataset has 119,183 presence records of 15,363 customers, acquired between March 2, 2010 and March 9, 2016: a six-year dataset. The data were then cleansed; the descriptive statistics of each feature in the customer and warranty datasets are summarized in Tables 4 and 5, respectively.

According to our surveyed 4S shop, vehicle maintenance is suggested every 3–6 months or every 7,500 km of mileage. Accordingly, the time and mileage intervals of the maintenance period are set to 6 months and 7,500 km (i.e., t_m = 180 days and m_m = 7,500 km), respectively. The latest date in the warranty dataset is March 9, 2016; thus, we set t_L = March 9, 2016. We then run Algorithm 1 to generate the absence records. As a result, 26,640 customer absences are found in the original warranty dataset; hence, the resulting warranty dataset contains 26,640 absence records and 119,183 presence records. In other words, the dataset for model training and testing is composed of 26,640 negative (absence) samples and 119,183 positive (presence) samples, for a total of 145,823 records.

Table 4
Descriptive statistics of each feature in customer data.

Customer feature              Number of categories
Model year                    11
Vehicle color                 12
Vehicle selling price rank    5
Career                        14
Gender                        2
Educational background        6
Income rank                   6
Age rank                      6
Marital status                2
Owner identity                2
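For completeness, Eqs. (8)–(14) of the LSTM-2L in Section 3.6 can be sketched with scalar gates; all weights are hypothetical scalars (real models use learned matrices), and this is an illustrative toy, not the authors' MATLAB implementation:

```python
# Toy scalar sketch of one LSTM-2L step, Eqs. (8)-(13) of Section 3.6,
# followed by the output of Eq. (14). All weights are hypothetical.
import math

def sigma(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(s_bar, g_prev, w, b):
    """One LSTM unit update on the combined input s_bar ~ [s_t, h_{t-1}]."""
    f = sigma(w["f"] * s_bar + b["f"])      # forget gate,  Eq. (8)
    i = sigma(w["i"] * s_bar + b["i"])      # input gate,   Eq. (9)
    a = math.tanh(w["a"] * s_bar + b["a"])  # input node,   Eq. (10)
    g = f * g_prev + i * a                  # cell state,   Eq. (11)
    o = sigma(w["o"] * s_bar + b["o"])      # output gate,  Eq. (12)
    h = o * math.tanh(g)                    # hidden state, Eq. (13)
    return g, h

w = {"f": 0.5, "i": 0.5, "a": 1.0, "o": 0.5}
b = {"f": 0.0, "i": 0.0, "a": 0.0, "o": 0.0}
g, h = 0.0, 0.0
for s_t in [1.0, -0.5, 0.8]:            # time-variant inputs only
    g, h = lstm_step(s_t + h, g, w, b)  # scalar stand-in for concatenation

# Eq. (14): the time-invariant c joins only at the output layer. With two
# classes, the softmax probability reduces to a sigmoid of the logit gap.
c, V1, V2, b_y = 1.0, 2.0, 0.5, 0.0
p_presence = sigma(h * V1 + c * V2 + b_y)
assert 0.0 < p_presence < 1.0
```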

4.1.2. Models for comparison

To validate the performance of RNN-2L, six models for customer absence prediction are implemented: two classical models (logistic regression and multi-layer perceptron), two traditional RNNs (SRNN and LSTM), and two RNN-2L models (SRNN-2L and LSTM-2L). To be consistent with the RNNs, the multi-layer perceptron (MLP) used in this study also has one hidden layer, and its activation function is the sigmoid function. Note that logistic regression (LR) and the MLP require every input x^(i) to have the same length; however, in our dataset, the length of each

² Source code is available at https://github.com/Light-jason/Code-of-4Sshop-customer-churn-prediction, written in Matlab.

sample feature x^(i)_{t−1} varies with t; x^(i)_1 has the shortest length. To implement the LR and MLP, each sample's features are truncated to this shortest length. Specifically, a sample (x^(i)_{t−1}, y^(i)_t) is translated to (x^(i)_{t−2,t−1}, y^(i)_t), where x^(i)_{t−2,t−1} is given by Eq. (15):

x^(i)_{t−2,t−1} = [warr^(i)_{t−1}, c^(i)], for t ≥ 2.    (15)

In effect, with the truncation of Eq. (15), only the last warranty features and the customer features are utilized to make predictions.

Table 5
Descriptive statistics of each feature in warranty data and rules for generating absence records.

Warranty feature        Description       Min        Max          Rule for generating absence records
Total cost              Continuous, RMB   0          148,500      Set to 0
Settle amount           Continuous, RMB   0          148,500      Set to 0
Cost of parts           Continuous, RMB   0          24,142.7     Set to 0
Price of parts          Continuous, RMB   0          40,000       Set to 0
Labor cost              Continuous, RMB   0          138,116.8    Set to 0
Discount                Continuous, RMB   0          304,503      Set to 0
Actual repair time      Continuous, h     −8,688     18,456       Set to 0
Estimated repair time   Continuous, h     −18,456    8,687.5      Set to 0
Time delay of repair    Continuous, h     −14.7      16.3         Set to 0
Number of orders        Continuous        0          56           Set to 0
Profit rate             Continuous        0          0.97         Set to 0
Cumulative presence     Continuous        1          11           Equal to the latest record
Mileage                 Continuous, km    0          1,004,999    Add a maintenance period to last record OR use linear interpolation
Warranty period status  Categorical       2 categories            Infer from date and mileage in absence record
Repair type             Categorical       15 categories           Treat as missing value

4.1.3. Evaluation criteria

The confusion matrix used in a binary classification problem is a 2 × 2 matrix that allows the visualization of the possible classifications of all samples in the test set. Typically, each row is the class predicted by the model, whereas each column is the actual class. Table 6 shows the confusion matrix: the true positives (TP) are the positive samples correctly classified, whereas positive predictions of actually negative samples are false positives (FP). Analogously for the negative samples, the true negatives (TN) are the negative samples correctly classified, as opposed to the false negatives (FN) (Powers, 2011). Based on the confusion matrix, the true positive rate (TPR), false positive rate (FPR), precision, recall, F1 score, and accuracy can be calculated using Eqs. (16)–(21). In Eqs. (18) and (19), when the precision and recall of the positive samples are calculated, ''True predicted'' refers to TP, ''Predicted class'' equals TP + FP, and ''Actual class'' equals TP + FN.

Table 6
Confusion matrix.

                      Actual positive    Actual negative
Predicted positive    True Positive      False Positive
Predicted negative    False Negative     True Negative

TPR = TP / (TP + FN),    (16)
FPR = FP / (FP + TN),    (17)
precision = True predicted / Predicted class,    (18)
recall = True predicted / Actual class,    (19)
F1 = 2 × precision × recall / (precision + recall),    (20)
Accuracy = (TP + TN) / (TP + FN + FP + TN).    (21)
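Eqs. (16)–(21) can be computed directly from confusion-matrix counts; a generic helper (not tied to the paper's code) with a tiny hypothetical test set:

```python
# Compute the criteria of Eqs. (16)-(21) from confusion-matrix counts.

def metrics(tp, fp, fn, tn):
    tpr = tp / (tp + fn)                       # Eq. (16), equals recall
    fpr = fp / (fp + tn)                       # Eq. (17)
    precision = tp / (tp + fp)                 # Eq. (18), positive class
    recall = tp / (tp + fn)                    # Eq. (19), positive class
    f1 = 2 * precision * recall / (precision + recall)  # Eq. (20)
    accuracy = (tp + tn) / (tp + fn + fp + tn)          # Eq. (21)
    return tpr, fpr, precision, recall, f1, accuracy

# Hypothetical test set: 8 positives (6 caught) and 4 negatives (3 caught).
tpr, fpr, precision, recall, f1, accuracy = metrics(tp=6, fp=1, fn=2, tn=3)
assert round(accuracy, 3) == 0.75
assert round(precision, 3) == round(6 / 7, 3)
```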

Table 7
Experimental results of confusion matrix, precision, recall, F1, accuracy, and AUC.

                                       Positive samples (+)        Negative samples (−)
Model     Confusion matrix             Precision  Recall  F1       Precision  Recall  F1      Accuracy  AUC
                    +        −
LR        +     91,528   12,883        0.877      0.768   0.819    0.332      0.516   0.404   0.722     0.700
          −     27,655   13,757
MLP       +     91,875   10,685        0.896      0.771   0.829    0.369      0.599   0.456   0.739     0.763
          −     27,308   15,955
SRNN      +     95,651    5,914        0.942      0.803   0.867    0.468      0.778   0.585   0.798     0.869
          −     23,532   20,726
SRNN-2L   +     94,781    5,610        0.944      0.795   0.863    0.463      0.789   0.584   0.794     0.866
          −     24,402   21,030
LSTM      +     99,848    6,644        0.938      0.838   0.885    0.508      0.751   0.606   0.822     0.874
          −     19,335   19,996
LSTM-2L   +     99,725    6,304        0.941      0.837   0.886    0.511      0.763   0.612   0.823     0.880
          −     19,458   20,336

Optimal values in each column are written in bold and underlined; suboptimal values are written in bold only.
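As a consistency check of Table 7, the reported metrics can be recomputed from the confusion-matrix counts (here for LR, with rows as predicted and columns as actual classes):

```python
# Recompute the LR row of Table 7 from its confusion matrix:
# predicted+/actual+ = 91,528; predicted+/actual- = 12,883;
# predicted-/actual+ = 27,655; predicted-/actual- = 13,757.
tp, fp, fn, tn = 91_528, 12_883, 27_655, 13_757

precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)

assert round(precision_pos, 3) == 0.877   # matches Table 7
assert round(recall_pos, 3) == 0.768
assert round(accuracy, 3) == 0.722
assert tp + fp + fn + tn == 145_823       # total records in Section 4.1.1
```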


The ROC curve is used for visualizing, organizing, and selecting classifiers based on their performance (Fawcett, 2006). It is a two-dimensional graph created by plotting pairs of (FPR, TPR) obtained at various thresholds, with the FPR on the x-axis and the TPR on the y-axis; thus, the ROC curve can be used to find the expected pair of TPR and FPR for different situations. Moreover, the area under the ROC curve (AUC) is another commonly used criterion for model comparison and evaluation; it is easy to calculate via trapezoidal approximation.

4.1.4. Ten-fold cross validation

First, a 10-fold cross validation is used for training and testing: the dataset is randomly partitioned into 10 equal-sized subsets; a single subset is retained for model testing, whereas the remaining nine subsets are used for model training. These training and testing processes are repeated 10 times, with each subset used exactly once as the testing set. Second, each model's 10 testing results are gathered as the overall testing result. Finally, the evaluation criteria, i.e., the confusion matrix, ROC curve, and AUC, are used to evaluate performance on the overall testing results; the time consumed is compared only between RNN and RNN-2L, in order to verify whether RNN-2L is able to reduce the input redundancy.

Our six models have certain hyperparameters, such as the number of epochs (n_epoch), the number of neurons in the hidden layer (n_h), and the learning rate (α). Before the formal experiments, exploratory experiments were conducted to search for suitable hyperparameters; it was found that for 0.01 ≤ α ≤ 0.05 and n_h ≤ 30, the models converge faster as the learning rate increases, and all converge within 3000 iterations. Therefore, we finally set n_epoch = 3000, n_h = 30, and α = 0.05. All experiments are implemented in MATLAB 2016 on a laptop with a 2.6-GHz i7 CPU and 8 GB of RAM.

4.2. Experimental results

The experimental results for the confusion matrix, precision, recall, F1, accuracy, and AUC are summarized in Table 7. The threshold used in the confusion matrix is the prior of the positive class, i.e., the rate of customer presence in the dataset (0.82 ≈ 119,183/145,823). All ROC curves are shown in Fig. 4. The time consumed by RNN and RNN-2L in the ''training/testing'' process of the 10-fold cross validation is summarized in Table 8. The main results are as follows.

(1) The results for F1, accuracy, and AUC share the same pattern: LSTM-2L ≈³ LSTM > SRNN-2L ≈⁴ SRNN > MLP > LR. To facilitate the description, only the AUC results are discussed in detail:

(a) The recurrent neural networks, including SRNN-2L, SRNN, LSTM-2L, and LSTM, remarkably outperform the classical models, i.e., MLP and LR. As shown in Fig. 4(a), the ROC curve of SRNN (the worst among the recurrent neural networks) completely covers those of MLP and LR. More specifically, the AUC of SRNN is 0.869, whereas that of MLP is 0.763, and that of LR only 0.700. The main reason relates to Eq. (15): the feature input of the MLP or LR is a truncated sequence, whereas that of a recurrent neural network is the whole sequence, so the recurrent neural network can take more information into consideration.

(b) The more sophisticated the model, the better its performance. As listed in the last column of Table 7, the AUCs of LSTM and LSTM-2L are 0.874 and 0.880, respectively, while those of SRNN and SRNN-2L are 0.869 and 0.866. Thus, LSTM and LSTM-2L slightly outperform SRNN and SRNN-2L.

(c) The comparison between RNN-2L (i.e., SRNN-2L and LSTM-2L) and the traditional RNNs (i.e., SRNN and LSTM) shows that they have practically the same ROC curves and AUCs, as shown in Figs. 4(b) and 4(c) and listed in Table 7. More specifically, the suboptimal and optimal values in each column of Table 7 all come from RNN or RNN-2L, and the values closely approximate each other.

³ We calculated the AUC values on each validation dataset for LSTM and LSTM-2L, respectively, and used the two models' AUCs to perform a t-test. The results suggest that their mean values are not significantly different (p value = 0.210).
⁴ A t-test was performed; the results suggest that their AUCs are not significantly different.

(2) All models, including the classical models and the recurrent neural networks, exhibit better performance (precision, recall, and F1) on the positive samples than on the negative ones; in particular, the negative samples have the worst performance in terms of precision. As mentioned in Section 3.4, the ratio of positive to negative samples is approximately 4:1, i.e., the classes are imbalanced; thus, the models focus more on the positive samples during training. If the performance on negative samples must be improved, strategies such as undersampling, oversampling, or threshold-moving can be used.

(3) Time consumed. As listed in Table 8, RNN-2L consumes less training time than RNN. The t-test results for Δt1 and Δt2 are summarized in Table 9: both are significantly different from 0 at a 95% confidence level. In other words, RNN-2L consumes significantly less training time than RNN.

5. Retention solutions

Providing management insights based on the newly proposed method is among the main objectives of this study. Based on the experimental results above, LSTM-2L exhibits the best performance; accordingly, it is considered the best model for predicting customer churn behavior in this study. The design of the application based on the proposed model is presented as follows.

5.1. Management-related factors

Although various exogenous variables have been used in the model, only a few of them are related to management, e.g., prices and repair time. Thus, for customer retention, managers can only change these factors to improve their services. Because repair time is fixed to a certain extent, an effective solution is to offer price discounts, particularly on spare parts, which are usually highly lucrative. However, certain problems have to be considered. (1) Intuitively, a higher discount should result in fewer customer churns; however, the relationship may not be linear. Therefore, a careful quantitative analysis is required to find the most effective discount rate. (2) Offering a high discount may retain some customers, but it is accompanied by a corresponding profit decline. In other words, we should not only focus on the number of customers retained, but also, most importantly, consider the profitability of offering the discount. In this regard, two experiments are designed accordingly.

5.2. Identifying potential churn customers who are discount-sensitive

As shown in Fig. 5, the prediction model is applied twice. In the first instance, the model predicts the potential churn customers (say, set C) who are expected to be absent in the next maintenance period. However, even if discounts are offered to all customers in C, it is possible that certain customers will still choose to churn. Therefore,


Fig. 4. ROC curves of classical models, traditional RNN, and RNN-2L.
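The 10-fold procedure of Section 4.1.4 amounts to the following split logic (an illustrative Python sketch; the paper's experiments were run in MATLAB):

```python
# Sketch of the 10-fold cross-validation split of Section 4.1.4:
# each sample index lands in exactly one test fold, and the other
# nine folds form the corresponding training set.
import random

def ten_fold_indices(n_samples, seed=0):
    """Return a list of 10 (train_idx, test_idx) pairs."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::10] for k in range(10)]        # 10 near-equal subsets
    return [
        ([i for k2, f in enumerate(folds) if k2 != k for i in f], folds[k])
        for k in range(10)
    ]

splits = ten_fold_indices(145_823)                 # dataset size, Section 4.1.1
assert len(splits) == 10
assert sum(len(test) for _, test in splits) == 145_823  # each sample tested once
train0, test0 = splits[0]
assert set(train0).isdisjoint(test0)
```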

Table 8
Time consumed (s) by SRNN/SRNN-2L and LSTM/LSTM-2L in the ''training/testing'' process of the 10-fold cross validation.

10-fold CV    SRNN    SRNN-2L    Δt1ᵃ    LSTM    LSTM-2L    Δt2
1             1137    941        196     5712    4223       1489
2             1170    993        177     5678    4225       1454
3             1083    1007       77      5602    4185       1417
4             1077    902        175     5594    4224       1370
5             1031    793        238     5656    4241       1415
6             1023    802        221     5689    4246       1443
7             1000    798        202     5642    4169       1473
8             974     797        178     5555    4135       1420
9             948     855        94      5426    3963       1463
10            1024    899        125     5796    4258       1538

ᵃ Δt1 is the difference between SRNN and SRNN-2L; Δt2 is the difference between LSTM and LSTM-2L.

Table 9
Results of the t-tests on Δt1 and Δt2.

       Mean    Std    95% confidence interval    t value    p value      Significanceᵃ
Δt1    168     53     (130, 206)                 10.00      3.57E−06     Significant
Δt2    1448    47     (1414, 1482)               98.11      6.02E−15     Significant

ᵃ The result shows that Δt1 and Δt2 are significantly different from 0 at a 95% confidence level.
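The Δt1 row of Table 9 can be reproduced from the Δt1 column of Table 8 with a one-sample t-test (plain-Python sketch):

```python
# One-sample t-test on the Delta-t1 column of Table 8 (SRNN minus SRNN-2L),
# reproducing the mean, std, and t value reported in Table 9.
import math

dt1 = [196, 177, 77, 175, 238, 221, 202, 178, 94, 125]

n = len(dt1)
mean = sum(dt1) / n
var = sum((x - mean) ** 2 for x in dt1) / (n - 1)   # sample variance
std = math.sqrt(var)
t = mean / (std / math.sqrt(n))                     # H0: mean difference = 0

assert round(mean) == 168     # Table 9: Mean = 168
assert round(std) == 53       # Table 9: Std = 53
assert abs(t - 10.00) < 0.05  # Table 9: t value = 10.00
```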

it is necessary to apply the model a second time in order to identify from C a subset S composed of the customers who will switch from churn to stay if discounts are provided; in other words, these customers are discount-sensitive. With this technique, the losses resulting from offering useless discounts to customers i (i ∈ C, i ∉ S) can be avoided.

5.3. When to offer discounts

As discussed above, customers typically exhibit only a three-year (warranty-driven) loyalty. Therefore, the discount can be offered if (1) the warranty has expired, and (2) the model predicts that the customer is a potential churn but discount-sensitive. Formally, let S_E = {i | i ∈


Fig. 5. Identifying potential churn customers who are discount-sensitive using LSTM-2L.

S, customer i's warranty has expired}. For instance, if our model predicts that a customer will churn at the tth maintenance, then we can offer a discount to the customer at the (t−1)th maintenance. Note that because the discount is provided at the (t−1)th maintenance, a profit decline is sustained in that period; however, because the customer is retained at the tth maintenance, there is a profit increase in that period. Therefore, we should compute the total profit gained between the (t−1)th and tth maintenance services to determine whether offering the discount is advantageous.
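The decision rule above can be sketched as a two-period profit comparison; all prices, costs, and retention outcomes below are hypothetical:

```python
# Sketch of the two-period profit comparison in Section 5.3: a discount
# at the (t-1)th maintenance trades profit then for retention (and hence
# profit) at the t-th maintenance. All numbers are hypothetical.

def two_period_profit(price, cost, cost_price_ratio, retained):
    """Total profit over the (t-1)th and t-th maintenance services.

    cost_price_ratio -- the 'Cost/Price' discount indicator of Section 5.4
                        (1.0 means the discounted price equals cost: zero profit)
    retained         -- whether the discount keeps the customer at step t
    """
    discounted_price = cost / cost_price_ratio
    profit_t_minus_1 = discounted_price - cost
    profit_t = (price - cost) if retained else 0.0
    return profit_t_minus_1 + profit_t

price, cost = 1000.0, 600.0
no_discount = (price - cost) + 0.0          # customer churns at step t
with_discount = two_period_profit(price, cost, cost_price_ratio=0.9,
                                  retained=True)
assert with_discount > no_discount          # discount pays off if it retains
assert two_period_profit(price, cost, 1.0, retained=True) == price - cost
```

The last assertion mirrors the paper's finding: at Cost/Price = 1, period t−1 contributes zero profit, so the total profit cannot exceed the single retained-period margin.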

could effectively increase the number of remaining customers by 21%. Moreover, the higher the Cost/Price values, the steeper the slope of the curve; this suggests the higher sensitivity of customers to price. In such a case, the customers tend to visit the shop again when they deem that the 4S shop has a considerable cost performance. Although the foregoing discount strategy has a significant effect on retaining churn customers, it also introduces certain economic pressures if the managers adopt it for all customers. Nevertheless, finding an appropriate discount strategy is a rational method for reducing financial stress while working on retaining a considerable number of remaining customers. Another attribute that managers should consider is whether the discount strategy increased the total profit. As shown in Fig. 7, there is no evident change in the total when the Cost/Price value is adjusted to 0.6; possibly, the customers could not perceive the effect of this slight price adjustment. As the Cost/Price value is increased from 0.6 to 0.9, it is evident that the profit of the 𝑡𝑡ℎ maintenance and the total profit all have a growing trend: they increase by 34% and 3.6%, respectively. However, if the Cost/Price value is 1 (the 4S shop profit is zero), then the profit growth in the 𝑡𝑡ℎ maintenance is unable to compensate for the discount loss in the (𝑡 − 1)th maintenance; consequently, the total profit declines. This means that blindly surrendering a certain amount of the profit in the (𝑡 − 1)th maintenance cannot consistently accelerate the growth of the total profit. Briefly, finding an optimized discount rate is the key to effectively retain customers and improve management operations. Based on the predicted results above, we know that adjusting the spare part price is useful to retain customers. 
Compared with the current situation in which no retention solution is utilized, providing discounts to the right customers at a Cost/Price of 0.9 can result in total savings of 85,333 RMB (approximately US$12,190). If the average monthly salary is 5,000 RMB per worker, this is equivalent to the cost of hiring 17 workers; a significant management improvement. Note that there are certain limitations in this application. First, providing all ‘‘potential churn but discount-sensitive’’ customers with an equal discount rate remains a fairly coarse precision marketing strategy. Undoubtedly, some of them may be more discount-sensitive than others; thus, a customized discount plan can further enlarge the opportunity for profit increase. Second, more managerial insights can be obtained by changing other management-related variables, such as the Cost/Price for repair man-hours or repair time. Third, as summarized in Table 7, each model (regardless of how high its accuracy is) inevitably produces some erroneous predictions. In other words, the potential churn customers predicted by the model may include some who will not actually churn; however, it is impossible for us to identify them in advance. Therefore, some of the investment in discounts is inevitably wasted, because discounts for no-churn customers are pointless.
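The targeting step described above, restricting the discount to the ‘‘potential churn but discount-sensitive’’ customers (predicted churners whose warranties have expired), can be sketched as a simple filter. The field names and probability threshold below are illustrative assumptions, not part of the authors' system:

```python
def select_discount_targets(customers, churn_threshold=0.5):
    """Return ids of customers in the set S: predicted to churn at the
    next (tth) maintenance AND out of warranty (discount-sensitive)."""
    return [
        c["id"]
        for c in customers
        if c["churn_prob"] >= churn_threshold and c["warranty_expired"]
    ]

customers = [
    {"id": 1, "churn_prob": 0.82, "warranty_expired": True},   # target
    {"id": 2, "churn_prob": 0.91, "warranty_expired": False},  # in warranty
    {"id": 3, "churn_prob": 0.12, "warranty_expired": True},   # loyal
]
print(select_discount_targets(customers))  # [1]
```

Raising the threshold trades wasted discounts (false churn predictions) against missed churners, which is the limitation noted above.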

5.4. Experiments

Based on the discussions above, two experiments are conducted. (1) The numbers of churn and remaining customers are compared when no discounts are provided and when discounts are provided at different rates. Specifically, the ‘‘Cost/Price’’ ratio is utilized as the indicator for discount, where ‘‘Cost’’ is the actual cost of a particular maintenance, and ‘‘Price’’ is the fee that the 4S shop charges the customer for that maintenance. Because the ‘‘Cost’’ of a particular maintenance is fixed, a lower value of ‘‘Cost/Price’’ represents a higher profit, whereas Cost/Price = 1 indicates zero profit. (2) The profits from customers provided with discounts are compared at different ‘‘Cost/Price’’ values. In particular, three profit curves are presented: the profit of the (𝑡 − 1)th maintenance, that of the 𝑡th maintenance, and the total ((𝑡 − 1)th + 𝑡th). In reality, customers churn at different times. To facilitate computations and comparisons, in this experiment, we align the ‘‘churn’’ time and name it the ‘‘𝑡th maintenance’’, so that the discount provision time is the ‘‘(𝑡 − 1)th maintenance’’. The calculation processes are shown in Eqs. (22) and (23):

profit_{No\_discount} = \sum_{i \in A} (\mathrm{price}_{i,t-1} - \mathrm{cost}_{i,t-1}) + \sum_{i \in (A-C)} (\mathrm{price}_{i,t} - \mathrm{cost}_{i,t}),   (22)

profit_{Discount} = \sum_{i \in (A-S)} (\mathrm{price}_{i,t-1} - \mathrm{cost}_{i,t-1}) + \sum_{i \in S} \left( \frac{1}{r_{i,t-1}} - 1 \right) \mathrm{cost}_{i,t-1} + \sum_{i \in (A-C+S)} (\mathrm{price}_{i,t} - \mathrm{cost}_{i,t}),   (23)

where 𝐴 is the set of all customers; 𝐶 is the set of potential churn customers; 𝑆 is the set of discount-sensitive customers, i.e., potential churn customers whose warranties have expired; 𝑟_{𝑖,𝑡−1} is the ratio of cost to price, i.e., 𝑟_{𝑖,𝑡−1} = cost_{𝑖,𝑡−1}/price_{𝑖,𝑡−1}; price_{𝑖,𝑡} is the fee that the 4S shop charges customer 𝑖 at time 𝑡; and cost_{𝑖,𝑡} is the actual cost.

5.5. Results and discussions
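As a concrete illustration of Eqs. (22) and (23), the two profit quantities can be sketched directly in code. This is a minimal sketch under simplifying assumptions (two aligned periods per customer and a single discounted ratio r for all of S; all names are ours, not the authors' implementation):

```python
def profit_no_discount(A, C):
    """Eq. (22): margins at t-1 from all customers A, plus margins at t
    from the customers who do not churn (A - C)."""
    p = sum(c["price_tm1"] - c["cost_tm1"] for c in A)
    p += sum(c["price_t"] - c["cost_t"] for c in A if c["id"] not in C)
    return p

def profit_discount(A, C, S, r):
    """Eq. (23): full margins at t-1 for A - S, discounted margins
    cost*(1/r - 1) for S, and margins at t for (A - C) plus the retained S."""
    p = sum(c["price_tm1"] - c["cost_tm1"] for c in A if c["id"] not in S)
    p += sum((1.0 / r - 1.0) * c["cost_tm1"] for c in A if c["id"] in S)
    p += sum(c["price_t"] - c["cost_t"] for c in A
             if c["id"] not in C or c["id"] in S)
    return p

# Two identical customers; customer 2 would churn but is discount-sensitive.
A = [{"id": i, "price_tm1": 1000, "cost_tm1": 600,
      "price_t": 1000, "cost_t": 600} for i in (1, 2)]
C, S = {2}, {2}
print(profit_no_discount(A, C))       # 1200
print(profit_discount(A, C, S, 0.9))  # ~1266.7: the discount pays off
```

Sweeping `r` over 0.6–1.0 with such functions reproduces the kind of trade-off discussed in the results: the gain vanishes at r = 1, where the discounted (𝑡 − 1)th margin drops to zero.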

6. Conclusions

Fig. 6 shows the variation in the numbers of remaining and churn customers at different Cost/Price values. As expected, increasing the Cost/Price value (which means less profit for the 4S shop) increases the number of remaining customers.

In this study, the customer relationship management in the automobile repair and maintenance business is analyzed for 4S shops.

J. Wang, X. Lai, S. Zhang et al.

Engineering Applications of Artificial Intelligence 89 (2020) 103405

Fig. 6. Number of remaining and churn customers at different Cost/Price values.

Fig. 7. Profit variations with different Cost/Price values.

a customer has already churned. Therefore, a customized data cleaning procedure is proposed, in which the absence and presence of customers are first defined. With this approach, the task can easily be transformed into a classification problem. Second, absence prediction is as important as churn forecasting; managers utilize it for monthly/quarterly revenue estimation. In this regard, our model can effectively provide forecasts of both behaviors. In particular, the dataset of records obtained over more than six years contains rich behavioral data. Therefore, instead of directly applying a ‘‘plug-in’’ black-box-like machine learning model, such as those used in existing CRM practices, we fully exploit a customer's lifecycle and sequential absence and presence behavior data by means of model selection and specification. In this regard, a recurrent neural network structure, which is superior for sequential

In particular, we focus on the customer churn and absence behaviors. A dataset of records obtained over a period of more than six years is employed and analyzed; it includes the lifecycles of more than 15,000 customer absence and presence behavior records. To our knowledge, this study is the first to utilize such a large dataset to investigate CRM in the 4S shop aftermarket. Unique features were discovered in the studied subjects, and a research framework was developed accordingly. The three contributions of this study are summarized as follows. First, unlike various CRM research targets where churn behaviors are explicitly defined with an identifier (e.g., a contract), our dataset does not indicate whether a customer will churn. Moreover, the duration between visits is relatively long; thus, it would be extremely late for managers to provide retention solutions after they realize that



data processing tasks, such as natural language processing and computer vision, is employed. The RNN framework is highly suitable for representing customer lifecycle behaviors; we anticipate its wider adoption in CRM and consumer behavior analysis. Third, the RNN, as a deep learning approach, carries a computational burden not present in simpler model structures, such as logistic regression; this is one of its drawbacks. In this paper, the RNN-2L variant is proposed. In this variant, the conventional RNN input mechanism is adopted for the lifecycle and sequential absence and presence data. As for time-invariant features, such as socio-demographics, we consider them important for identifying a customer's habits but less important for determining the exact time of the next presence. Therefore, to reduce computational cost, a second input mechanism is utilized for the time-invariant features; because these features need not be repeatedly called and computed, a significant amount of computational time is saved. Comparison results suggest that the RNN structures outperform conventional models, such as logistic regression and MLP, in terms of F1-score, AUC, and accuracy. Moreover, the proposed RNN-2L model retains the superior performance of the original with a reduced computational time, which can be attributed to the new specification. This is an obvious advantage if our model is applied to a larger dataset where customers' lifecycles are much longer, e.g., supermarket or e-commerce CRM. Managerial solutions for profit maximization are provided in the experiments conducted. Details, such as choosing a suitable operational factor, identifying the potential churn but discount-sensitive customers, and determining when to provide discounts, are discussed. With the suggested retention solutions, a significant profit increase can be obtained in the studied 4S shop.
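The two-input idea summarized above (sequential lifecycle features fed through the recurrent loop, time-invariant features encoded only once and merged before the output layer) can be illustrated with a bare-bones NumPy forward pass. This is our own illustrative sketch of the structure, not the authors' implementation; all dimensions and the random weights are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z, d_h, d_s = 4, 3, 8, 2   # seq-feature, static-feature, hidden sizes

# Recurrent weights for the sequential lifecycle features.
Wh, Wx, bh = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_x)), np.zeros(d_h)
# A separate, one-shot encoder for the time-invariant features.
Ws, bs = rng.normal(size=(d_s, d_z)), np.zeros(d_s)
# Output layer over the concatenated [last hidden state, static code].
wo, bo = rng.normal(size=d_h + d_s), 0.0

def rnn2l_forward(x_seq, z):
    """Predict an absence probability from a lifecycle sequence x_seq and
    time-invariant features z; z is encoded only once, outside the loop."""
    h = np.zeros(d_h)
    for x in x_seq:                      # recurrent loop over visits
        h = np.tanh(Wh @ h + Wx @ x + bh)
    s = np.tanh(Ws @ z + bs)             # static features: computed once
    logit = wo @ np.concatenate([h, s]) + bo
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> probability

p = rnn2l_forward(rng.normal(size=(5, d_x)), rng.normal(size=d_z))
print(0.0 < p < 1.0)  # True
```

The computational saving comes precisely from keeping `Ws @ z` out of the recurrent loop: in a conventional RNN, the static features would be concatenated to every time step's input and re-multiplied at each visit.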
Additional suggestions are provided to further enlarge the opportunity for management improvement. This study provides not only clarity to the CRM analysis of the lucrative automobile aftermarket, but also new perspectives for analysts regarding the utilization of ‘‘black-box-like’’ machine learning models for behavior modeling. Although we have observed the superior predictive power of various machine learning approaches, there remains considerable opportunity for analysts to embed behavioral and business insights into model development and specification. This can increase not only forecasting performance but also interpretability, making the modeling and its results more transparent and the managerial insights easier to elicit. This paper presents only a simple first step in this direction; future work should include in-depth investigations, such as of the model structure itself. A deeper investigation can be conducted to fully exploit and adopt all types of behavioral features, not just the two current types. From a model selection or ensemble perspective, the development of joint models and the full exploitation of the dataset and managerial insights are critical problems for both the computer science and business communities.


CRediT authorship contribution statement

Jiawe Wang: Conceptualization, Methodology, Validation, Investigation, Software, Writing – original draft. Xinjun Lai: Conceptualization, Methodology, Validation, Investigation, Writing – original draft, Writing – review & editing, Supervision, Funding acquisition. Sheng Zhang: Software, Investigation, Writing – original draft. W.M. Wang: Methodology, Supervision, Writing – original draft, Writing – review & editing. Jianghang Chen: Conceptualization, Resources, Writing – original draft, Writing – review & editing.

Acknowledgments

The work described in this paper was jointly supported by the National Natural Science Foundation of China (No. 71601052), the Hundred Young Talent Project of Guangdong University of Technology (No. 220413637), and the funding of High-Level University Development for Guangdong University of Technology (No. 262511006).


