Exploring the sequential usage patterns of mobile Internet services based on Markov models


Electronic Commerce Research and Applications 17 (2016) 1–11


Electronic Commerce Research and Applications journal homepage: www.elsevier.com/locate/ecra

Xiaohang Zhang a,b,⇑, Cenyue Wang a, Zhengren Li a, Ji Zhu b, Wenhua Shi a, Qi Wang a

a School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China
b Department of Statistics, University of Michigan, Ann Arbor, MI 48109-1107, USA

Article info

Article history: Received 14 July 2015; received in revised form 21 February 2016; accepted 22 February 2016; available online 28 February 2016.

Keywords: Mobile internet; Behavior pattern; Multi-state model; Hidden Markov model; Clustering

Abstract

Mobile Internet has developed rapidly, and various types of mobile Internet services have changed people’s lifestyles profoundly. Consequently, there is a broad market for mobile Internet service providers. To provide better service and attract users, service providers must understand their users’ behavior patterns. This study proposes a framework to model users’ mobile online behavior based on a multi-state model and a hidden Markov model; it also extracts typical sequential behavior patterns through clustering methods. The experimental results reveal several characteristic behavior patterns that can guide service providers in application design, operation, and marketing. © 2016 Published by Elsevier B.V.

⇑ Corresponding author at: School of Economics and Management, Beijing University of Posts and Telecommunications, Beijing 100876, China. http://dx.doi.org/10.1016/j.elerap.2016.02.002. 1567-4223/© 2016 Published by Elsevier B.V.

1. Introduction

With the development of the 3G/4G network and the intelligent terminal industry, the number of mobile Internet users is increasing rapidly. According to the 2014 ICT figures released by the ITU (ITU 2014), the number of mobile Internet users reached 2.3 billion, with 55% of these users in developing countries that lead mobile-broadband growth. By the end of 2014, China had 649 million Internet users and 557 million mobile Internet users, according to the CNNIC survey (CNNIC 2014a). Additionally, the percentage of Chinese users accessing the Internet via mobile devices grew to 83.4% as of June 2014, for the first time surpassing the percentage of users (80.9%) who access the Internet via PCs (CNNIC 2014b). As the figures show, the mobile Internet market is a new and rapidly growing market for online service providers.

To seize mobile Internet market share, online service providers, including telecommunication operators and Internet enterprises, are committed to developing and operating various mobile Internet services. Through mobile Internet, people can send instant messages, watch videos, read online books, order takeout and so on. All types of mobile online behavior have generated a huge volume of behavior-record data, which is stored by telecommunication operators and Internet enterprises. With the development of the mobile Internet market, the behavior data volume will continue to grow rapidly. Knowledge such as users’ preferences and habits is contained in those mobile online behavior records. By analyzing user behavior data, online service providers can understand mobile Internet users’ behavior patterns. Furthermore, that knowledge can help telecommunication operators plan network resources and design pricing policies; in addition, it can assist Internet service providers in developing marketing strategies and designing new services.

In the mobile Internet user behavior research field, previous researchers have studied network traffic, the geospatial dynamics of application usage and single-service behavior from several aspects. However, most studies are based on a single service or focus on underlying technology aspects. To assist in business decisions, we need to extract knowledge from all-service records of user behavior and analyze it from a business perspective. Thus, we can develop managerial proposals regarding mobile online user behavior.

This study proposes a novel framework for analyzing the sequential behavior patterns of various mobile Internet services. This framework transforms users’ behaviors into state transition processes, trains a multi-state model (MSM) and a hidden Markov model (HMM) separately to describe user behavior, and identifies behavior patterns by clustering users and extracting the typical behavior characteristics of each cluster. Although both the MSM and the HMM originate from the Markov model, their results are significantly different (see Section 3), and their combination can improve the understanding of behavior patterns.


The remainder of this paper is structured as follows. Section 2 reviews the related literature on Internet service usage patterns and modeling algorithms. Section 3 presents this study’s framework and process. Section 4 describes the data used in this study and the experimental results. The managerial implications of the results are discussed in Section 5, and the contributions and limitations of this study are presented in Section 6.

2. Literature

This study’s objective is to propose a novel framework for analyzing the behavior patterns of various mobile Internet services. The literature review is composed of two parts: studies on mobile Internet user behavior and the related models applied in this study. The latter includes a multi-state model, a hidden Markov model, distance computation and clustering. The reviewed literature is summarized in Table 1.

2.1. Studies on mobile Internet user behavior

Several studies have examined mobile Internet user behavior. Cheng and Sun (2012) used message, entertainment and micropayment services to segment customers with an improved segmentation model, the TFM (time, frequency, money) model. Wu and Chou (2011) developed a soft clustering method that uses a latent mixed class membership clustering approach to classify online customers based on their purchasing data across categories. Bose and Chen (2010) selected Internet usage, revenue, services and user categories as the research indicators to cluster customers. Zhao et al. (2013) exposed four aspects of the characteristics of service visit styles using a weighted user-service bipartite network model. Liu et al. (2012) studied mobile Internet streaming services from the server perspective and showed that they differ greatly from traditional Internet streaming services in terms of hardware and software differences in mobile devices, the characteristics of mobile videos, and user access patterns. Keralapura et al. (2010) studied mobile user browsing behavior by investigating behavior patterns among mobile users based on real mobile network data collected from a large 3G CSP in North America; they also proposed and developed a scalable co-clustering methodology using a novel hourglass model. Ghosh et al. (2011) examined the characteristics of traffic observed at a large number of public Wi-Fi hot-spots deployed in two large metropolitan cities in terms of arrival counts, temporal variations, connection durations, and byte counts, and categorized the different venues into certain business types. Chen et al. (2012) studied user movement behavior patterns, addressing the problem of mining matching mobile access patterns based on joining movement location and dwell time with timestamp and service request. Shafiq et al. (2012) provided a fine-grained characterization of the geospatial dynamics of application usage in a 3G cellular data network. Zhao et al. (2011) constructed an empirical model for Web browsing on the mobile Internet, which showed that the size of the main object and the size of the embedded object exhibit a Pareto distribution; in addition, the number of embedded objects and the session duration fit the Weibull distribution well, and the embedded object inter-arrival time as well as the reading time follow a lognormal distribution. Lu et al. (2012) proposed a framework to combine user motion patterns and mobile purchase behaviors.

As the above discussion shows, previous mobile Internet user behavior studies cover network traffic, the geospatial dynamics of application usage and single-service behavior. In the mobile Internet data set, however, user behaviors across various mobile Internet services are recorded. For telecommunication operators and Internet enterprises, studies on user behavior of various mobile Internet services and the interrelations among services, which are based on all-service data, are more helpful for understanding mobile Internet user behavior as a whole and for making business decisions. In previous mobile Internet customer segmentation studies, clustering methods were often directly applied to the customers’ usage attributes, such as the

Table 1. Literature review.

Components | Contents/Methods | References
Mobile Internet user behavior | Online shopping behavior | Wu and Chou (2011), Hong and Kim (2012)
Mobile Internet user behavior | Online ticketing behavior | Seret et al. (2014)
Mobile Internet user behavior | Online entertainment and micropayment | Cheng and Sun (2012)
Mobile Internet user behavior | Internet usage behavior | Bose and Chen (2010)
Mobile Internet user behavior | Weighted user-service bipartite network | Zhao et al. (2013)
Mobile Internet user behavior | Heterogeneity from traditional Internet streaming services | Li et al. (2012)
Mobile Internet user behavior | Scalable co-clustering methodology using a novel hourglass model | Keralapura et al. (2010)
Mobile Internet user behavior | Characteristics of mobile traffic | Ghosh et al. (2011)
Mobile Internet user behavior | Mobile access patterns | Chen et al. (2012)
Mobile Internet user behavior | Fine-grained characterization of the geospatial dynamics of application usage | Shafiq et al. (2012)
Mobile Internet user behavior | Modeling Web browsing on the mobile Internet | Zhao et al. (2011)
Mobile Internet user behavior | User motion and purchase behaviors | Lu et al. (2012)
Multi-state model | Review of MSM | Meira-Machado et al. (2008)
Multi-state model | Continuous-time Markov chains | Cox and Miller (1977)
Multi-state model | Applications of MSM in social science | Haan (2010), Kalbfleisch and Lawless (1985)
Hidden Markov model | Introduction to hidden Markov models | Ghahramani (2001)
Hidden Markov model | Applications of HMM | Och and Ney (2004), Stanke and Waack (2003), Duong et al. (2005)
Distance between probability density functions | Euclidean space | Cha (2007)
Distance between probability density functions | Bray–Curtis distance | Bray and Curtis (1957)
Distance between probability density functions | Kullback–Leibler divergence | Kullback and Leibler (1951)
Distance between probability density functions | Bhattacharyya distance | Bhattacharyya (1946)
Distance between hidden Markov models | Arbitrary observation densities | Juang and Rabiner (1985)
Distance between hidden Markov models | Model match | Rabiner (1989)
Distance between hidden Markov models | BP metric | Panuccio et al. (2002)
Clustering methods | K-means | Lloyd (1982)
Clustering methods | k-Medoids | Ng and Han (2002)
Clustering methods | Hierarchical clustering | Johnson (1967)


frequency of usage, the money spent on services, and the time preference of service access. However, this study focuses on the sequential pattern of service usage, so the clustering methods are applied to the customer distance matrix, which is obtained by computing the distance between transition matrices in the MSM or between HMMs (see Section 3) for each pair of customers.

2.2. Studies on related models

2.2.1. Multi-state model

A multi-state model (MSM) is a model for a continuous-time stochastic process that allows individuals to move among a finite number of discrete states. The change of state is called a transition or event (Meira-Machado et al. 2008). A multi-state model is usually based on the Markov assumption because it has the clear advantage of allowing an intuitive graphical understanding of the model (Hougaard 1999). The Markov assumption is that future evolution depends solely on the current state (Jackson 2011). Cox and Miller (1977) presented a thorough introduction to the theory of continuous-time Markov chains. The MSM is a very useful means of describing a process in which an individual moves through a series of discrete states in continuous time and of finding behavioral transition characteristics in the process. In this study, it can help us understand the sequential patterns of customers’ mobile Internet usage, which is described by a process in which an individual customer moves through many mobile Internet services.

Multi-state methods have been used in various situations. In medicine, the states can describe conditions such as healthy, diseased, diseased with complication, and dead (Hougaard 1999). Multi-state Markov models are also widely used in the social sciences, particularly in the study of data that record life history events (e.g., social mobility studies) for individuals (Kalbfleisch and Lawless 1985) and labor supply (Haan 2010).

2.2.2. Hidden Markov model

A hidden Markov model (HMM) is a tool for representing probability distributions over sequences of observations (Ghahramani 2001). This model is a doubly stochastic process with an underlying stochastic process that is not observable (hidden) and can only be observed through another set of stochastic processes that produce the sequence of observed symbols (Rabiner and Juang 1986). In this study, the observed symbols represent the mobile Internet services and the hidden process represents the transitions between patterns. The basic theory of HMMs was published by Baum and his colleagues in the late 1960s and early 1970s (Baum et al. 1970; Baum and Petrie 1966). HMMs are especially known for their applications in temporal pattern recognition, for example, speech recognition, machine translation (Och and Ney 2004), gene prediction (Stanke and Waack 2003) and human activity recognition (Duong et al. 2005). Because HMMs are based on discrete-time state sequences, which differ from the continuous-time state sequences used in the MSM, we also adopt the HMM in this study. The combination of MSM and HMM allows us to observe the sequential patterns from the views of both discrete and continuous time.

2.2.3. Distance/similarity measurement

There are many methods for measuring distance/similarity between two probability density functions (pdfs); these can be grouped into two approaches: vector and probabilistic. A probability density function can be considered a vector, that is, a point in Euclidean space (Cha 2007). Therefore, the Minkowski distance, which includes the Euclidean distance, the Manhattan distance and the Chebyshev distance as special cases, can be used to measure the distance between two pdfs. The Bray–Curtis distance is widely used in ecology (Bray and Curtis 1957) for measuring the dissimilarity between pdfs. The mutual information of two random variables is a measure of the variables’ mutual dependence, which is also a measure of similarity between pdfs. Kullback and Leibler introduced the Kullback–Leibler divergence, also called relative entropy or information deviation, to measure the difference between two probability distributions; it is non-symmetric (Kullback and Leibler 1951). The Bhattacharyya distance measures the similarity of two discrete or continuous probability distributions and the separability of classes in classification (Bhattacharyya 1946). The Hellinger distance was introduced by Ernst Hellinger in 1909; it obeys the triangle inequality.

There are also many methods for measuring distance/similarity between two hidden Markov models. Juang and Rabiner proposed a distance measure for the dissimilarity between hidden Markov models with arbitrary observation densities. The measure is based on the Kullback–Leibler number and is consistent with the re-estimation technique for hidden Markov models (Juang and Rabiner 1985). Rabiner proposed a measure that describes how well models 1 and 2 match observations generated by model 2 (Rabiner 1989). Bicego and his colleagues introduced a new method, the BP metric, which is based on Rabiner’s method and takes modeling goodness into account by evaluating the relative normalized difference between the observation likelihoods and the training likelihoods (Panuccio et al. 2002).

2.2.4. Clustering model

Researchers have studied various clustering methods. K-means is popular for cluster analysis in data mining. The objective of K-means clustering is to partition N observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster (Lloyd 1982). The k-medoids algorithm is a clustering algorithm related to the k-means algorithm; it attempts to minimize the distance between points labeled to be in a cluster and a point designated as the center of that cluster.
The most common realization of k-medoids clustering is the Partitioning Around Medoids (PAM) algorithm; in addition, Ng and Han proposed an algorithm called CLARANS (Ng and Han 2002). Hierarchical clustering is a method that builds a hierarchy of clusters and presents the result in a dendrogram (Johnson 1967). This method includes two types: agglomerative and divisive.

3. Study framework

In this section, we propose and demonstrate the framework for analyzing the behavior patterns on various mobile Internet services, which includes two parts: a multi-state Markov model and a hidden Markov model. In each part, we first explain the model principle, the input data format, and how to model each user’s behavior sequence based on the model. Second, we analyze the calculation of the distance between users based on the trained models. Third, we introduce the clustering methods into the framework to segment users. Fig. 1 shows this study’s process.

[Fig. 1. Process of study. MSM branch: introduction to multi-state model, continuous behavior representation, multi-state model construction, Hellinger distance measure. HMM branch: introduction to hidden Markov model, discrete behavior representation, hidden Markov model construction, log-likelihood distance measure. Both branches lead to clustering and extraction of representative users, followed by representative behavior analysis.]

3.1. Multi-state model

3.1.1. Fundamental of multi-state model

A multi-state model describes how an individual moves between a series of states in continuous time (Jackson 2011). Suppose an individual is in state $S(t) \in \{1, \ldots, N\}$ at time $t$. The movement on the discrete state space $\{1, \ldots, N\}$ is governed by transition intensities $q_{rs}(t, z(t))$, $r, s \in \{1, \ldots, N\}$, which may depend on time $t$ or, more generally, on a set of individual-level or time-dependent explanatory variables $z(t)$. The intensity represents the instantaneous risk of moving from state $r$ to state $s \neq r$:

$$q_{rs}(t, z(t)) = \lim_{\Delta t \to 0} \frac{P(S(t + \Delta t) = s \mid S(t) = r)}{\Delta t} \tag{1}$$

The $q_{rs}$ form an $N \times N$ matrix $Q$ whose rows sum to zero, so that the diagonal entries are defined by $q_{rr} = -\sum_{s \neq r} q_{rs}$. In a time-homogeneous continuous-time Markov model, a sojourn time in state $r$ has an exponential distribution with rate $-q_{rr}$ (that is, the mean sojourn time equals $-1/q_{rr}$). The remaining elements of the $r$th row of $Q$ are proportional to the probabilities governing the next state after $r$ to which the individual transfers. The probability that the individual moves from state $r$ to state $s$ is $-q_{rs}/q_{rr}$. Therefore, we can construct an $N \times N$ state transition probability matrix $P$, in which $p_{rs} = -q_{rs}/q_{rr}$ for $r \neq s$. An illustration of the MSM process is shown in Fig. 2. When fitting a multi-state model to training data, we need to estimate the intensity matrix $Q$ and the state transition probability matrix $P$. We use the Markov assumption that future evolution depends solely on the current state. Cox and Miller (1977) give a more comprehensive treatment of the theory of continuous-time discrete-state Markov chains.
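The relation between the intensity matrix and the quantities used later can be sketched in a few lines. The following is a minimal illustration, not the routine used by the msm package: it assumes the usual sign convention in which the diagonal entries of Q are negative and each row sums to zero, and the function name `embedded_chain` and the 3-state matrix are hypothetical.

```python
def embedded_chain(Q):
    """Return (P, mean_sojourn) for an intensity matrix Q given as a list of rows.

    Assumes off-diagonal entries are non-negative and each row sums to zero,
    so the diagonal entry Q[r][r] is minus the total rate of leaving state r.
    """
    n = len(Q)
    P = [[0.0] * n for _ in range(n)]
    mean_sojourn = []
    for r in range(n):
        rate = -Q[r][r]  # total rate of leaving state r
        if rate <= 0.0:
            # Absorbing state: it is never left, so its row of P stays zero.
            mean_sojourn.append(float("inf"))
            continue
        mean_sojourn.append(1.0 / rate)
        for s in range(n):
            if s != r:
                P[r][s] = Q[r][s] / rate  # probability that the next state is s
    return P, mean_sojourn


# Hypothetical 3-state intensity matrix (rows sum to zero).
Q = [[-0.5, 0.3, 0.2],
     [0.4, -1.0, 0.6],
     [0.1, 0.1, -0.2]]
P, sojourn = embedded_chain(Q)
```

Each row of the resulting P sums to one over the off-diagonal entries, which is what allows the rows to be compared as probability distributions in the distance measure.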

3.1.2. Data preparation and model construction

Now, we consider what type of data can be used as training data for the MSM in this study. Unlike on a personal computer, a smartphone user can only use one application on the screen at a particular point in time, regardless of the applications running in the background. If we treat the usage of a certain application type as a state and non-usage as a further state, the entire smartphone usage can be regarded as a continuous-time stochastic process in which an individual moves among a finite number of discrete states. Fig. 3 illustrates a simple example of the process. Suppose a user uses three types of applications, services A, B, and C, which are represented by states 2, 3, and 4. If the user is not using any application, the state is 1. As time passes, the user changes from non-usage to service B, then to service A, then to service C, and then back to service A. The usages of different services may last for different durations. The data on a user’s state transitions are imported as training data for a multi-state model; therefore, the training result can represent the user’s behavior patterns.

During the past 10 years, multi-state modeling software has advanced impressively. The statistical theory of counting processes is the preferred method for the estimation of multi-state models, and R is the preferred programming platform (Willekens and Putter 2014). In this study, we apply the msm package to train models. The msm package, developed by Jackson and first published in 2002, adopts a parametric method for estimating transition rates in multi-state models. The package uses the maximum likelihood method developed by Kalbfleisch and Lawless (1985b). This package accommodates knowledge of the exact transition times, which is the type of data described above. We use the records of each user as an imported data set, which has three variables (Table 2). In the training settings, exacttimes is set to TRUE, which means that the startTime values are assumed to represent the exact times of the process transitions.

From the training results, we obtain rich information regarding every user’s multi-state model. As we previously noted, the mean sojourn times and the probability that the individual moves from state r to s provide more intuitive information on a continuous-time Markov model than the raw transition intensity matrices. The mean sojourn time shows information regarding separate services, and the transfer probability shows information regarding the relation among services. Because we focus on sequential behavior patterns, we choose the transfer probability matrix for further study.

3.1.3. User distance measure

A user’s behavior patterns are represented by the probability matrix P extracted from a multi-state model. Every row of the P matrix can be treated, by analogy, as a probability distribution function. There are many ways to compute the distance between two probability distribution functions, as noted in Section 2. We adopt the Hellinger distance (Nikulin 2001) in this study because elements in the P matrix may be 0.

[Fig. 3. A simple usage process. Timeline over t: non-usage, service B, service A, service C, service A.]

Fig. 2. An illustration of the process of MSM. The top part of the figure represents the input of the MSM, in which the states last and transfer over a continuous time horizon. Q is the intensity matrix, in which $q_{rs}$ represents the instantaneous intensity of moving from state $r$ to state $s \neq r$. P is the state transfer probability matrix, in which $p_{rs}$ denotes the transfer probability from state $r$ to state $s \neq r$.

Table 2. Variables of imported data for MSM.

Variables | Meaning
Service | Service that the customer uses
startTime | The moment at which the customer starts using the service
endTime | The moment at which the customer stops using the service
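As a rough illustration of how records with the three variables of Table 2 could be turned into a sequence of state transitions with exact transition times, consider the sketch below. It is an assumption-laden example rather than the preprocessing used in the study: timestamps are plain numbers, records do not overlap, the function name `to_state_sequence` and the sample records are hypothetical, and the state codes follow the example in the text (non-usage is state 1; services A, B and C are states 2, 3 and 4).

```python
NON_USAGE = 1  # state code for "no service in use", as in the text's example


def to_state_sequence(records, state_of):
    """Turn (service, start, end) records into a list of (time, state) transitions.

    A non-usage state is inserted whenever there is an idle gap between the
    end of one record and the start of the next.
    """
    seq = []
    prev_end = None
    for service, start, end in sorted(records, key=lambda rec: rec[1]):
        if prev_end is not None and start > prev_end:
            seq.append((prev_end, NON_USAGE))  # idle gap between two usages
        state = state_of[service]
        if not seq or seq[-1][1] != state:
            seq.append((start, state))  # entering a new state at time `start`
        prev_end = end
    if prev_end is not None:
        seq.append((prev_end, NON_USAGE))  # the user stops after the last record
    return seq


# Hypothetical records: (Service, startTime, endTime) in seconds.
state_of = {"A": 2, "B": 3, "C": 4}
records = [("B", 10, 25), ("A", 25, 40), ("C", 55, 60), ("A", 60, 90)]
sequence = to_state_sequence(records, state_of)
```

Each pair in `sequence` gives the moment a state is entered, which is the kind of input that a model fitted with exactly observed transition times expects.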


Let $P^{i}$ be the $P$ matrix of user $i$, and let $N$ be the total number of states in the data. $P^{i}$ is an $N \times N$ matrix whose rows sum to 1. The Hellinger distance between user $i$ and user $j$ is defined by:

$$d_{ij} = \frac{1}{\sqrt{2}} \sqrt{\sum_{r=1}^{N} \sum_{s=1, s \neq r}^{N} \left( \sqrt{p^{i}_{sr}} - \sqrt{p^{j}_{sr}} \right)^{2}} \tag{2}$$
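The distance in Eq. (2) can be computed directly from two transition probability matrices. The sketch below is illustrative only; the function names and the small matrices are hypothetical, and the matrices are stored as lists of rows. Because the sum runs over every off-diagonal pair of states, the order of the two indices does not affect the result.

```python
import math


def hellinger(P_i, P_j):
    """Hellinger-type distance of Eq. (2) between two N x N transition matrices."""
    n = len(P_i)
    total = 0.0
    for r in range(n):
        for s in range(n):
            if s != r:  # diagonal entries are excluded, as in Eq. (2)
                total += (math.sqrt(P_i[r][s]) - math.sqrt(P_j[r][s])) ** 2
    return math.sqrt(total) / math.sqrt(2.0)


def distance_matrix(matrices):
    """Symmetric matrix D with D[i][j] = hellinger(matrices[i], matrices[j])."""
    m = len(matrices)
    D = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            D[i][j] = D[j][i] = hellinger(matrices[i], matrices[j])
    return D


# Hypothetical users with three states; zero entries cause no problems.
P_a = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [1.0, 0.0, 0.0]]
P_b = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
D = distance_matrix([P_a, P_b, P_a])
```

Unlike the Kullback–Leibler divergence, this measure stays finite when some transition probabilities are zero, which matches the reason given in the text for choosing it.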

Thus, we can obtain the distance matrix $D$, in which $d_{ij}$ is the distance between users $i$ and $j$.

3.1.4. Clustering

On the basis of the distance matrix $D$, we segment users into $k$ clusters using clustering algorithms. In this study, we apply the most common realization of the k-medoids clustering algorithm, Partitioning Around Medoids (Theodoridis and Koutroumbas 2008) (see Appendix A), and hierarchical clustering algorithms (Hartigan 1975). The hierarchical algorithms include Ward’s minimum variance method (Ward) (Ward 1963), which aims at finding compact, spherical clusters, and the complete linkage method (Complete), which merges clusters according to the largest pairwise distance between their members. Estimating the optimal number of clusters is a major challenge in cluster analysis. In this study, we adopt the Gap statistic (Tibshirani et al. 2001) to determine the number of clusters for every type of clustering algorithm (Appendix B). The Gap statistic is a method for estimating the number of clusters that is designed to be applicable to virtually any clustering method. After obtaining the clustering results, we apply the k-medoids method with $k = 1$ to each cluster to determine a representative user for that cluster. In this study, we use the hclust function in the stats package of R and the pam function in the cluster package of R to implement hierarchical clustering and k-medoids clustering, respectively. The Gap statistic can be calculated through the clusGap function in the cluster package.

3.2. Hidden Markov model

3.2.1. Fundamental of hidden Markov model

An HMM is a doubly stochastic process with an underlying stochastic process that is hidden and can only be observed through another set of stochastic processes that produce the sequence of observed symbols (Rabiner and Juang 1986). A hidden process describes the transition of hidden states $X_t \to X_{t+1}$ at discrete time points $t \to t + 1$, where $X_t \in S = \{S_1, S_2, \ldots, S_M\}$ and $t = 1, \ldots, T$.
The model assumes that the hidden process states satisfy the Markov property. That is, the state $X_t$ at time $t$ depends solely on the state $X_{t-1}$ at time $t - 1$ and is independent of the states prior to $t - 1$. The observation symbol $Y_t \in O = \{O_1, O_2, \ldots, O_N\}$ at time $t$ is emitted by state $X_t$ in the hidden process. The observations also satisfy a Markov property on the basis of the states: given $X_t$, $Y_t$ is independent of the states and observations at all other time indices (Ghahramani 2001). Based on the observed symbol sequence $Y_1, \ldots, Y_T$ and a specified value of $M$, we can estimate (1) the initial state distribution $\pi = \{\pi_1, \ldots, \pi_M\}$, with $\pi_i$ denoting the probability that the hidden state is $S_i$ at time point $t = 0$; (2) the $M \times M$ state transition probability matrix $A = \{a_{ij} \mid i, j = 1, \ldots, M\}$, with $a_{ij}$ denoting the probability that the hidden state transfers from $S_i$ to $S_j$; and (3) the observation symbol probability distribution $b^{(i)} = \{b^{(i)}_j \mid j = 1, \ldots, N\}$ for each hidden state $S_i$ $(i = 1, \ldots, M)$, with $b^{(i)}_j$ denoting the probability that the observed symbol is $O_j$ at hidden state $S_i$. The process is illustrated in Fig. 4.

In this study, each observation symbol $O_i \in O$ represents one service, and each state $S_i \in S$ represents one usage pattern, because one state corresponds to a distinct probability distribution of the observation symbols (a sample illustration is given in Table 7), which shows the user’s service preference.

Fig. 4. An illustration of the HMM. The top part of the figure represents the input of the HMM, which is the combination of hidden states and observed symbols over the discrete time horizon. The output of the HMM includes the initial distribution of hidden states $\pi$, the state transition matrix $A$, and the observation symbol distribution $b^{(i)}$ corresponding to each hidden state $S_i$.

3.2.2. Data preparation and model construction

In the preceding multi-state model section, we treat the service usage behavior as a continuous-time discrete-state Markov process in which states represent the types of services. Now we discretize the process into a sequence of points that represent services. Using each point as an observation, we obtain the observed symbol sequence data. After this data processing, the hidden Markov model can be applied to model users’ behavior patterns. In this study, we apply the hmm.discnp package in R to train models. The package was developed by Rolf Turner from the University of Auckland.

A key challenge in training hidden Markov models is determining the number of hidden states, $M$, in the model. We adopt the Bayesian information criterion to select the best $M$. In statistics, the Bayesian information criterion (BIC) is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred. When fitting models, it is possible to increase the likelihood by adding parameters; however, doing so may result in overfitting. BIC addresses this problem by introducing a penalty term for the number of parameters in the model. BIC is normally defined by:

$$\mathrm{BIC} = -2 \ln(L) + f \ln(T), \tag{3}$$

where $L$ means the maximized value of the likelihood function of the model, $f$ means the number of free parameters to be estimated and $T$ means the number of data points in the training data. We use a hidden Markov model with $M$ states and $N$ distinct observations, and an input observation sequence $Y$ of $T$ points, as an example. The number of free parameters in the transition probability distribution is $M \times (M - 1)$, the number of free parameters in the observation symbol probability distribution is $N \times (M - 1)$, and the number of free parameters in the initial state distribution is $M - 1$. Therefore, the number of free parameters in the example model is $(N + M + 1) \times (M - 1)$. The BIC in this case is:

$$\mathrm{BIC} = -2 \ln(L(T \mid \lambda)) + (N + M + 1) \times (M - 1) \times \ln(T). \tag{4}$$
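The model selection step can be illustrated with a short sketch. It uses the usual sign convention, BIC = -2 ln(L) + f ln(T), together with the parameter count stated in the text, (N + M + 1)(M - 1). The function names and the log-likelihood values are hypothetical; in practice the maximized log-likelihood of each candidate model would come from the fitting routine.

```python
import math


def hmm_free_parameters(n_symbols, n_states):
    """Free-parameter count as stated in the text: (N + M + 1) * (M - 1)."""
    return (n_symbols + n_states + 1) * (n_states - 1)


def bic(log_likelihood, n_symbols, n_states, n_points):
    """BIC = -2 ln(L) + f ln(T); lower values are preferred."""
    f = hmm_free_parameters(n_symbols, n_states)
    return -2.0 * log_likelihood + f * math.log(n_points)


def select_n_states(log_likelihoods, n_symbols, n_points):
    """Pick the candidate M with the lowest BIC.

    log_likelihoods maps each candidate number of hidden states M to the
    maximized log-likelihood of the model fitted with that M.
    """
    return min(log_likelihoods,
               key=lambda m: bic(log_likelihoods[m], n_symbols, m, n_points))


# Hypothetical fitted log-likelihoods for M = 2, 3, 4 with N = 13 and T = 500.
candidates = {2: -1200.0, 3: -1150.0, 4: -1140.0}
best_m = select_n_states(candidates, n_symbols=13, n_points=500)
```

In this made-up example the gain in likelihood from adding states is smaller than the growth of the penalty term, so the smallest model is selected.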

We organize the behavior observation sequences in a one-day cycle. Following the package manual, we use the observation sequence of each user on each day as a vector and combine each user’s sequences into a list. The list is the training data set for the user’s hidden Markov model. From the training results, we obtain the initial state distribution $\pi$, the state transition matrix $A$, and the observation symbol probability distribution $b$.
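Given the three fitted components, the log-likelihood of an observation sequence, which enters both the BIC and the distance measure of Section 3.2.3, can be computed with the scaled forward algorithm. The sketch below is a generic textbook version, not the routine of the hmm.discnp package; the function name, the argument layout (symbols encoded as integer indices, B[i][k] as the probability of symbol k in hidden state i) and the numbers are assumptions made for illustration.

```python
import math


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(obs | pi, A, B).

    obs   : list of observed symbol indices
    pi[i] : initial probability of hidden state i
    A[i][j]: probability of moving from hidden state i to hidden state j
    B[i][k]: probability of emitting symbol k in hidden state i
    """
    if not obs:
        return 0.0
    m = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(m)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through A, then weight by the emission probability.
            alpha = [sum(alpha[i] * A[i][j] for i in range(m)) * B[j][obs[t]]
                     for j in range(m)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return loglik


# Hypothetical two-state model over two symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
ll = log_likelihood([0, 1], pi, A, B)
```

For several daily sequences of one user, the per-sequence log-likelihoods can simply be added if the days are treated as independent, which is an additional assumption.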

3.2.3. User distance measure

After obtaining the HMM of each user, we measure the distance between users by calculating the dissimilarity between their HMMs. In this study, we adopt the measure that Rabiner proposed in 1989, which describes how well model $i$ matches observations generated by model $j$ and how well model $j$ matches the observations it generated. Suppose the HMM $\lambda_i$ is trained from customer $i$’s observation sequence $Y^{(i)} = \{Y^{(i)}_1, Y^{(i)}_2, \ldots, Y^{(i)}_{T_i}\}$. We describe how well $\lambda_i$ matches observations generated by $\lambda_j$, relative to how well model $\lambda_j$ matches the observations it generated, as:

$$D(\lambda_i, \lambda_j) = \frac{1}{T_i} \left[ \log P(Y^{(j)} \mid \lambda_i) - \log P(Y^{(j)} \mid \lambda_j) \right], \tag{5}$$

where $T_i$ is the length of $Y^{(i)}$. Because the measure is non-symmetric, a symmetric measure between model $\lambda_i$ and model $\lambda_j$ is:

$$D_s(\lambda_i, \lambda_j) = (D(\lambda_i, \lambda_j) + D(\lambda_j, \lambda_i))/2. \tag{6}$$

The distance matrix among users, $D_s = \{D_s(\lambda_i, \lambda_j)\}$, can be created by computing $D_s(\lambda_i, \lambda_j)$ for each pair of users.

3.2.4. Clustering

As in the multi-state model, we cluster the users with the previously calculated distance matrix $D_s$. The applicable methods include the k-medoids method and the hierarchical clustering algorithms. Additionally, the number of clusters is decided by the Gap statistic. K-medoids ($k = 1$) is used in each cluster to extract the representative users.

3.3. Difference between MSM and HMM

There are several differences between MSM and HMM. First, MSM is a continuous-time model, and HMM is a discrete-time model. We use an example to explain the difference when the two models are applied to mobile service usage behavior. Fig. 5 shows two processes. The transitions are the same; however, the durations are different. In MSM, different behavior sequences are extracted from the two processes because of the different state durations. However, in HMM, the two processes have the same training data, which start from state 1, move to state 3 and then to state 2, and return to state 1. Therefore, when we focus on the transition of states rather than the duration of states, we should choose HMM. Second, MSM solely has a state sequence, whereas HMM contains a state sequence and an observation sequence. The patterns in MSM are represented by the state (service) transition matrix; however, the patterns in HMM are represented by the observation (service) distributions corresponding to the hidden states. Therefore, the MSM’s results provide an understanding of the sequential relationship of the services, and the HMM’s results give insight into pattern transitions and service combinations.

[Figure placeholder for Fig. 5: State 1, State 2 and State 3 plotted against time t; panel label "Process 1".]

4. Experiment

Fig. 5. Two examples of state processes (Process 1 and Process 2; States 1–3 plotted against time t).

4.1. Data description

In this study, the experimental data are derived from a major telecommunications company on the Chinese mainland and are composed of the mobile Internet behavior records of 1000 customers over 28 days, covering four complete weeks. The customers are chosen randomly, but in order to ensure the reliability and validity of the analyses, we only choose the customers who used mobile Internet services every day in the period and have used the mobile Internet services for at least one year. The chosen customers therefore have steady behaviors. The variables in the data set are shown in Table 3.

The endTime variable cannot always be recorded accurately. If users terminate services actively (for example, they close the browser or power off their phones), the end time of services is accurate. If users do not terminate the services actively, the end time depends on the expiration time of sessions. The valid period of a session is set to 5 min in the system. For example, when a user has been using a service for 13 min and does not terminate it actively, the session will expire after 5 min and the system will record a duration of 18 min.

With the development of smartphones and the 3G/4G network, the competition in the mobile Internet service market is becoming fiercer. Therefore, a particular type of service would include similar applications from different service providers. For instance, when people need to download new applications, they may choose Wandoujia Marketplace or Google Apps Marketplace, which can both meet their needs but originate from two competing enterprises. Because we focus on the mobile Internet services instead of on the service providers, we divide the services into 13 groups regardless of service providers, as listed in Table 4. To ensure the results are correct and effective, we remove 55 records whose value for serviceState is failure. Furthermore, few customers used the service groups Prop and Encry; therefore, the records of these two service groups are removed. To reduce the dimensions in training, we merge the Cartoon and StrMed groups into the Video group, VoIP into the IM group, and FTP into the P2P group. Finally, thirteen service groups remain.

Table 3 Variables of customer mobile Internet behavior records.

| Variable | Meaning |
|---|---|
| ID | Anonymous ID of each customer |
| Service | Service that the customer uses |
| startTime | The moment at which the customer starts using the service |
| endTime | The moment at which the customer stops using the service |

4.2. Results of multi-state model

We develop a multi-state model for each user using the training data. After calculating the distance matrix, we apply the three clustering methods to the distance matrix and use the gap statistic (Tibs2001SEmax) to determine the number of clusters. The behavior patterns of the medoids in clusters are extracted and analyzed. The clustering results are shown in Table 5. We find that the medoids of the clusters in the k-medoids method are not suitable representatives of customer behaviors. Moreover, the sizes of the clusters in the hierarchical clustering with the Complete method are greatly unbalanced. Therefore, we choose the results of the hierarchical clustering (Ward) as the final results. Then, in each cluster we use the k-medoids method ($k = 1$) to

Table 4 The groups of mobile Internet services.

| Service | Meaning |
|---|---|
| IM | Instant messaging, real-time text and picture transmission |
| Read | E-books that are solely available to read online |
| Mblog | Microblog, a broadcasting online communication service |
| Navi | Online map-searching and navigation services |
| Video | Online videos with mobile devices, such as the NBA live |
| Music | Online music with mobile devices |
| App | Application store visits or application downloads |
| Game | Online mobile games |
| Pay | Mobile payment services for online shopping |
| Cartoon | Cartoons on mobile devices |
| Email | Email with mobile devices |
| Brow | Internet browsing with mobile browser |
| FTP | Data transmission services |
| StrMed | Streaming media services on mobile devices |
| P2P | Peer-to-Peer services |
| VOIP | Real-time voice transmission of digitized voice signals |
| Prop | Proprietary services operated by a telecoms operator |
| Encry | Communication services with encryption algorithms |
| NA | No application usage |

Table 5 Clustering results of MSM.

| Clustering method | Number of clusters |
|---|---|
| Hierarchical clustering (Complete) | 7 |
| Hierarchical clustering (Ward) | 5 |
| K-medoids | 2 |

determine its medoid, which is the representative user. The Q matrices of the five representative users are extracted and transformed into P matrices, which denote the probability of the individual’s next move. We visualize the P matrices with correlograms (Fig. 6) in which the rows represent the services that the individual moves from and the columns represent the services the individual moves to. The darker the square, the higher the probability. From the correlograms, we can find distinct behavior patterns: (1) cluster 1 favors a few services, and Game => IM <=> Video <= Mblog is the most frequent service transfer pattern. Brow is its most likely starting service; (2) cluster 2 has a strong combination of IM, Navi and Mblog. The reason may be that most IM and Mblog applications have an embedded map module that is directly connected to navigation. Moreover, cluster 2 has an unusually strong connection between Music and Navi; (3) cluster 3 uses all of the services, and the transitions between services are more frequent. IM and Brow are the most likely transfer-to services; (4) cluster 4 seldom uses navigation and pay services. IM, Music and Video are its favorite combination; (5) cluster 5 is an active user group. Brow and Video are a strong combination. After every service, Brow rather than no-usage is the most likely move-to service.

However, these clusters still have certain features in common: (1) when users turn on data usage and begin to use mobile Internet services, Brow is the most likely first service; (2) Brow and IM are the most frequent move-to services; (3) IM is highly likely both as a move-to service and as a move-from service, so it plays a bridge role between services; (4) Mblog has a strong connection with entertainment services such as Video, Music and Game; and (5) after using email, users’ behavior differs greatly from their common behavior. Each user has his or her preferred move-to services after using Email, and barely moves to no-usage.

4.3. Results of hidden Markov model

We obtain each user’s hidden Markov model using the training data. After calculating the distance matrix, we find that there are


two outliers that may influence the clustering results; therefore, we delete the two points (users). Then we apply three clustering methods to the distance matrix and use the gap statistic (Tibs2001SEmax) to determine the number of clusters. The clustering results are shown in Table 6. The k-medoids result has three clusters, which is an excessively coarse clustering. Additionally, the clusters in the hierarchical clustering with the Complete method are distinguishable. Therefore, we choose the hierarchical clustering (Ward) results as the final results. Then, in each cluster, we use the k-medoids method ($k = 1$) to determine the medoid, which is the representative user in the cluster.

As we have previously noted, the state transition probability distribution, the observation symbol probability distribution and the initial state distribution are more decisive in a transition process. The state transition probability distributions of the six representative users share a common feature: the diagonal entries are close to 1 and the others are close to 0. This illustrates that, in the mobile service usage sequence, each hidden state, which represents a typical usage pattern, tends to remain unchanged for a long time. The observation symbol probability distributions and the initial state distributions are shown in Table 7. Elements whose values are below 0.01 are not shown. From the results in Table 7, we can find that the clusters may have different numbers of patterns (hidden states) and that the patterns have different service preferences. For example, Cluster 2 has only two patterns and only favors the IM, Video and Brow services; however, Cluster 5 has four patterns and has various service preferences. Although different clusters have different usage patterns, they still have some common characteristics: (1) Brow and no-usage (NA) are the services most often chosen in any state, followed by IM, Video, and Game; (2) in many patterns, users focus on two or three services, whereas in a few states users tend to use more types of services; (3) many clusters have a pattern that focuses only on the Brow service; and (4) users are more likely to remain in a state and do not easily move to other states.

Fig. 6. Five representative P matrices.

4.4. Comparison of the MSM and HMM results

We have explained the results of the two behavior models. Additionally, to validate the similarity of the two clustering results, the normalized mutual information (NMI) is calculated. The NMI value is equal to 1 if the clustering results of the two models are identical, and has an expected value of 0 if the clusters are independent. The NMI of the clustering results of the two models is 0.179. This is smaller than 0.5, which means that the clustering results of the two models are largely independent. As we have explained in Section 3.3, the two models extract behavior sequences based on different data formats: continuous data for MSM and discrete data for HMM. Therefore, the independence of the two models is explainable and acceptable.

In order to compare the fit of the MSM and HMM to the data, we divide the data sequence for each customer $i$ into two parts, in which the first part $D_i^{(1)}$ is used to train the model $M_i$ and the second part $D_i^{(2)} = \{Y_{i,1}, Y_{i,2}, \ldots, Y_{i,T_i}\}$ is used to evaluate the predictability of $M_i$. The predictability of the HMM and MSM models can be measured by:

$$\mathrm{Pred} = \frac{1}{C}\sum_{i=1}^{C}\left\{\frac{1}{T_i}\sum_{t=1}^{T_i} \mathbf{1}_{\{Y_{i,t} = \hat{Y}_{i,t}\}}\right\}, \qquad \hat{Y}_{i,t} = \arg\max_j P(S_j \mid M_i, Y_{i,t-1}), \qquad (7)$$

where $C$ is the number of customers; $T_i$ denotes the number of observations in $D_i^{(2)}$; $\mathbf{1}_{\{Y_{i,t} = \hat{Y}_{i,t}\}}$ is an indicator function that equals 1 if $Y_{i,t} = \hat{Y}_{i,t}$ and 0 otherwise; $S_j$ denotes the mobile Internet services; and $P(S_j \mid M_i, Y_{i,t-1})$ denotes the probability of the current observation being service $S_j$ given model $M_i$ and the last observation $Y_{i,t-1}$. The results show that the predictabilities of MSM and HMM are 0.75 and 0.79 respectively, which means that both models predict reasonably well and have similar prediction performance on this dataset.

5. Managerial implications

Mobile Internet services have been adopted by increasing numbers of people in recent years and already play an important role in people’s personal and work lives. Therefore, the rapid development of mobile Internet services has created various business opportunities for online service providers and communications network providers. To achieve a win–win for both the interests of the users and the benefits of the companies, service providers and network providers must have a deep understanding of the behavior patterns of mobile Internet customers.

From the representative users’ behavior based on the MSM model, we can determine certain behavior patterns that can assist in service providers’ business decisions. We use several typical services as examples:

(1) Instant messaging plays a bridge role in mobile Internet services, and it has strong connections with other services. Therefore, for IM service providers, it is essential to provide easy connections with other applications for users. In this way, IM service providers can obtain traffic and attract users from other services and strengthen user stickiness. For example, IM service providers can open their applications further to other types of applications, such as by building open platforms and providing application program interfaces (APIs) for application developers.

(2) The browser is an entrance service in mobile Internet. In most cases, the browser is the first service people use when they turn on data usage and access the mobile Internet, and from the browser users can obtain access to other services. As an entrance service, a browser therefore plays an important role in the mobile Internet market: for a particular browser application, the more users adopt it, the more traffic the company obtains. For companies that have various applications, it is critical to develop browser applications to hold users and guide the traffic to their own applications. For companies that focus on vertical markets, cooperation with browsers is a suitable option to obtain traffic and users.

(3) The microblog has strong entertainment properties. After using entertainment services such as music, video, and games, users tend to use a microblog. In addition, after using the microblog, users also tend to use entertainment services such as music, video, and games. The microblog is a bridge between entertainment applications. For companies that have microblog applications, introducing entertainment content is a practical direction because this can attract potential users. For companies that have entertainment applications, increasing advertising and marketing investment in microblogs may produce better effects because there are more target users in microblogs.

(4) Email is a service that has a strong guidance ability. After using email, users have their own preferred services, which are far different from their normal move-to services. Because email has strong guidance ability, email direct marketing (EDM) is a useful means of marketing for companies. Accurate and targeted EDM may have a stronger response than traditional marketing methods, and it is much cheaper.

(5) Users are likely to be absorbed in entertainment services, such as video and game services. The more time users spend in a service, the more business value it has. For telecommunication operators, traffic cooperation with entertainment services companies will be a suitable choice for selling more traffic. Additionally, this is an appropriate entry point for back-charging.

Table 6 Clustering results of HMM.

| Clustering method | Number of clusters |
|---|---|
| Hierarchical clustering (Complete) | 5 |
| Hierarchical clustering (Ward) | 6 |
| K-medoids algorithm | 3 |

From the clustering results based on the HMM model (Table 7), we find some typical patterns that can help to design marketing strategies.

(1) Some customers have a pattern that shows diversified service preferences (S3 in Cluster 3, S3 in Cluster 4, and S4 in Cluster 5). Because of their wide range of interests, they are more likely to be interested in new services and to become first-generation users. If a company launches a new product, these customers can be the marketing targets. However, we must note that these customers also have other patterns (e.g., S1 and S2 in Cluster 3) in which they only use a few services; therefore, we should recommend products to them when the diversified-services pattern is active.

(2) Although some customers (Cluster 6) have diversified patterns, they focus on some fixed services in all patterns. The differences between the patterns lie only in the proportions of the services. Further, we find that their favorite services are IM and Video, which are the most efficient interactive marketing tool and embedded advertisement tool, respectively, for recommending products to these customers. Moreover, since these customers are extremely dependent on IM and Video, they naturally become important targets of customer care for IM and Video service providers.

(3) Some customers have simple patterns and simple service preferences, and spend less time on mobile Internet (Cluster 2). The telecommunication operators should cultivate their habits of using mobile Internet by giving larger discounts on data traffic packages or free service experiences.

(4) Apart from the Brow service, IM is the favorite service for most customers. IM services run on social networks, which are among the most effective carriers of word-of-mouth marketing. Therefore, we can encourage the customers to spread their other favorite services to their friends through the IM platform by giving them extra benefits. For example, the customers in Cluster 4 can recommend their other favorite services, such as Video,

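Table 7 The observation symbol probability distributions and the initial state distributions in HMM. S denotes the hidden states.

| Cluster | Hidden state | Initial distribution |
|---|---|---|
| Cluster 1 | S1 | 0.95 |
| Cluster 1 | S2 | 0.05 |
| Cluster 2 | S1 | 0.78 |
| Cluster 2 | S2 | 0.22 |
| Cluster 3 | S1 | 0.57 |
| Cluster 3 | S2 | 0.40 |
| Cluster 3 | S3 | 0.03 |
| Cluster 4 | S1 | 0.72 |
| Cluster 4 | S2 | 0.15 |
| Cluster 4 | S3 | 0.06 |
| Cluster 4 | S4 | 0.05 |
| Cluster 5 | S1 | 0.43 |
| Cluster 5 | S2 | 0.26 |
| Cluster 5 | S3 | 0.09 |
| Cluster 5 | S4 | 0.06 |
| Cluster 6 | S1 | 0.69 |
| Cluster 6 | S2 | 0.18 |
| Cluster 6 | S3 | 0.08 |
| Cluster 6 | S4 | 0.04 |

Observation symbols (table rows): IM, Read, Mblog, Navi, Video, Music, App, Game, Pay, Email, Brow, FTP&P2P, NA.

Observation symbol probabilities, in extracted order (cell positions are not recoverable from this copy): 0.10, 0.24, 0.08, 0.08, 0.22, 0.02, 0.05, 0.17, 0.14, 0.11, 0.01, 0.03, 0.07, 0.05, 0.11, 0.16, 0.44, 0.07, 0.04, 0.06, 0.10, 0.01, 0.02, 0.03, 0.32, 0.03, 0.19, 0.06, 0.07, 0.02, 0.03, 0.04, 0.12, 0.09, 0.02, 0.08, 0.05, 0.02, 0.06, 0.69, 0.31, 0.49, 0.38, 0.41, 0.49, 0.21, 0.35, 0.50, 0.41, 0.41, 0.50, 0.35, 0.03, 0.11, 0.02, 0.34, 0.38, 0.49, 0.45, 0.07, 0.02, 0.10, 0.07, 0.14, 0.13, 0.34, 0.02, 0.25, 0.41, 0.49, 0.43, 0.43, 0.50, 0.40, 0.36, 0.10, 0.27, 0.23, 0.09, 0.44, 0.03, 0.01, 0.43, 0.42, 0.39, 0.01, 0.23, 0.34, 0.49, 0.29.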


App and Game services, to their friends in return for extra rewards from telecommunication operators or other service providers.

6. Conclusions

With the rapid development of the mobile Internet, mobile service providers have enjoyed enormous business opportunities and gained generous profits in the mobile Internet market. In addition, there is fierce competition in this market. Understanding the behavior patterns of mobile Internet users is essential for mobile service providers to seize and increase their market shares. In this study, we propose an analysis framework for discovering the sequential behavior patterns of mobile Internet users. The main contributions of this paper include the following: (1) methods of modeling mobile Internet behaviors based on MSM and HMM are proposed; (2) typical sequential behavior patterns of the representative users are extracted and compared; and (3) the relations among the various mobile Internet services are analyzed from the perspective of usage sequences. These results are useful for mobile Internet service providers in designing applications, operating applications, developing marketing plans, and formulating development strategies.

The limitation of this study is that the data set used in the experiment is based solely on the mobile Internet service usage records collected from mobile networks; therefore, it may not cover the complete usage records. In certain cases, users may have access to mobile Internet services through other means, such as a WiFi network. However, mobile networks remain the main entrance to mobile Internet services because they can be used anywhere and anytime. Moreover, both the MSM and HMM models in this study satisfy the Markov assumption that the next state depends solely on the current state, which may not accord with practical scenarios in which behavior transfer may be based on multiple previous states. Applying variable-order Markov chains to mobile Internet behaviors will be the subject of our future study. Despite these limitations, we believe that this study can provide insights into the sequential behavior patterns of mobile Internet services with a newly proposed framework.

Acknowledgements

This work was partially supported by NSFC (71371034 and 71372194), the Program for NCET (NCET-13-0687), the Youth Research and Innovation Program in Beijing University of Posts and Telecommunications (2014ZD02), the National Basic Research Program of China (2012CB315805), and Humanity and Social Science Foundation in Ministry of Education (13YJA630084).

Appendix A. The PAM algorithm

In this study, we use the k-medoid clustering method to segment n users into k clusters. The most common realization of k-medoid clustering is the Partitioning Around Medoids (PAM) algorithm (Theodoridis and Koutroumbas 2008), which is described in Algorithm 1.

Algorithm 1. The PAM algorithm.
  Initialize: randomly select (without replacement) k of the n observations (users in this study) as the medoids.
  Associate each observation with the closest medoid.
  While the cost of the configuration decreases:
    For each medoid m, for each non-medoid observation o:
      Swap m and o, and recompute the cost (sum of distances of observations to their medoid).
      If the total cost of the configuration increased in the previous step, undo the swap.
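The steps of Algorithm 1 can be sketched in a few lines of Python operating directly on a precomputed distance matrix. This is an illustrative sketch rather than the implementation used in the study: the function names `total_cost` and `pam`, the `seed` parameter, and the toy data are assumptions. A swap is kept only when it lowers the total cost, which is equivalent to undoing every swap that does not.

```python
import numpy as np

def total_cost(dist, medoids):
    """Sum of distances from every observation to its closest medoid."""
    return dist[:, medoids].min(axis=1).sum()

def pam(dist, k, seed=0):
    """Greedy PAM on an (n, n) distance matrix; returns medoids, labels, cost."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    # Initialize: k distinct observations chosen at random as medoids.
    medoids = [int(m) for m in rng.choice(n, size=k, replace=False)]
    cost = total_cost(dist, medoids)
    improved = True
    while improved:                      # loop while the cost keeps decreasing
        improved = False
        for pos in range(k):             # each medoid m
            for o in range(n):           # each non-medoid observation o
                if o in medoids:
                    continue
                candidate = medoids.copy()
                candidate[pos] = o       # tentative swap of m and o
                new_cost = total_cost(dist, candidate)
                if new_cost < cost:      # keep the swap only if it helps
                    medoids, cost = candidate, new_cost
                    improved = True
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels, cost

# Toy example: six points on a line forming two obvious groups.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(points[:, None] - points[None, :])
medoids, labels, cost = pam(dist, k=2)
print(sorted(medoids), labels, cost)
```

Calling `pam` with `k=1` on the sub-matrix of a single cluster returns that cluster's medoid, which is how a representative user can be picked from each cluster.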

Appendix B. The Gap statistic

The Gap statistic (Tibshirani et al. 2001) is used to determine the number of clusters in the process of clustering. The method proceeds as follows. Our data consist of $n$ observations (users in this study), and $d_{ij}$ denotes the distance between observations $i$ and $j$. Suppose we have clustered the data into $k$ clusters $C_1, C_2, \ldots, C_k$, with $C_r$ denoting the indices of observations in cluster $r$, and $n_r$ denoting the number of observations in cluster $C_r$. Let $D_r = \sum_{i,j \in C_r} d_{ij}$ be the sum of the pairwise distances for all observations in cluster $r$, and set $W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r$ as the within-dispersion measure. The Gap statistic is computed as follows.

Step 1: cluster the observed data, varying the total number of clusters over $k = 1, 2, \ldots, K$, giving within-dispersion measures $W_k$, $k = 1, 2, \ldots, K$.

Step 2: generate $B$ reference data sets, using the uniform distribution, and cluster each one, giving within-dispersion measures $W_{kb}$, $b = 1, 2, \ldots, B$, $k = 1, 2, \ldots, K$. Compute the estimated gap statistic

$$\mathrm{Gap}(k) = (1/B)\sum_{b} \log(W_{kb}) - \log(W_k). \qquad (B.1)$$

Step 3: let $\mu = (1/B)\sum_{b} \log(W_{kb})$, compute the standard deviation

$$sd_k = \Big\{(1/B)\sum_{b} [\log(W_{kb}) - \mu]^2\Big\}^{1/2},$$

and define $s_k = sd_k\sqrt{1 + 1/B}$. Finally, choose the number of clusters via

$$\hat{k} = \text{smallest } k \text{ such that } \mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1}. \qquad (B.2)$$
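Steps 2 and 3 and the selection rule (B.2) reduce to a few array operations once the within-dispersion values are available. The sketch below assumes the values $\log(W_k)$ and $\log(W_{kb})$ have already been computed by some clustering routine; the function names and the toy numbers are hypothetical, $s_k$ is taken as $sd_k\sqrt{1 + 1/B}$ following Tibshirani et al. (2001), and returning $K$ when no $k$ satisfies the rule is an assumption of this sketch.

```python
import numpy as np

def gap_and_s(log_w_obs, log_w_ref):
    """Gap(k) and s_k from log(W_k) (shape (K,)) and log(W_kb) (shape (B, K))."""
    log_w_ref = np.asarray(log_w_ref, dtype=float)
    b = log_w_ref.shape[0]
    mu = log_w_ref.mean(axis=0)                       # (1/B) * sum_b log(W_kb)
    gap = mu - np.asarray(log_w_obs, dtype=float)     # Eq. (B.1)
    sd = np.sqrt(((log_w_ref - mu) ** 2).mean(axis=0))
    s = sd * np.sqrt(1.0 + 1.0 / b)
    return gap, s

def choose_k(gap, s):
    """Eq. (B.2): smallest k (1-based) with Gap(k) >= Gap(k+1) - s_{k+1}.

    Falls back to K if no k satisfies the rule (an assumption of this sketch).
    """
    for idx in range(len(gap) - 1):
        if gap[idx] >= gap[idx + 1] - s[idx + 1]:
            return idx + 1
    return len(gap)

# Hypothetical gap curve for K = 4 candidate cluster counts.
gap_example = np.array([0.10, 0.50, 0.52, 0.40])
s_example = np.array([0.05, 0.05, 0.05, 0.05])
k_hat = choose_k(gap_example, s_example)
print(k_hat)
```

In this toy curve the gap rises sharply from one to two clusters and then flattens, so the rule stops at the first $k$ whose gap is within one $s_{k+1}$ of the next value.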

References

Baum, L.E., Petrie, T., 1966. Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 1554–1563.
Baum, L.E., Petrie, T., Soules, G., Weiss, N., 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 164–171.
Bhattacharyya, A., 1946. On a measure of divergence between two multinomial populations. Sankhyā: The Indian Journal of Statistics, 401–406.
Bose, I., Chen, X., 2010. Exploring business opportunities from mobile services data of customers: an inter-cluster analysis approach. Electronic Commerce Research and Applications 9, 197–208.
Bray, J.R., Curtis, J.T., 1957. An ordination of the upland forest communities of southern Wisconsin. Ecological Monographs 27, 325–349.
Cha, S., 2007. Comprehensive survey on distance/similarity measures between probability density functions. City 1, 1.
Chen, T., Chou, Y., Chen, T., 2012. Mining user movement behavior patterns in a mobile service environment. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans 42, 87–101.
Cheng, L.-C., Sun, L.-M., 2012. Exploring consumer adoption of new services by analyzing the behavior of 3G subscribers: an empirical case study. Electronic Commerce Research and Applications 11, 89–100.
CNNIC, 2014a, .
CNNIC, 2014b, .
Cox, D.R., Miller, H.D., 1977. The Theory of Stochastic Processes, vol. 134. CRC Press.
Duong, T.V., Bui, H.H., Phung, D.Q., Venkatesh, S., 2005. Activity recognition and abnormality detection with the switching hidden semi-Markov model. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 838–845.
Ghahramani, Z., 2001. An introduction to hidden Markov models and Bayesian networks. International Journal of Pattern Recognition and Artificial Intelligence 15, 9–42.
Ghosh, A., Jana, R., Ramaswami, V., Rowland, J., Shankaranarayanan, N.K., 2011. Modeling and characterization of large-scale Wi-Fi traffic in public hot-spots. Proceedings of the INFOCOM, 2921–2929.
Haan, P., 2010. A multi-state model of state dependence in labor supply. Intertemporal labor supply effects of a shift from joint to individual taxation. Labour Economics 17, 323–335.
Hartigan, J.A., 1975. Clustering Algorithms. Wiley, New York.
Hong, T., Kim, E., 2012. Segmenting customers in online stores based on factors that affect the customer’s intention to purchase. Expert Systems with Applications 39, 2127–2131.
Hougaard, P., 1999. Multi-state models: a review. Lifetime Data Analysis 5, 239–264.
ITU, 2014. .
Jackson, C.H., 2011. Multi-state modelling with R: the msm package. Journal of Statistical Software 38, 1–28.
Johnson, S.C., 1967. Hierarchical clustering schemes. Psychometrika 32, 241–254.
Juang, B., Rabiner, L.R., 1985. A probabilistic distance measure for hidden Markov models. AT&T Technical Journal 64, 391–408.
Kalbfleisch, J.D., Lawless, J.F., 1985. The analysis of panel data under a Markov assumption. Journal of the American Statistical Association 80, 863–871.
Keralapura, R., Nucci, A., Zhang, Z., Gao, L., 2010. Profiling users in a 3G network using hourglass co-clustering. Proceedings of the MobiCom, 341–352.
Kullback, S., Leibler, R.A., 1951. On information and sufficiency. The Annals of Mathematical Statistics, 79–86.
Liu, Y., Li, F., Guo, L., Shen, B., Chen, S., 2012. A server’s perspective of Internet streaming delivery to mobile devices. Proceedings of the INFOCOM, 1332–1340.
Lloyd, S., 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory 28, 129–137.
Lu, E., Lee, W., Tseng, V., 2012. A framework for personal mobile commerce pattern mining and prediction. IEEE Transactions on Knowledge and Data Engineering 24 (5), 769–782.
Meira-Machado, L.F., de Uña-Álvarez, J., Cadarso-Suárez, C., Andersen, P., 2008. Multi-state models for the analysis of time-to-event data. Statistical Methods in Medical Research 18, 195–222.
Ng, R.T., Han, J., 2002. Clarans: a method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering 14, 1003–1016.
Nikulin, M.S., 2001. Hellinger Distance. Encyclopedia of Mathematics.
Och, F.J., Ney, H., 2004. The alignment template approach to statistical machine translation. Computational Linguistics 30, 417–449.
Panuccio, A., Bicego, M., Murino, V., 2002. A hidden Markov model-based approach to sequential data clustering. Structural, Syntactic, and Statistical Pattern Recognition, Springer, 734–743.
Rabiner, L., 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257–286.
Rabiner, L., Juang, B., 1986. An introduction to hidden Markov models. IEEE ASSP Magazine 3, 4–16.
Seret, A., Broucke, S., Vanthienen, J., 2014. A dynamic understanding of customer behavior processes based on clustering and sequence mining. Expert Systems with Applications 4, 4648–4657.
Shafiq, M.Z., Ji, L., Liu, A.X., Pang, J., Wang, J., 2012. Characterizing geospatial dynamics of application usage in a 3G cellular data network. Proceedings of the INFOCOM, 1341–1349.
Stanke, M., Waack, S., 2003. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19, 215–225.
Theodoridis, S., Koutroumbas, K., 2008. Pattern Recognition, fourth ed. Elsevier, San Diego.
Tibshirani, R., Walther, G., Hastie, T., 2001. Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63, 411–423.
Ward Jr., J.H., 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58, 236–244.
Willekens, F., Putter, H., 2014. Software for Multistate Analysis.
Wu, R.-S., Chou, P.-H., 2011. Customer segmentation of multiple category data in e-commerce using a soft-clustering approach. Electronic Commerce Research and Applications 10, 331–341.
Zhao, G., Shan, Q., Xiao, S., Xu, C., 2011. Modeling web browsing on mobile internet. IEEE Communications Letters 15, 1081–1083.
Zhao, G., Lai, W., Xu, C., Tang, H., 2013. Revealing service visit characteristics in mobile Internet. Chinese Journal of Computers 36, 1388–1398.