Behavior-aware user response modeling in social media: Learning from diverse heterogeneous data


European Journal of Operational Research 000 (2014) 1–13


European Journal of Operational Research journal homepage: www.elsevier.com/locate/ejor

Stochastics and Statistics

Zhen-Yu Chen a,∗, Zhi-Ping Fan a,b, Minghe Sun c

a Department of Information Management and Decision Sciences, School of Business Administration, Northeastern University, Shenyang 110819, China
b State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
c Department of Management Science and Statistics, College of Business, The University of Texas at San Antonio, San Antonio, TX 78249-0632, USA

Article info

Article history: Received 30 April 2013; Accepted 11 September 2014; Available online xxx

Keywords: Data mining; Direct marketing; Response modeling; Social media; Engagement behavior

Abstract

With the rapid development of Web 2.0 applications, social media have increasingly become a major factor influencing the purchase decisions of customers. Longitudinal individual and engagement behavioral data generated on social media sites pose the challenge of integrating diverse heterogeneous data to improve prediction performance in customer response modeling. In this study, a hierarchical ensemble learning framework is proposed for behavior-aware user response modeling using diverse heterogeneous data. In the framework, a general-purpose data transformation and feature extraction strategy is developed to transform the heterogeneous high-dimensional multi-relational datasets into customer-centered high-order tensors and to extract attributes. An improved hierarchical multiple kernel support vector machine (H-MK-SVM) is developed to integrate the external, tag and keyword, individual behavioral and engagement behavioral data for feature selection from multiple correlated attributes and for ensemble learning in user response modeling. The subagging strategy is adopted to deal with large-scale imbalanced datasets. Computational experiments using a real-world microblog database were conducted to investigate the benefits of integrating diverse heterogeneous data. Computational results show that the improved H-MK-SVM using longitudinal individual behavioral data outperforms some commonly used methods using aggregated behavioral data, and that the improved H-MK-SVM using engagement behavioral data performs better than the one using only the external and individual behavioral data. © 2014 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +86 24 83671630. E-mail address: [email protected] (Z.-Y. Chen)

1. Introduction

Mass marketing and direct marketing are two commonly used approaches for product (service) advertising and promotional activities (Bose & Chen, 2009). For direct marketing, a marketing message is delivered to target customers without an intermediary person or indirect media involved (Bose & Chen, 2009). Customer response modeling aims at identifying the target customers who will respond to a specific marketing campaign from the existing customer base (Cui, Wong, & Zhang, 2010; Kang, Cho, & MacLachlan, 2012). With more and more companies adopting direct marketing, customer response modeling has become one of the most effective direct marketing strategies to increase total revenue and decrease marketing cost (Cui, Wong, & Lui, 2006; Kang et al., 2012; Lee, Shin, Hwang, Cho, & MacLachlan, 2010). Because the purpose is to identify customers as possible respondents and non-respondents to a specific marketing campaign (Bose & Chen, 2009; Lee et al., 2010), customer response modeling is a binary classification problem.

For customer response modeling, external and behavioral data are usually used (Bose & Chen, 2009). Customer demographic, geographic and lifestyle data are often obtained from external data vendors (Baecke & Van den Poel, 2011), and thus are called external data. Customer behavioral data including transaction records, feedback to marketers, customer reviews and Web browsing records are considered to be the most important data in customer response modeling (Bose & Chen, 2009).

Many supervised and semi-supervised machine learning techniques have been proposed for the customer response modeling problem (Lessmann & Voß, 2008). These techniques include artificial neural networks (ANN) (Crone, Lessmann, & Stahlbock, 2006; Kim, Street, Russell, & Menczer, 2005), decision trees (Crone et al., 2006), Bayesian networks (Baesens, Viaene, Van den Poel, Vanthienen, & Dedene, 2002; Cui et al., 2006), logistic regression (Kang et al., 2012), bagging (Ha, Cho, & MacLachlan, 2005), support vector machines (SVM) (Crone et al., 2006; Kang et al., 2012; Lessmann & Voß, 2009) and transductive SVMs (Lee et al., 2010). Moreover, some other techniques including clustering (Kang et al., 2012), sampling (Crone et al., 2006; Kang et al., 2012), sequential
pattern discovery (Chen, Hsu, & Hsu, 2011), feature selection (Cui et al., 2010) and other preprocessing methods (Crone et al., 2006) have been combined with classification techniques to refine the customer base and improve prediction performance.

In the age of Web 2.0, social media sites have developed rapidly. Social media refers to a group of online applications allowing the creation and exchange of user-generated contents (Kaplan & Haenlein, 2010). The most popular types of social media include wikis, blogs, microblogs, social networks, video and photo sharing and online communities. They have become popular communication tools due in part to the open access of the Internet, the popularity of mobile devices, the availability of the tools and the fast social interactions among users. Social media have increasingly become a major factor influencing the opinions, attitudes and the purchase behavior of customers (Mangold & Faulds, 2009).

User behavioral data generated and collected on social media sites fall into two categories, i.e., individual behavioral data and engagement behavioral data. Moreover, according to the way the behavioral data are used in the customer response models, user behavioral data can be classified as longitudinal behavioral data and aggregated behavioral data. Fig. 1 illustrates the different types of behavioral data. For traditional customer response modeling, the longitudinal individual behavioral variables derived from the transactional databases are usually transformed into aggregated variables such as recency, frequency and monetary (RFM) variables, which have been included in most direct marketing datasets and adopted in most response models (Baesens et al., 2002; Crone et al., 2006; Cui et al., 2010).

Fig. 1. Different types of behavioral data. [Figure: behavioral data are divided into individual and engagement behavioral data, each of which can be longitudinal or aggregated, giving four types: longitudinal individual, longitudinal engagement, aggregated individual and aggregated engagement behavioral data.]

http://dx.doi.org/10.1016/j.ejor.2014.09.008 0377-2217/© 2014 Elsevier B.V. All rights reserved.
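As an illustration of the aggregation step mentioned in the Introduction, the following minimal sketch collapses a longitudinal transaction log into RFM variables. It assumes one common definition of RFM (days since the last transaction, number of transactions, total amount); the function name `rfm`, the record layout and the sample values are hypothetical and are not taken from the paper.

```python
from datetime import date


def rfm(transactions, as_of):
    """Aggregate (customer, day, amount) records into RFM variables per customer."""
    summary = {}
    for customer, day, amount in transactions:
        days_ago = (as_of - day).days
        if customer not in summary:
            summary[customer] = {"recency": days_ago, "frequency": 0, "monetary": 0.0}
        entry = summary[customer]
        # Recency: days since the most recent transaction
        entry["recency"] = min(entry["recency"], days_ago)
        # Frequency: number of transactions
        entry["frequency"] += 1
        # Monetary: total amount spent
        entry["monetary"] += amount
    return summary


# Hypothetical longitudinal records (illustrative values only)
transactions = [
    ("u1", date(2011, 10, 13), 20.0),
    ("u1", date(2011, 11, 2), 35.0),
    ("u2", date(2011, 10, 20), 10.0),
]
result = rfm(transactions, as_of=date(2011, 11, 12))
```

The aggregation discards the order and timing of the individual records, which is the information that longitudinal behavioral data retain.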
In comparison with individual behavior, customer engagement behavior, as an emerging concept, focuses on the customers’ behavioral manifestations beyond purchase such as electronic word-of-mouth, customer–customer interaction, recommendations, blogging and online reviews (van Doorn et al., 2010). In social media, customer engagement behavior has a great effect on individual purchase decisions (Cheung & Thadani, 2012; Dellarocas, 2003; van Doorn et al., 2010). For example, Dell gained high income by posting offers to its followers on Twitter (Li & Li, 2013). A survey showed that 91 percent of respondents said that they consulted online reviews before purchasing, and 46 percent of respondents believed that the online reviews influenced their purchase decisions (Cheung & Thadani, 2012). Therefore, incorporating engagement behavioral data into customer analytical models is increasingly recognized as a new direction of customer relationship management and direct marketing (Bijmolt et al., 2010).

The aggregated individual behavioral attributes are usually used as predictors in most customer response models. Few existing studies of customer response modeling pay attention to the longitudinal individual and engagement behavioral data which are widely available in the social media databases. In recent years, the analysis of engagement behavior has been used widely in the areas of recommendation and customer churn prediction. Some researchers combined the extended factorization model with other methods such as additive forest, logistic regression and scorecard model to predict the top-N items the customer was most likely to follow using the aggregated customer–customer interaction data (Chen, Liu, et al., 2012; Chen, Tang, et al., 2012; Zhao, 2012). The information of individual customers and a group of customers which have similar characteristics was used in a novel customer profile model for product recommendation (Park & Chang, 2009). For customer relationship management of the telecommunication industry, the customer–customer interaction data have been recognized as important complements to traditional behavioral data. The aggregated engagement behavioral attributes were combined with traditional attributes to predict customer churn (Zhang, Zhu, Xu, & Wan, 2012).

Some researchers recognized that customer purchase behavior varies over time and that the use of the longitudinal individual behavioral data can improve prediction performance (Chen, Fan, & Sun, 2012; Liu, Lai, & Lee, 2009). Sequential pattern analysis was combined with collaborative filtering for temporal purchase behavioral data to improve recommendation performance (Cho, Cho, & Kim, 2005; Choi, Yoo, Kim, & Suh, 2012; Huang & Huang, 2009; Liu et al., 2009; Min & Han, 2005). Prinzie and Van den Poel (2006, 2007, 2011) incorporated customer purchase sequences into dynamic Bayesian networks and Markov models to predict the next product for a customer to buy. Ballings and Van den Poel (2012) studied the problem of how long the customer historical data should be for customer churn prediction. They suggested that selecting a good length of historical data can decrease computational burden.
For social media, the term Item may represent a specific user, organization, product (service) or event. Examples of events include the appearance of a new term or keyword, the announcement of a new product (service) or activity, or a new price of an existing product (service). The rich behavioral data generated on social media sites can be used by managers to predict user responses to an Item, make marketing policies and allocate marketing resources to influence customer behavior (Power & Phillips-Wren, 2011). For social media, customer response modeling is also called user response modeling, and the two terms are used interchangeably. In this study, customer response modeling that takes user behavioral data into consideration, e.g., longitudinal individual and engagement behavioral data, is called behavior-aware user response modeling.

However, the large, diverse and heterogeneous data generated on social media sites pose great challenges for behavior-aware user response modeling (Bijmolt et al., 2010; Cao, Ou, & Yu, 2012; Chau & Xu, 2012). How to deal with diverse heterogeneous data is one challenge. A variety of methods can be used for customer response modeling using external and aggregated individual behavioral data. However, to the best of our knowledge, this study is the first attempt to combine the individual behavioral and the engagement behavioral data, as well as the longitudinal and the external data, for user response modeling in social media. How to deal with large amounts of data is another challenge. Social media sites produce large amounts of user data. For example, the daily volume of posts mentioning some well-known brands or products such as Google, Microsoft, Sony, iPhone and iPad in Twitter is in the millions (Li & Li, 2013). It is necessary to use marketing intelligence methods to automatically analyze the massive amount of data.
The analysis of the massive amount of data requires efficient preprocessing of the data and excellent scalability of the customer response models.

In this study, a hierarchical ensemble learning framework is developed for behavior-aware user response modeling in social media. In the framework, a general-purpose data preprocessing strategy is proposed to transform the large-scale and multi-relational user datasets derived from social media sites into high-order tensors and to extract attributes as input of the models. An improved hierarchical multiple kernel SVM (H-MK-SVM), as an extension of the SVM and the multiple kernel SVM (MK-SVM), is developed to model diverse heterogeneous data including external, tag and keyword, individual behavioral and engagement behavioral data. Because of the multi-relations of the individual behavioral and engagement behavioral data, one advantage of the improved H-MK-SVM is its ability to adaptively select associated attributes. Another advantage of this method is its ability to integrate the diverse heterogeneous social media data into a unified ensemble classifier to improve the prediction performance. Furthermore, the subagging strategy (Paleologo, Elisseeff, & Antonini, 2010) is adopted to deal with large imbalanced datasets and ensemble methods are used to combine the results of subagging.

This paper is organized as follows. The next section presents the hierarchical ensemble learning framework for behavior-aware user response modeling in social media. Section 3 describes the database used in the study and presents the data preprocessing strategy. The model formulation of the improved H-MK-SVM is presented in Section 4. The computational results are reported in Section 5. Conclusions and directions for future research are given in Section 6.

2. The hierarchical ensemble learning framework

In this section, diverse heterogeneous data used for user response modeling in social media are discussed. A hierarchical ensemble learning framework is then proposed for user response modeling in social media using the diverse heterogeneous data.

2.1. Diverse heterogeneous data in social media

User response modeling in social media involves diverse heterogeneous data. In general, two categories of data, i.e., external data and behavioral data (Bose & Chen, 2009), are used for customer response modeling.
The external data include the demographic, lifestyle and geographic data of the customers (Bose & Chen, 2009). For social media, tags and keywords make up another type of external data. A tag is a word, sign or image selected by a user as a self-description, and a keyword is a word with special meaning extracted from the contents of a media site such as a tweet, a retweet and comments generated by users. Tags and keywords are usually used to describe users’ interests (Chen, Liu, et al., 2012).

In comparison with the external data, the behavioral data are more diverse and informative. As shown in Fig. 1, the behavioral data can be grouped into individual behavioral data and engagement behavioral data, and can also be grouped into aggregated behavioral data and longitudinal behavioral data. The aggregated individual behavioral data are usually used in more traditional user response models. The RFM and historical records of user responses are the commonly used behavioral attributes. With the rapid development of social media, firms can easily collect large amounts of longitudinal engagement behavioral data. These informative and valuable data have the potential to significantly improve the prediction performance of response models.

For the external data and the behavioral data, each customer is treated as an observation and $n$ is used to represent the number of observations in the dataset. A customer is a respondent if the customer responds to an Item or takes action after an event, such as a specific marketing campaign, or a non-respondent otherwise. In the binary classification problem of customer response modeling, customer $i$ is assigned the class label $y_i = 1$ if the customer is a respondent or $y_i = -1$ if the customer is a non-respondent. The value of $y_i$ is the desired output of the models for observation $i$.

External data can be represented by a matrix $S$ in which each row represents an observation and each column represents a static variable.
In a dataset with $m_1$ variables, the attributes of a customer $i$ are usually represented by a vector $s_i = \{s_{ij} \mid j = 1, \ldots, m_1\}$. Different from numerical data, tags and keywords are usually described as textual or symbolic data and represented by a matrix $\hat{S}$. A vector $\hat{s}_i = \{\hat{s}_{i\hat{j}} \mid \hat{j} = 1, \ldots, m_2\}$ is used to represent the tags and keywords of a customer $i$, where $m_2$ is the number of tags and keywords.

All standard data mining tasks, including classification, regression and clustering, and the corresponding data mining methods require the input data to be organized as a rectangular matrix. However, the longitudinal individual behavioral data are described as customer-centered multivariate time series of fixed length (Chen, Fan, et al., 2012). A tensor is a multi-dimensional array which can be considered as the generalization of vectors and matrices. A first-order tensor is a vector, a second-order tensor is a matrix, and a tensor with three or higher orders is called a high-order tensor (Kolda & Bader, 2009). Therefore, the longitudinal individual behavioral data can be represented by a third-order tensor $B = \{B_i \mid i = 1, \ldots, n\}$. The input of each customer $i$ is represented by a rectangular matrix $B_i = \{b_{i\tilde{j}t} \mid \tilde{j} = 1, \ldots, m_3;\ t = 1, \ldots, T\}$, where $m_3$ represents the number of longitudinal individual behavioral attributes and $T$ represents the number of time points in each longitudinal behavioral variable.

In social media, social links among users carry useful information for user response modeling. For example, because of the social links between users A and B, incorporating the behavioral data of user B into the response models may improve the prediction performance for user A. In this study, the engagement behavioral data are defined as the customer-centered behavioral data of a fixed number of followees of a customer. As shown in Fig. 1, the engagement behavioral data can be longitudinal or aggregated. In this study, only the longitudinal engagement behavioral data are used. The longitudinal engagement behavioral data can be represented by a fourth-order tensor $\hat{B} = \{\hat{B}_i \mid i = 1, \ldots, n\}$. The input of each customer $i$ is represented by a third-order tensor $\hat{B}_i = \{\hat{b}_{ij\hat{t}f} \mid j = 1, \ldots, m_4;\ \hat{t} = 1, \ldots, \hat{T};\ f = 1, \ldots, N\}$, where $m_4$ represents the number of longitudinal individual behavioral attributes of each followee $f$, $\hat{T}$ represents the number of time points in each longitudinal engagement behavioral variable and $N$ represents the number of followees. Each input of a followee $f$, as longitudinal individual behavioral data, can be represented by a third-order tensor $\hat{B}_f = \{\hat{b}_{ij\hat{t}f} \mid i = 1, \ldots, n;\ j = 1, \ldots, m_4;\ \hat{t} = 1, \ldots, \hat{T}\}$.

These four types of data are illustrated in Fig. 2. Dealing with the heterogeneous and high-order tensor data is an essential problem in user response modeling and is discussed in the next sub-section.

2.2. The hierarchical ensemble learning framework

Targeting potential customers using large, diverse and heterogeneous data generated on social media sites is a difficult task. Three difficult issues need to be addressed: (1) identifying the most useful data and generating the customer-centered individual and engagement behavioral datasets; (2) selecting associated attributes from coupled individual and engagement behavioral data; (3) integrating the diverse heterogeneous social media data into a classification model to predict user responses.

A hierarchical ensemble learning framework, as illustrated in Fig. 3, is proposed for user response modeling using external, tag and keyword, longitudinal individual behavioral and longitudinal engagement behavioral data. The framework can be organized into three layers. In Layer 1, the original datasets are transformed into customer-centered external, tag and keyword, longitudinal individual and longitudinal engagement behavioral datasets, and the subagging method is used to divide the large training sets into multiple small subsets.
In Layer 2, features are selected from the longitudinal individual and engagement behavioral data. In Layer 3, multiple ensemble classifiers are trained on training subsets using the four types of data and the results of these ensemble classifiers are combined. The major tasks of the hierarchical ensemble learning framework are described in the following.
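The four data types defined in Section 2.1 can be pictured as arrays of increasing order. The NumPy sketch below is only an illustration of the shapes involved, not the authors' implementation. It assumes, for simplicity, that every followee is itself one of the n customers, that m4 = m3 and that the engagement time length equals T; it also encodes tags as binary indicators, whereas the paper treats them as textual data. The array `followees` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): n customers, N followees per customer
n, m1, m2, m3, T, N = 5, 2, 10, 2, 28, 3

S = rng.random((n, m1))                    # external data: n x m1 matrix
S_hat = rng.integers(0, 2, size=(n, m2))   # tag/keyword indicators: n x m2 (assumed encoding)
B = rng.random((n, m3, T))                 # longitudinal individual data: third-order tensor

# Hypothetical followee index: followees[i, f] is the row in B of the
# f-th followee of customer i
followees = rng.integers(0, n, size=(n, N))

# Gather the followees' behavioral matrices so that
# B_hat[i, j, t, f] == B[followees[i, f], j, t]
# B[followees] has shape (n, N, m3, T); move the followee axis to the end.
B_hat = np.transpose(B[followees], (0, 2, 3, 1))   # fourth-order tensor
```

Fixing the number of followees N per customer is what keeps the result a regular fourth-order array; a customer with fewer followees would need padding or sampling, which this sketch does not address.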

Fig. 2. Illustration of diverse heterogeneous user data. [Figure: for Customers 1 to 4, the panels show external data (Attributes 1.1 to 1.4), tag and keyword data (Attributes 2.1 to 2.4), longitudinal individual behavioral data (Attributes 3.1 and 3.2 over time points t1 to t4) and longitudinal engagement behavioral data (Attributes 4.1 and 4.2 over time points t1 to t4 for Followees 1.1 to 4.1 through 1.n to 4.n).]

Fig. 3. The proposed hierarchical ensemble learning framework for behavior-aware user response modeling in social media. [Figure: three layers leading from the original data to the transformed data and the output. Box labels include historical data, SNS data, external data, and tags and keywords; data transformation, data preprocessing and feature extraction; individual behavioral data, engagement behavioral data, external data, and tags and keywords; filters, multiple kernels, a single kernel and a string kernel; associated feature selection; kernels K1 to K4 and K; ensemble learning.]

2.2.1. Data transformation, feature extraction and subagging

In Layer 1 of the proposed hierarchical ensemble learning framework, the original large-scale and multi-relational datasets are preprocessed to generate relatively small-size datasets and are transformed into customer-centered datasets including external data $S$ and $\hat{S}$, longitudinal individual behavioral data $B$ and customer-centered social network data. Longitudinal individual behavioral data $B$ and social network data are simultaneously used for feature extraction to obtain longitudinal engagement behavioral data $\hat{B}$. The subagging method is then used to generate small-size balanced training datasets. The details of data transformation, feature extraction and subagging are discussed using a real database in Section 3.

2.2.2. Associated attribute selection

The customer–customer interactions make the individual and engagement behavioral data coupled with each other (Cao et al., 2012). It is difficult to analyze and model the coupled behavioral data, partly because of the multi-correlation among the large number of longitudinal behavioral attributes. Associated attribute selection, as an important task in Layer 2 of the proposed hierarchical ensemble learning framework, is an effective method to reduce the redundant attributes to improve prediction performance (Buckinx, Moons, Van den Poel, & Wets, 2004; Crone et al., 2006). For this task, a sparse modeling method is adopted to learn the weights of the longitudinal behavioral attributes, and the attributes with non-zero weights are kept as associated attributes.

2.2.3. Ensemble learning

In Layer 3 of the hierarchical ensemble learning framework, different types of kernels are adopted to model the external, tag and keyword, longitudinal individual behavioral and longitudinal engagement behavioral data. An ensemble classifier is developed to combine
these types of data using the weights of the longitudinal behavioral attributes obtained by associated attribute selection. Furthermore, the results of these ensemble classifiers on local training subsets are combined to achieve better performance.

Most existing classifiers cannot combine the above mentioned four types of data to model customer responses. Therefore, a hierarchical ensemble learning method, the improved H-MK-SVM based on the work of Chen, Fan, et al. (2012), is developed for associated attribute selection and ensemble learning.

3. The data

In this section, the database used for the computational experiments is introduced first. The data transformation and feature extraction strategies for heterogeneous high-dimensional and multi-relational datasets are then described in detail using this database. The subagging strategy is then discussed.

3.1. The database

A real-world database provided by Tencent Weibo¹ is used in the computational experiments. Microblogs, as mainstream social media, have become a new marketing platform of electronic word-of-mouth (Li & Li, 2013). Tencent Weibo is one of the largest microblog websites in China. In the database, four datasets including Rec-log-training, User Profile, Item and User SNS were used in the following experiments. The characteristics of the database are given in Table 1.

Table 1. Characteristics of the original database.

| Datasets | Records | Customer-centered | Attributes | Numerical | String |
|---|---|---|---|---|---|
| Rec-log-training | 73,209,277 | No | User ID, Item ID, Results, Timestamp | Yes | No |
| User Profile | 2,320,895 | Yes | User ID, Year of Birth, Gender, No. of Tweets | Yes | Yes |
| Item | 6,095 | No | Item ID, Category, Keywords | No | Yes |
| User SNS | 50,655,143 | No | User ID, Followee ID | No | No |

The Rec-log-training dataset contains 73,209,277 historical records about users’ responses to different Items over a span of 31 days. Each observation in the Rec-log-training dataset records the response of a user to an Item at a time. The time period of the dataset is from October 13 to November 12, 2011. The User Profile dataset is the only customer-centered dataset in the database. This dataset records the Year of Birth, the Gender and the Number of Tweets of each of the 2,320,895 users with numerical values. It also records the tags of the users with strings. There are 6,095 Items in 377 categories in the Item dataset.
Each single Item belongs to a hierarchical category, e.g., Category 1.1.1.1. The Category and Keywords of each Item are recorded in the Item dataset with strings. The User SNS dataset contains the follow history of each user. There are 50,655,143 records in the User SNS dataset. The relationships of the customer–customer interactions are derived from the follow history. Among the four datasets mentioned above, the User Profile dataset is an example of external and tag data, the Rec-log-training dataset is the basis of longitudinal individual behavioral data, and the User SNS and Rec-log-training datasets are the bases of longitudinal engagement behavioral data. How to transform the Rec-log-training and User SNS datasets into longitudinal individual and engagement behavioral data is an important issue and will be discussed in the next subsection.

¹ http://kddcup2012.org/c/kddcup2012-track1/data.

3.2. Data preprocessing

Data preprocessing is a prerequisite phase of a data mining system and has a significant impact on the performance of the data mining methods (Crone et al., 2006). With the huge volume and large variety of data in the big data age, data preprocessing and scalability of the algorithms are two key ways to make standard data mining methods suitable for knowledge discovery from big data. For heterogeneous high-dimensional and multi-relational data in social media, data preprocessing includes two distinct steps, i.e., data transformation and feature extraction. Data transformation and feature extraction from unstructured text and Web log data, as data preprocessing techniques, have been well studied in the last two decades (Sarawagi, 2007). However, relatively few studies focus on data transformation and feature extraction from heterogeneous high-dimensional and multi-relational data (Lahbib, Boulle, & Laurent, 2014). This is partly due to the fact that most multi-relational classification methods such as link-based classification only use local (non-network) attributes and univariate class labels of network neighbors (Macskassy & Provost, 2007). A few multi-relational classification methods use multivariate network attributes derived from graph theory and social network analysis (Hill, Provost, & Volinsky, 2006; Zhang et al., 2012). In this study, novel data transformation and feature extraction methods are presented to transform heterogeneous high-dimensional and multi-relational data into multiple customer-centered high-order tensors which contain both the relationship and transactional information. Thus, these data can be used by the hierarchical ensemble learning methods.

3.2.1. Data transformation

In the computational experiments, Microsoft Access 2010 was used to store the original database and Microsoft Excel 2010 was used to transform the original large-scale and multi-relational datasets into customer-centered datasets. Each single Item has a small number of labeled samples. Therefore, data on the responses to Items belonging to specific categories, rather than to a single Item, are analyzed.
Data on responses to Items belonging to 10 categories randomly selected from the 377 categories are analyzed and used in the computational experiments. The historical records in the periods from October 13 to November 9, October 14 to November 10 and October 15 to November 11 ($T = 28$) were used to train, validate and test the improved H-MK-SVM models, respectively. The records on November 10, 11 and 12 were used to label customers as respondents or non-respondents in the training, validation and testing sets, respectively. Thus, the observations (users) having responses, i.e., accepted or not, to a specific category on November 10, 11 or 12 were kept in, and those not having responses were deleted from, the training, validation or testing dataset, respectively.

Computational experiments were conducted first without using the engagement behavioral data. Microsoft Query in Microsoft Excel 2010 was used to select the observations with the Items in each selected category into the Training dataset. Pivot Table in Microsoft Excel 2010 was used to transform the data of the selected observations into customer-centered longitudinal individual behavioral data represented by third-order tensors. As shown in Fig. 2, in the transformed datasets, each row represents the historical record of a user, each column represents the historical records of all users in a day and each dataset represents a longitudinal attribute with time length $T = 28$. Pivot Table in Microsoft Excel 2010 was then used to transform the User Profile dataset into the external dataset to obtain the same observations (users) as those in the longitudinal individual
Please cite this article as: Z.-Y. Chen et al., Behavior-aware user response modeling in social media: Learning from diverse heterogeneous data, European Journal of Operational Research (2014), http://dx.doi.org/10.1016/j.ejor.2014.09.008

ARTICLE IN PRESS

JID: EOR 6

[m5G;October 13, 2014;17:32]

Z.-Y. Chen et al. / European Journal of Operational Research 000 (2014) 1–13 Table 2 Characteristics of the transformed customer-centered datasets. Transformed datasets

Original datasets

No. of attributes

External Tag Individual behavioral Engagement behavioral

User Profile User Profile Rec-log-training Rec-log-training, User SNS

m1 m2 m3 m4

behavioral dataset. Observations with missing values in the external dataset were deleted. The empty values in the tag dataset were all set to zero. The characteristics of the external data with numerical values, the tag data and the longitudinal individual behavioral data are presented in the first three rows in Table 2. The external dataset has m1 = 2 variables, i.e., gender and the number of tweets, and the tag dataset has m2 = 10 tags. The longitudinal individual behavioral dataset has m3 = 2 variables including the number of responses per day in the corresponding period mentioned above (Quantity) and whether or not the user accepted the recommendation of the Items in the category in the corresponding period (Acceptance). Instead of statically derived behavioral attributes used in most response models (Kang et al., 2012), longitudinal behavioral attributes with fixed time interval, i.e., Quantity and Acceptance, are directly extracted from the original transactional records to keep original information and make the feature extraction method suitable for general response modeling problems. The response variable yi for a customer i indicates whether or not the user accepted the recommendation of the Items in the category on November 10, 11 or 12 for the training, validation or testing set, respectively. 3.2.2. Feature extraction Computational experiments were then conducted by incorporating the engagement behavioral data into user response modeling. Pivot Table of Microsoft Excel 2010 was used to transform the User SNS dataset into the customer-centered social network dataset. Pivot Table of Microsoft Excel 2010 was then used to transform two relational datasets, i.e., the customer-centered social network and the longitudinal individual behavioral datasets, into customer-centered longitudinal engagement behavioral dataset. 
Because the engagement behavioral data are the customer-centered behavioral data of followees of the customers, the same m4 = 2 variables, i.e., Quantity and Acceptance, as those in the longitudinal individual behavioral dataset are in this dataset. The values of Quantity and Acceptance for the observations without followees were all set to zero. Hence, the longitudinal engagement behavioral dataset has the same observations as those in the longitudinal individual behavioral dataset. As shown in Fig. 2, the longitudinal engagement behavioral data is represented by a fourth-order tensor. A weighted average strategy can be used to aggregate the fourth-order tensor into a third-order tensor along the followee dimension Bˆ = {bˆ ij t |i = 1, . . . , n; j = 1, . . . , m4 ; t = 1, . . . , T }, and thus decrease the layers of the hierarchical ensemble methods. For example, the tie strength derived from social network analysis can be used as the weights of different followees. In this study, a simple average is used due to the lack of tie strength information. The characteristics of the longitudinal engagement behavioral data are presented in the last row in Table 2. 3.3. Subagging A common normalization method was applied to the external, longitudinal individual behavioral and longitudinal engagement behavioral data to rescale the values of each variable to the range between 0 and 1. A holdout validation approach was used. Each dataset is partitioned into three roughly equal sets, i.e., a training set, a validation set and a testing set. For customer response modeling, the

=2 = 10 =2 =2

Attributes

T

Gender and the number of tweets Tag1 , Tag2 , . . . , Tag10 Quantity and Acceptance Quantity and Acceptance

0 0 28 28

number of respondents is usually much smaller than the number of non-respondents (Cui et al., 2006; Lee et al., 2010). For example, the response rate is 8.78 percent with 1979 respondents and 20,569 nonrespondents for Items belonging to Category 1.1.1.1 in the original database. Sampling methods including undersampling and oversampling are the most commonly used techniques for dealing with highly imbalanced data (Burez & Van den Poel, 2009; Chen, Fan, et al., 2012; Kang et al., 2012; Verbeke, Dejaeger, Martens, Hur, & Baesens, 2012). Undersampling is more suitable for large-scale datasets than oversampling by sampling only the larger class. However, undersampling ignores large amount of data resulting in small samples (Verbeke et al., 2012). The sampling ratio θ is defined to be the number of nonrespondents over the number of respondents in the sample. Bagging is a state-of-the-art ensemble learning method. In comparison with boosting, bagging has the advantage of better scalability. For bagging, the training of the model using different training sets can be conducted in parallel and the results of different training sets can be combined by diverse ensemble methods such as majority voting (MV) and averaging (AV) (Polikar, 2006). The subagging method (Paleologo et al., 2010) can deal with the small sample problem and improve the scalability of the proposed hierarchical ensemble learning framework. Therefore, the subagging method (Paleologo et al., 2010) is used to divide imbalanced training sets into v non-overlapping balanced subsets2 each with equal number of respondents and non-respondents, i.e., with a sampling ratio θ = 1. Classification models usually perform better using roughly balanced training sets than using highly imbalanced training sets. This is especially true for the true positive rate. 4. The model An improved H-MK-SVM, based on the work of Chen, Fan, et al. (2012), is developed in the hierarchical ensemble learning framework. 
The H-MK-SVM is an extension of the SVM and the MK-SVM. The SVM is one of the most popular and effective machine learning techniques and usually has excellent classification performance in practical applications (Vapnik, 1998). The MK-SVM, as an important extension of the SVM, can integrate heterogeneous data and adaptively select the best combinations of multiple basic kernels in the learning process. The H-MK-SVM was developed to model longitudinal individual behavioral data for the application of customer churn prediction (Chen, Fan, et al., 2012). A three phase training algorithm for the H-MK-SVM is developed to sequentially learn the Lagrange multipliers, the weight of each longitudinal behavioral attribute and the weight of each single feature basic kernel. Chen, Fan, et al. (2012) provided more details about the MK-SVM and the H-MK-SVM. The improved H-MK-SVM includes two sequential tasks, i.e., the associated attribute selection and ensemble learning. Each task adopts a two phase training algorithm to sequentially learn the Lagrange multipliers and the weight of each basic kernel. The associated

2 The number of non-overlapping subsets v roughly equals the number of observations in the training set divided by the number of observations in a training subset. With the subagging strategy and a sampling ratio θ = 1, the number of observations in a training subset is twice the number of respondents in that subset.
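The subagging partition of Section 3.3 and the AV and MV combination rules can be sketched as follows. This is only one plausible reading, stated here as an assumption: every balanced subset reuses all respondents, while the non-respondents are shuffled and cut into disjoint chunks of size θ times the number of respondents. The function names are hypothetical, and the resulting number of subsets is whatever that reading implies, which need not coincide with the rough count given in footnote 2.

```python
import random


def subagging_subsets(respondents, non_respondents, theta=1.0, seed=0):
    """Split an imbalanced training set into balanced subsets.

    Assumption (not confirmed by the paper): each subset contains all
    respondents plus one disjoint chunk of non-respondents whose size
    is theta * len(respondents). Leftover non-respondents are dropped.
    """
    rng = random.Random(seed)
    shuffled = list(non_respondents)
    rng.shuffle(shuffled)
    chunk = int(round(theta * len(respondents)))
    if chunk <= 0:
        raise ValueError("need at least one respondent and theta > 0")
    v = len(shuffled) // chunk  # number of chunks that can be filled
    return [list(respondents) + shuffled[k * chunk:(k + 1) * chunk]
            for k in range(v)]


def average_vote(scores_per_model):
    """AV: average the sub-classifier outputs for each observation."""
    n_models = len(scores_per_model)
    return [sum(col) / n_models for col in zip(*scores_per_model)]


def majority_vote(labels_per_model):
    """MV: +1 if at least half of the sub-classifiers predict +1, else -1."""
    return [1 if 2 * sum(1 for y in col if y == 1) >= len(col) else -1
            for col in zip(*labels_per_model)]


# Toy data: 20 respondents and 205 non-respondents give 10 subsets of 40.
resp = [("r", i) for i in range(20)]
non = [("n", i) for i in range(205)]
subsets = subagging_subsets(resp, non, theta=1.0, seed=1)
```

Because the subsets are independent of one another, one sub-classifier per subset can be trained in parallel and the outputs combined afterwards, which is the scalability argument made for bagging-type methods in the text.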


attribute selection is adopted to deal with multi-relations of the individual and engagement behavioral data. Different types of kernels used to model the four types of data are then combined to obtain the final model by ensemble learning. As shown in the hierarchical ensemble learning framework, the associated attribute selection by the improved H-MK-SVM in Layer 2 is discussed first and the ensemble learning by the improved H-MK-SVM in Layer 3 is discussed next. Finally, kernels used for the four types of data and the method of parameter tuning are described.

4.1. Associated attribute selection by the improved H-MK-SVM

In Layer 2 of the hierarchical ensemble learning framework, the training dataset of the improved H-MK-SVM is $G = \{(B_1, \hat{B}_1, y_1), \ldots, (B_n, \hat{B}_n, y_n)\}$ with the longitudinal individual and engagement behavioral data. The desired output of the model for each observation $y_i$ is the class membership of the customer, i.e., a respondent or a non-respondent. The improved H-MK-SVM in Layer 2 of the hierarchical ensemble learning framework constructs a hyperplane in a high dimensional feature space

$$f(B_i, \hat{B}_i) = \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} w_{\tilde{j}t}^{\top} \cdot \phi_{\tilde{j}t}(b_{i\tilde{j}t}) + \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} \tilde{w}_{j't'}^{\top} \cdot \tilde{\phi}_{j't'}(\hat{b}_{ij't'}) + b, \tag{1}$$

where $\phi_{\tilde{j}t}(b_{i\tilde{j}t})$ and $\tilde{\phi}_{j't'}(\hat{b}_{ij't'})$ are the nonlinear maps, $w_{\tilde{j}t}$ and $\tilde{w}_{j't'}$ are the vectors of weights and $b$ is the bias. For the longitudinal individual behavioral data, the multiple kernel (2) in the following is used to map the elements of the input matrices $B_i$ into high-dimensional feature spaces via the nonlinear maps $\phi_{\tilde{j}t}(b_{i\tilde{j}t})$

$$K_3(B_i, B_{\tilde{\imath}}) = \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} \gamma_{\tilde{j}t}\, k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t}), \tag{2}$$

where $k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t}) = \phi_{\tilde{j}t}(b_{i\tilde{j}t}) \cdot \phi_{\tilde{j}t}(b_{\tilde{\imath}\tilde{j}t})$ is the basic kernel and $\gamma_{\tilde{j}t}$ is the weight of $k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t})$. For the longitudinal engagement behavioral data, a similar multiple kernel (3) in the following is used to map the elements of the input matrices $\hat{B}_i$ into feature spaces via the nonlinear maps $\tilde{\phi}_{j't'}(\hat{b}_{ij't'})$

$$K_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) = \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} \hat{\gamma}_{j't'}\, k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'}), \tag{3}$$

where $k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'}) = \tilde{\phi}_{j't'}(\hat{b}_{ij't'}) \cdot \tilde{\phi}_{j't'}(\hat{b}_{\tilde{\imath}j't'})$ is the basic kernel and $\hat{\gamma}_{j't'}$ is the weight of $k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'})$. When convenient, $\gamma$ and $\hat{\gamma}$ are used to denote the vectors of all the weights of the basic kernels in (2) and (3), respectively, and $\gamma' = (\gamma, \hat{\gamma})$ is used to denote the composite vector consisting of the elements of $\gamma$ and $\hat{\gamma}$. The values of the elements of $\gamma'$ are determined in the attribute selection process.

When the multiple kernels in (2) and (3) are used, the improved H-MK-SVM in Layer 2 of the ensemble learning framework is formulated as the following quadratic program

$$\min_{\gamma'} \min_{w, \tilde{w}, \xi, b} \ \frac{1}{2} \left\{ \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} \frac{\|w_{\tilde{j}t}\|^2}{\gamma_{\tilde{j}t}} + \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} \frac{\|\tilde{w}_{j't'}\|^2}{\hat{\gamma}_{j't'}} \right\} + C \sum_{i=1}^{n} \xi_i \tag{4}$$

$$\text{s.t.} \quad y_i \left( \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} w_{\tilde{j}t}^{\top} \cdot \phi_{\tilde{j}t}(b_{i\tilde{j}t}) + \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} \tilde{w}_{j't'}^{\top} \cdot \tilde{\phi}_{j't'}(\hat{b}_{ij't'}) + b \right) \ge 1 - \xi_i, \quad \text{for } i = 1, \ldots, n \tag{5}$$

$$\xi_i \ge 0, \quad \text{for } i = 1, \ldots, n \tag{6}$$

$$\gamma_{\tilde{j}t} \ge 0, \quad \text{for } \tilde{j} = 1, \ldots, m_3,\ t = 1, \ldots, T \tag{7}$$

$$\hat{\gamma}_{j't'} \ge 0, \quad \text{for } j' = 1, \ldots, m_4,\ t' = 1, \ldots, \hat{T}, \tag{8}$$

where $C$ is the regularization parameter, $\xi_i$ is the relaxation or the error term of observation $i$, $\xi$ is a vector with $\xi_i$ as its elements, $w$ is a composite vector of all $w_{\tilde{j}t}$ and $\tilde{w}$ is a composite vector of all $\tilde{w}_{j't'}$.

A two-phase iterative procedure (Chen, Fan, et al., 2012) is used to decompose the problem in (4)–(8) into two sub-problems and to solve them iteratively. In phase 1, the values of the elements of $\gamma'$ are fixed and the dual of (4)–(8) is solved. The dual is stated as the quadratic program in (9)–(11) in the following

$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{\tilde{\imath}=1}^{n} \alpha_i \alpha_{\tilde{\imath}} y_i y_{\tilde{\imath}} \left( K_3(B_i, B_{\tilde{\imath}}) + K_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) \right) \tag{9}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0 \tag{10}$$

$$0 \le \alpha_i \le C, \quad \text{for } i = 1, \ldots, n \tag{11}$$

where $\alpha_i$ is the Lagrange multiplier or the dual variable of observation $i$ and $\alpha$ is a vector with $\alpha_i$ as its elements. The values of the dual variables are determined after the dual in (9)–(11) is solved. Using the values of the dual variables obtained in phase 1, the primal variables $w_{\tilde{j}t}$ and $\tilde{w}_{j't'}$ can be written as (12) and (13), respectively, in the following

$$w_{\tilde{j}t} = \gamma_{\tilde{j}t} \sum_{i=1}^{n} y_i \alpha_i \phi_{\tilde{j}t}(b_{i\tilde{j}t}), \quad \text{for } \tilde{j} = 1, \ldots, m_3,\ t = 1, \ldots, T \tag{12}$$

$$\tilde{w}_{j't'} = \hat{\gamma}_{j't'} \sum_{i=1}^{n} y_i \alpha_i \tilde{\phi}_{j't'}(\hat{b}_{ij't'}), \quad \text{for } j' = 1, \ldots, m_4,\ t' = 1, \ldots, \hat{T} \tag{13}$$

In phase 2, the values of $b$ and the components of $w_{\tilde{j}t}$ and $\tilde{w}_{j't'}$ are fixed. Using the primal variables $w_{\tilde{j}t}$ and $\tilde{w}_{j't'}$ in (12) and (13), the original problem in (4)–(8) can be rewritten as

$$\min_{\gamma', \xi} \ \frac{1}{2} \sum_{i=1}^{n} \sum_{\tilde{\imath}=1}^{n} \alpha_i \alpha_{\tilde{\imath}} y_i y_{\tilde{\imath}} \left( \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} \gamma_{\tilde{j}t}\, k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t}) + \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} \hat{\gamma}_{j't'}\, k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'}) \right) + \lambda \sum_{i=1}^{n} \xi_i \tag{14}$$

$$\text{s.t.} \quad y_i \left\{ \sum_{\tilde{\imath}=1}^{n} \alpha_{\tilde{\imath}} y_{\tilde{\imath}} \left( \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} \gamma_{\tilde{j}t}\, k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t}) + \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} \hat{\gamma}_{j't'}\, k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'}) \right) + b \right\} \ge 1 - \xi_i, \quad \text{for } i = 1, \ldots, n \tag{15}$$

$$\xi_i \ge 0, \quad \text{for } i = 1, \ldots, n \tag{16}$$

$$\gamma_{\tilde{j}t} \ge 0, \quad \text{for } \tilde{j} = 1, \ldots, m_3,\ t = 1, \ldots, T \tag{17}$$

$$\hat{\gamma}_{j't'} \ge 0, \quad \text{for } j' = 1, \ldots, m_4,\ t' = 1, \ldots, \hat{T}, \tag{18}$$

where $\lambda$ is the regularization parameter and $\alpha_i$ is the Lagrange multiplier of observation $i$ obtained in phase 1. Because minimizing the $L_1$-norm based regularization function (14) leads to sparse solutions for the elements of $\gamma'$, the solution process of the problem (14)–(18)


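is also a feature selection process. The solution of this problem can be obtained by solving its dual. The dual is stated in (19)–(23) in the following

$$\max_{u} \ \sum_{i=1}^{n} u_i \tag{19}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} u_i y_i \sum_{\tilde{\imath}=1}^{n} \alpha_{\tilde{\imath}} y_{\tilde{\imath}}\, k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t}) \le \frac{1}{2} \sum_{i=1}^{n} \sum_{\tilde{\imath}=1}^{n} \alpha_i \alpha_{\tilde{\imath}} y_i y_{\tilde{\imath}}\, k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t}), \quad \text{for } \tilde{j} = 1, \ldots, m_3,\ t = 1, \ldots, T \tag{20}$$

$$\sum_{i=1}^{n} u_i y_i \sum_{\tilde{\imath}=1}^{n} \alpha_{\tilde{\imath}} y_{\tilde{\imath}}\, k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'}) \le \frac{1}{2} \sum_{i=1}^{n} \sum_{\tilde{\imath}=1}^{n} \alpha_i \alpha_{\tilde{\imath}} y_i y_{\tilde{\imath}}\, k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'}), \quad \text{for } j' = 1, \ldots, m_4,\ t' = 1, \ldots, \hat{T} \tag{21}$$

$$\sum_{i=1}^{n} u_i y_i = 0 \tag{22}$$

$$0 \le u_i \le \lambda, \quad \text{for } i = 1, \ldots, n, \tag{23}$$

where $u_i$ is the dual variable of observation $i$ and $u$ is a vector with $u_i$ as its elements. Sometimes, the weights attached to the elements of $\gamma'$ in the objective function (14) can be set to constants to make the linear program in (14)–(18) easier to solve. For example, (24) and (25) in the following can be used in (14)

$$\frac{1}{2} \sum_{i=1}^{n} \sum_{\tilde{\imath}=1}^{n} \alpha_i \alpha_{\tilde{\imath}} y_i y_{\tilde{\imath}}\, k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t}) = 1 \tag{24}$$

$$\frac{1}{2} \sum_{i=1}^{n} \sum_{\tilde{\imath}=1}^{n} \alpha_i \alpha_{\tilde{\imath}} y_i y_{\tilde{\imath}}\, k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'}) = 1. \tag{25}$$

When the variables $w_{\tilde{j}t}$ and $\tilde{w}_{j't'}$ in (12) and (13) and the kernels in (2) and (3) are used in (1), the classification function constructed by the improved H-MK-SVM in Layer 2 of the hierarchical ensemble learning framework is

$$Y(B_{\tilde{\imath}}, \hat{B}_{\tilde{\imath}}) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i \left( K_3(B_i, B_{\tilde{\imath}}) + K_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) \right) + b \right). \tag{26}$$

The bias $b$ in (26) is computed using (27) in the following

$$b = y_{\tilde{\imath}} - \sum_{i=1}^{n} \alpha_i y_i \left( K_3(B_i, B_{\tilde{\imath}}) + K_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) \right), \quad \text{for } \alpha_{\tilde{\imath}} \in (0, C). \tag{27}$$

4.2. Ensemble learning by the improved H-MK-SVM

In Layer 3 of the hierarchical ensemble learning framework, the input dataset of the improved H-MK-SVM is $\tilde{G} = \{(s_1, \hat{s}_1, B_1, \hat{B}_1, y_1), \ldots, (s_n, \hat{s}_n, B_n, \hat{B}_n, y_n)\}$ with external and behavioral data. The improved H-MK-SVM in Layer 3 of the hierarchical ensemble learning framework constructs a hyperplane in a high dimensional feature space

$$\tilde{f}(s_i, \hat{s}_i, B_i, \hat{B}_i) = \beta_1 \hat{w}^{\top} \cdot \hat{\phi}(s_i) + \beta_2 w'^{\top} \cdot \phi'(\hat{s}_i) + \beta_3 \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} \bar{w}_{\tilde{j}t}^{\top} \cdot \phi_{\tilde{j}t}(b_{i\tilde{j}t}) + \beta_4 \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} w''^{\top}_{j't'} \cdot \tilde{\phi}_{j't'}(\hat{b}_{ij't'}) + \tilde{b}, \tag{28}$$

where $\hat{\phi}(s_i)$, $\phi'(\hat{s}_i)$, $\phi_{\tilde{j}t}(b_{i\tilde{j}t})$ and $\tilde{\phi}_{j't'}(\hat{b}_{ij't'})$ are the nonlinear maps, $\hat{w}$, $w'$, $\bar{w}_{\tilde{j}t}$ and $w''_{j't'}$ are the vectors of weights, $\beta_{\hat{m}}$ is the weight of data type $\hat{m}$, and $\tilde{b}$ is the bias. In the following, $\beta$ represents a vector with $\beta_{\hat{m}}$, for $\hat{m} = 1, \ldots, 4$, as its elements.

Different kernels are used for different types of data. For the external data with numerical values, the standard single Gaussian kernel, also known as the radial basis function (RBF) kernel, is used

$$K_1(s_i, s_{\tilde{\imath}}) = \exp\left( -\frac{1}{\sigma_1^2} \|s_i - s_{\tilde{\imath}}\|^2 \right), \tag{29}$$

where $\sigma_1^2$ is the kernel parameter. For the external data with strings, the string kernel in (30) in the following is used

$$K_2(\hat{s}_i, \hat{s}_{\tilde{\imath}}) = \sum_{\hat{j}=1}^{m_2} I(\hat{s}_{i\hat{j}}, \hat{s}_{\tilde{\imath}\hat{j}}). \tag{30}$$

The following multiple kernel (31) similar to (2) is used for the longitudinal individual behavioral data

$$\tilde{K}_3(B_i, B_{\tilde{\imath}}) = \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} \gamma_{\tilde{j}t}\, k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t}), \tag{31}$$

where $k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t})$ is the basic kernel and $\gamma_{\tilde{j}t}$ is the known weight obtained in Layer 2 of the hierarchical ensemble learning framework. The difference between $K_3(B_i, B_{\tilde{\imath}})$ in (2) and $\tilde{K}_3(B_i, B_{\tilde{\imath}})$ in (31) is that $\gamma_{\tilde{j}t}$ is variable in (2) but known in (31). The following multiple kernel (32) similar to (3) is used for the longitudinal engagement behavioral data

$$\tilde{K}_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) = \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} \hat{\gamma}_{j't'}\, k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'}), \tag{32}$$

where $k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'})$ is the basic kernel and $\hat{\gamma}_{j't'}$ is the known weight obtained in Layer 2 of the hierarchical ensemble learning framework. The difference between $K_4(\hat{B}_i, \hat{B}_{\tilde{\imath}})$ in (3) and $\tilde{K}_4(\hat{B}_i, \hat{B}_{\tilde{\imath}})$ in (32) is that $\hat{\gamma}_{j't'}$ is variable in (3) but known in (32).

When the kernels in (29), (30), (31) and (32) above are used, the model of the improved H-MK-SVM in Layer 3 of the hierarchical ensemble learning framework is formulated as

$$\min_{\beta} \min_{\hat{w}, w', \bar{w}, w'', \tilde{\xi}, \tilde{b}} \ \frac{1}{2} \left( \beta_1 \|\hat{w}\|^2 + \beta_2 \|w'\|^2 + \beta_3 \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} \frac{\|\bar{w}_{\tilde{j}t}\|^2}{\gamma_{\tilde{j}t}} + \beta_4 \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} \frac{\|w''_{j't'}\|^2}{\hat{\gamma}_{j't'}} \right) + \tilde{C} \sum_{i=1}^{n} \tilde{\xi}_i \tag{33}$$

$$\text{s.t.} \quad y_i \left( \beta_1 \hat{w}^{\top} \cdot \hat{\phi}(s_i) + \beta_2 w'^{\top} \cdot \phi'(\hat{s}_i) + \beta_3 \sum_{\tilde{j}=1}^{m_3} \sum_{t=1}^{T} \bar{w}_{\tilde{j}t}^{\top} \cdot \phi_{\tilde{j}t}(b_{i\tilde{j}t}) + \beta_4 \sum_{j'=1}^{m_4} \sum_{t'=1}^{\hat{T}} w''^{\top}_{j't'} \cdot \tilde{\phi}_{j't'}(\hat{b}_{ij't'}) + \tilde{b} \right) \ge 1 - \tilde{\xi}_i, \quad \text{for } i = 1, \ldots, n \tag{34}$$

$$\tilde{\xi}_i \ge 0, \quad \text{for } i = 1, \ldots, n \tag{35}$$

$$\beta_{\hat{m}} \ge 0, \quad \text{for } \hat{m} = 1, \ldots, 4, \tag{36}$$

where $\tilde{C}$ is the regularization parameter, $\tilde{\xi}_i$ is the relaxation or error term for observation $i$, $\tilde{\xi}$ is a vector with $\tilde{\xi}_i$ as its elements, $\bar{w}$ is a composite vector of $\bar{w}_{\tilde{j}t}$ and $w''$ is a composite vector of $w''_{j't'}$. A two-phase procedure is used to solve the problem in (33)–(36). In phase 1, the values of the elements of $\beta$ are fixed and the dual of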


(33)–(36) is solved. The dual is written as

$$\max_{\tilde{\alpha}} \ \sum_{i=1}^{n} \tilde{\alpha}_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{\tilde{\imath}=1}^{n} \tilde{\alpha}_i \tilde{\alpha}_{\tilde{\imath}} y_i y_{\tilde{\imath}} \left( \beta_1 K_1(s_i, s_{\tilde{\imath}}) + \beta_2 K_2(\hat{s}_i, \hat{s}_{\tilde{\imath}}) + \beta_3 \tilde{K}_3(B_i, B_{\tilde{\imath}}) + \beta_4 \tilde{K}_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) \right) \tag{37}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} y_i \tilde{\alpha}_i = 0 \tag{38}$$

$$0 \le \tilde{\alpha}_i \le \tilde{C}, \quad \text{for } i = 1, \ldots, n \tag{39}$$

where $\tilde{\alpha}_i$ is the Lagrange multiplier or the dual variable of observation $i$ and $\tilde{\alpha}$ is a vector with $\tilde{\alpha}_i$ as its elements. The values of the dual variables are determined after the dual is solved. The primal variables $\hat{w}$ and $w'$ are represented by the dual variables in the same way as in the standard SVMs (Chen, Fan, et al., 2012) and $\bar{w}_{\tilde{j}t}$ and $w''_{j't'}$ are expressed in similar ways to $w_{\tilde{j}t}$ and $\tilde{w}_{j't'}$ in (12) and (13).

In phase 2, the values of $\tilde{b}$ and the elements of $\hat{w}$, $w'$, $\bar{w}_{\tilde{j}t}$ and $w''_{j't'}$ are fixed. Using the primal variables $\hat{w}$, $w'$, $\bar{w}_{\tilde{j}t}$ and $w''_{j't'}$, the original problem in (33)–(36) can be rewritten as the linear program in (40)–(43) in the following

$$\min_{\beta, \tilde{\xi}} \ \frac{1}{2} \sum_{i=1}^{n} \sum_{\tilde{\imath}=1}^{n} \tilde{\alpha}_i \tilde{\alpha}_{\tilde{\imath}} y_i y_{\tilde{\imath}} \left( \beta_1 K_1(s_i, s_{\tilde{\imath}}) + \beta_2 K_2(\hat{s}_i, \hat{s}_{\tilde{\imath}}) + \beta_3 \tilde{K}_3(B_i, B_{\tilde{\imath}}) + \beta_4 \tilde{K}_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) \right) + \tilde{\lambda} \sum_{i=1}^{n} \tilde{\xi}_i \tag{40}$$

$$\text{s.t.} \quad y_i \left( \sum_{\tilde{\imath}=1}^{n} \tilde{\alpha}_{\tilde{\imath}} y_{\tilde{\imath}} \left( \beta_1 K_1(s_i, s_{\tilde{\imath}}) + \beta_2 K_2(\hat{s}_i, \hat{s}_{\tilde{\imath}}) + \beta_3 \tilde{K}_3(B_i, B_{\tilde{\imath}}) + \beta_4 \tilde{K}_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) \right) + \tilde{b} \right) \ge 1 - \tilde{\xi}_i, \quad \text{for } i = 1, \ldots, n \tag{41}$$

$$\tilde{\xi}_i \ge 0, \quad \text{for } i = 1, \ldots, n \tag{42}$$

$$\beta_{\hat{m}} \ge 0, \quad \text{for } \hat{m} = 1, \ldots, 4, \tag{43}$$

where $\tilde{\lambda}$ is the regularization parameter. The weights $\beta$ are obtained after this linear program is solved.

When the variables $\hat{w}$, $w'$, $\bar{w}_{\tilde{j}t}$ and $w''_{j't'}$ and the kernels for the four types of data are used in (28), the classification function constructed by the improved H-MK-SVM in Layer 3 of the hierarchical ensemble learning framework is

$$\tilde{Y}(s_{\tilde{\imath}}, \hat{s}_{\tilde{\imath}}, B_{\tilde{\imath}}, \hat{B}_{\tilde{\imath}}) = \operatorname{sgn}\left( \sum_{i=1}^{n} \tilde{\alpha}_i y_i \left( \beta_1 K_1(s_i, s_{\tilde{\imath}}) + \beta_2 K_2(\hat{s}_i, \hat{s}_{\tilde{\imath}}) + \beta_3 \tilde{K}_3(B_i, B_{\tilde{\imath}}) + \beta_4 \tilde{K}_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) \right) + \tilde{b} \right). \tag{44}$$

The bias $\tilde{b}$ in (44) is computed using (45) in the following

$$\tilde{b} = y_{\tilde{\imath}} - \sum_{i=1}^{n} \tilde{\alpha}_i y_i \left( \beta_1 K_1(s_i, s_{\tilde{\imath}}) + \beta_2 K_2(\hat{s}_i, \hat{s}_{\tilde{\imath}}) + \beta_3 \tilde{K}_3(B_i, B_{\tilde{\imath}}) + \beta_4 \tilde{K}_4(\hat{B}_i, \hat{B}_{\tilde{\imath}}) \right), \quad \text{for } \tilde{\alpha}_{\tilde{\imath}} \in (0, \tilde{C}). \tag{45}$$

It should be noted that the improved H-MK-SVMs in Layers 2 and 3 of the hierarchical ensemble learning framework play different roles. The one in Layer 2 using two types of behavioral data is for associated attribute selection and the one in Layer 3 obtains the final classification function (44). After associated attribute selection, the H-MK-SVM in Layer 2 can also construct a classification function using just the two types of behavioral data. The hierarchical ensemble learning using the improved H-MK-SVM combines the advantages of sparse modeling with reduced feature sets and the ensemble learning with diverse heterogeneous data and improves the performance of the user response models.

4.3. Kernels and parameter tuning

In the improved H-MK-SVM, the Gaussian kernel is used for the external data with numerical values in (29) and used as the basic kernels for the longitudinal behavioral data in (2), (3), (31) and (32). For the external data with strings, $I(\hat{s}_{i\hat{j}}, \hat{s}_{\tilde{\imath}\hat{j}})$ in (30) is given in (46) in the following

$$I(\hat{s}_{i\hat{j}}, \hat{s}_{\tilde{\imath}\hat{j}}) = \begin{cases} 1, & \text{if } \hat{s}_{i\hat{j}} = \hat{s}_{\tilde{\imath}\hat{j}} \\ 0, & \text{if } \hat{s}_{i\hat{j}} \ne \hat{s}_{\tilde{\imath}\hat{j}}. \end{cases} \tag{46}$$

For the longitudinal individual behavioral attribute Quantity, the following Gaussian kernel in (47) is used as the basic kernel of the multiple kernels (2)

$$k_{\tilde{j}t}(b_{i\tilde{j}t}, b_{\tilde{\imath}\tilde{j}t}) = \exp\left( -\frac{1}{\sigma_3^2} \|b_{i\tilde{j}t} - b_{\tilde{\imath}\tilde{j}t}\|^2 \right), \quad \text{for } \tilde{j} = 1, \tag{47}$$

where $\sigma_3^2$ is the kernel parameter. In the improved H-MK-SVM in Layer 3 of the hierarchical ensemble learning framework, the same Gaussian kernel (47) is used as the basic kernel of the multiple kernels (31) for the attribute Quantity with a different kernel parameter $\tilde{\sigma}_3^2$. For the longitudinal individual behavioral attribute Acceptance, i.e., $b_{i\tilde{j}t}$ with $\tilde{j} = 2$, the same Gaussian kernels (47) with parameters $\sigma_4^2$ and $\tilde{\sigma}_4^2$ are used as the basic kernels of the multiple kernels in the improved H-MK-SVM in Layers 2 and 3, respectively, of the hierarchical ensemble learning framework.

For the longitudinal engagement behavioral attribute Quantity, the Gaussian kernel (48) in the following is used as the basic kernel of the multiple kernels (3)

$$k_{j't'}(\hat{b}_{ij't'}, \hat{b}_{\tilde{\imath}j't'}) = \exp\left( -\frac{1}{\sigma_5^2} \|\hat{b}_{ij't'} - \hat{b}_{\tilde{\imath}j't'}\|^2 \right), \quad \text{for } j' = 1, \tag{48}$$

where $\sigma_5^2$ is the kernel parameter. In the improved H-MK-SVM in Layer 3 of the hierarchical ensemble learning framework, the same Gaussian kernel (48) with parameter $\tilde{\sigma}_5^2$ is used as the basic kernel in the multiple kernels (32) for the attribute Quantity. For the longitudinal engagement behavioral attribute Acceptance, i.e., $\hat{b}_{ij't'}$ with $j' = 2$, the same Gaussian kernels (48) with parameters $\sigma_6^2$ and $\tilde{\sigma}_6^2$ are used in the improved H-MK-SVM in Layers 2 and 3, respectively, of the hierarchical ensemble learning framework.

The grid search method (Chen, Fan, et al., 2012) was used to tune the parameters in the improved H-MK-SVM. For the improved H-MK-SVM in Layer 2 of the hierarchical ensemble learning framework, exponentially growing values for $\lambda$, $C$, $1/\sigma_3^2$, $1/\sigma_4^2$, $1/\sigma_5^2$ and $1/\sigma_6^2$ from $10^{-2}$ to $10^{2}$ were tried in turn. For the improved H-MK-SVM in Layer 3 of the hierarchical ensemble learning framework, a nested grid search strategy was used. Exponentially growing values for $\tilde{\lambda}$, $\tilde{C}$, $1/\tilde{\sigma}_1^2$, $1/\tilde{\sigma}_3^2$, $1/\tilde{\sigma}_4^2$, $1/\tilde{\sigma}_5^2$ and $1/\tilde{\sigma}_6^2$ from $10^{-2}$ to $10^{2}$ were tried first in turn. These parameters were then finely tuned. Additively growing values in the best intervals obtained earlier for $\tilde{C}$, $1/\tilde{\sigma}_1^2$, $1/\tilde{\sigma}_3^2$, $1/\tilde{\sigma}_4^2$, $1/\tilde{\sigma}_5^2$ and $1/\tilde{\sigma}_6^2$ were tried in turn. The values of these parameters with the best performance on the validation sets were used to test the performance of the models.
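The kernels of this section and the coarse stage of the parameter search can be sketched in a few lines. The sketch below is illustrative only and is not the Matlab implementation used in the study. It assumes scalar tensor entries, passes the kernel parameter as `inv_sigma2` (standing for 1/σ²), and enumerates a full Cartesian grid, whereas the text says the values were "tried in turn", which may instead mean one parameter at a time. All function names, and the `validation_score` callback that would train a model and score it on the validation set, are hypothetical.

```python
import itertools
import math


def gaussian_kernel(x, z, inv_sigma2):
    """Gaussian (RBF) kernel exp(-||x - z||^2 / sigma^2), cf. (29), (47), (48)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-inv_sigma2 * sq_dist)


def string_kernel(s, z):
    """Number of positions where two string vectors agree, cf. (30) and (46)."""
    return sum(1 for a, b in zip(s, z) if a == b)


def multiple_kernel(B_i, B_k, gamma, inv_sigma2):
    """Weighted sum of basic Gaussian kernels over attributes j and days t, cf. (2).

    B_i[j][t] and B_k[j][t] are scalars, gamma[j][t] >= 0 is the weight of
    the basic kernel, and inv_sigma2[j] is the parameter for attribute j.
    """
    total = 0.0
    for j, (row_i, row_k) in enumerate(zip(B_i, B_k)):
        for t, (a, b) in enumerate(zip(row_i, row_k)):
            if gamma[j][t] > 0.0:  # basic kernels with zero weight drop out
                total += gamma[j][t] * gaussian_kernel([a], [b], inv_sigma2[j])
    return total


def exponential_grid(low_exp=-2, high_exp=2):
    """Exponentially growing candidate values 10^low_exp, ..., 10^high_exp."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]


def grid_search(param_names, candidates, validation_score):
    """Return (best_params, best_score) over the full Cartesian grid."""
    best_params, best_score = None, float("-inf")
    for values in itertools.product(candidates, repeat=len(param_names)):
        params = dict(zip(param_names, values))
        score = validation_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A sparse weight matrix `gamma` makes the attribute selection role of Layer 2 visible: any basic kernel whose weight is zero contributes nothing to `multiple_kernel`, so the corresponding attribute and day are effectively removed from the model. A finer second pass, with additively spaced values around the best coarse value, could reuse `grid_search` with a different candidate list.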

5. Computational experiments

Computational results are reported in this section. Matlab 7.4 was used to conduct the computational experiments. The desktop computer used for the computation has an Intel Core i7 processor with a 3.40 gigahertz clock speed and 16 gigabytes of RAM. Eight competitive methods including the SVM, feed-forward ANN (FFANN), radial basis function neural network (RBFNN), decision tree (DT), random forest (RF), Adaboost and the two ensemble methods, i.e., MV and AV, were also used in the experiments to compare their performances with that of the improved H-MK-SVM. The Neural Network and the Statistics toolboxes in Matlab 7.4 were used to implement the FFANN, RBFNN and DT.

Three criteria including the maximum profit (MP) (Verbeke et al., 2012), the area under the receiver operating characteristic curve (AUC) and the overall hit rate (PCC) are used to measure performances. The MP is a very suitable measure for customer response modeling due to its consideration of both the class distribution and misclassification costs (Verbeke et al., 2012). The MP criterion proposed for churn prediction by Verbeke et al. (2012) can be naturally extended to response modeling by giving different meanings to the parameters (see footnote 3). The AUC is a robust estimator of prediction performance (Lee et al., 2010). The LSSVMlab v1.8 toolbox (see footnote 4) was used for the computation of the AUC. However, it is known that the PCC is often not an appropriate measure for imbalanced and cost-sensitive classification problems (Verbeke et al., 2012). As shown in the following, a model performing well in terms of MP and AUC may perform poorly in terms of the PCC.

The computational results reported in the following are obtained on the testing sets. The mean (Mean) and standard deviation (STD) of each criterion are reported in the following tables. Statistical tests, i.e., ANOVA (analysis of variance) for the differences in means and Fisher's LSD (least significant difference) t-test for multiple comparisons, were conducted. In Tables 3–6, the best result for each measure is highlighted, the results not significantly different from the best result are in italics and those significantly different from the best result are in regular fonts, both at the 0.05 significance level. More details of the statistical tests are reported in the Appendix.

3 For response modeling, the parameter α is set to α = 10 percent; the parameter β represents the fraction of true respondents included in the marketing campaign; γ represents the conversion ratio and CLV represents the average customer value in the marketing campaign. When the purchase (click) data are used, the conversion ratio is set to γ = 1 (0 ≤ γ ≤ 1). As a special note, α, β and γ mentioned in this footnote are for the computation of the MP only and are not related to the notations in the models.

4 http://www.esat.kuleuven.be/sita/lssvmlab/.

Table 3
Results of the improved H-MK-SVM with and without hierarchical ensemble learning.

H-MK-SVM   Local classifiers                                Ensemble
           MP             AUC            PCC              MP             AUC            PCC
           Mean   STD     Mean   STD     Mean   STD       Mean   STD     Mean   STD     Mean   STD
S1         2.97   3.42    66.09  3.42    75.83  12.38     2.91   0.74    68.62  2.08    76.66  10.84
S2         1.95   1.07    59.06  3.79    86.45   2.53     2.03   0.98    60.71  4.41    88.02   1.54
S3         1.16   1.37    59.01  3.70    83.71  16.01     0.75   0.51    60.77  4.37    88.41   1.24
S4         2.29   0.72    63.04  5.33    70.59  21.78     2.49   0.63    66.36  1.57    70.99  27.56
S5         2.39   1.00    66.02  3.49    72.84  14.58     2.83   0.93    69.05  2.38    74.83   8.35

Table 4
Results of the improved H-MK-SVM using longitudinal individual behavioral data with varying lengths and aggregation scales.

Measures   T = 28          T = 14          T = 7           T = 1
           Mean   STD      Mean   STD      Mean   STD      Mean   STD
MP         2.97   3.42     2.33   0.82     2.33   1.94     1.70   1.26
AUC        66.09  3.42     64.57  4.15     64.59  4.54     63.77  4.61
PCC        75.83  12.38    74.83  18.37    74.66  19.41    52.59  28.23

Measures   Scale = 1       Scale = 2       Scale = 7       Scale = 28
           Mean   STD      Mean   STD      Mean   STD      Mean   STD
MP         2.97   3.42     2.64   0.79     2.63   0.73     2.08   1.25
AUC        66.09  3.42     65.43  3.72     63.22  13.96    64.02  4.47
PCC        75.83  12.38    73.13  17.81    73.93  13.94    62.78  25.40

Table 5
Results of different classifiers without engagement behavioral data on different categories.

Method      MP             AUC            PCC
            Mean   STD     Mean   STD     Mean   STD
H-MK-SVM    2.97   3.42    66.09  3.42    75.83  12.38
SVM         2.10   1.01    64.51  4.18    58.78   0.20
FFANN       1.36   1.37    63.05  9.52    51.58  31.14
RBFNN       1.76   0.57    63.16  6.69    47.98  33.56
DT          1.68   1.97    59.18  4.01    51.81  29.76
RF          1.38   0.73    59.58  4.52    47.62  26.43
Adaboost    1.63   1.93    58.22  3.80    45.28  25.39
AV          2.01   1.05    64.47  4.99    59.64  34.45
MV          2.21   1.72    62.51  5.75    59.64  34.97

Table 6
Results of different classifiers with engagement behavioral data.

Method      MP             AUC            PCC
            Mean   STD     Mean   STD     Mean   STD
H-MK-SVM    3.29   1.21    67.70  2.42    48.11   0.17
SVM         1.46   1.02    64.89  4.05    49.02   0.21
FFANN       1.25   0.86    64.25  3.93    69.30  25.09
RBFNN       1.46   0.78    63.85  5.24    51.49  37.10
DT          0.83   0.71    52.26  7.00    56.11  20.53
RF          1.09   0.62    58.26  4.06    48.74  20.56
Adaboost    0.96   0.58    56.22  4.17    49.60  19.77
AV          1.46   0.96    61.65  4.90    76.40  26.76
MV          1.48   1.27    65.85  3.98    87.42   2.83
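To make the relation between the AUC and the PCC concrete, the following minimal sketch computes both from raw classifier scores. It is not the LSSVMlab routine used in the study. The pairwise (rank-based) AUC formula is a standard equivalent of the area under the ROC curve, and the decision threshold of zero for the PCC is an assumption of this example. The MP is omitted because its formula is not given in this section.

```python
def auc(scores, labels):
    """AUC as the share of (respondent, non-respondent) pairs ranked
    correctly, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pcc(scores, labels, threshold=0.0):
    """Overall hit rate: share of observations classified correctly
    when scores above the threshold are predicted as respondents (+1)."""
    hits = sum((1 if s > threshold else -1) == y
               for s, y in zip(scores, labels))
    return hits / len(labels)


# One respondent ranked above nine non-respondents, but every score is
# above the threshold, so everyone is predicted to be a respondent.
labels = [1] + [-1] * 9
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
ranking_quality = auc(scores, labels)
hit_rate = pcc(scores, labels)
```

In this toy case the ranking is perfect while only one observation in ten is classified correctly, which illustrates how a model can do well on a ranking-based measure and poorly on the PCC when the classes are imbalanced.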

5.1. Results with and without hierarchical ensemble learning

In the following, user response modeling performances with and without hierarchical ensemble learning are compared. To measure the benefits of using the hierarchical ensemble learning framework, the following five experiments are conducted: (S1) user response modeling based on the hierarchical ensemble learning framework using the external, tag and keyword, and longitudinal individual behavioral data; (S2) user response modeling by the improved H-MK-SVM in Layer 2 using only the longitudinal individual behavioral variable Quantity; (S3) user response modeling by the improved H-MK-SVM in Layer 2 using only the longitudinal individual behavioral variable Acceptance; (S4) user response modeling by the improved H-MK-SVM in Layer 3 using the same data as those under S1; and (S5)


user response modeling based on the hierarchical ensemble learning framework without learning the weights of the three types of data in S1, i.e., $\beta_{\hat{m}} = 1$. For these five experiments, the results of the sub-classifiers used in subagging and of the ensemble of the sub-classifiers using the AV method are reported in Table 3. The measures of the ensemble are computed based on the average of the outputs of the improved H-MK-SVMs on the balanced subsets. As shown in Table 3, the individual H-MK-SVMs under S1 obtained the highest average MP (2.97) and AUC (66.09 percent), and the ensemble of the H-MK-SVMs under S1 obtained the highest MP (2.91) and the second highest AUC (68.62 percent), while that under S5 obtained the best AUC (69.05 percent). Results in Table 3 show that the average MP and AUC of the individual H-MK-SVMs and of the ensemble of the H-MK-SVMs with hierarchical ensemble learning, i.e., S1 and S5, are considerably higher than those of the H-MK-SVMs without hierarchical ensemble learning, i.e., S2, S3 and S4.

5.2. Effects of varying time lengths and aggregation scales

In the following, the effects of varying time lengths and aggregation scales of the longitudinal individual behavioral data on prediction performance are examined. The results of the ensemble of the improved H-MK-SVMs using the longitudinal individual behavioral data with different time lengths are shown in Table 4. Results in Table 4 show that the improved H-MK-SVM using the behavioral data with T = 28 obtained the highest average MP, AUC and PCC. In particular, the MP, AUC and PCC of the improved H-MK-SVM using T = 28 are much higher than those using a smaller T. These results show that sufficiently long behavioral histories need to be stored in the data warehouse and used in user behavioral analysis. The longitudinal individual behavioral data are aggregated at different scales to examine the effects of these scales on performance.
Specifically, the behavioral data are aggregated per day (Scale = 1), i.e., no aggregation, per 2 days (Scale = 2), per 7 days (Scale = 7) and per 28 days (Scale = 28), respectively. The results of the improved H-MK-SVM using longitudinal individual behavioral data with different aggregation scales are also shown in Table 4. As shown in Table 4, the improved H-MK-SVM obtained higher MP, AUC and PCC with Scale = 1 than with all other aggregation scales. Therefore, it is helpful to select a suitable aggregation scale of the longitudinal data.

5.3. Comparisons of performance of the improved H-MK-SVM with other methods

In the following, the performances of the eight competitive methods are compared with that of the improved H-MK-SVM. Because the SVM, FFANN, RBFNN, DT, RF and Adaboost cannot be directly used to model heterogeneous and tensor data, the longitudinal individual behavioral attributes represented by a third-order tensor were aggregated as a matrix $O = \{o_{i\tilde{j}} \mid i = 1, \ldots, n;\ \tilde{j} = 1, \ldots, m_3\}$. Both the external attributes and the aggregated behavioral attributes, i.e., the composite vector $x_i = [s_i, o_i]$, were used as inputs in these six methods. For the ensemble methods, one SVM is used to model the external and tag attributes and two SVMs are used to model the longitudinal individual behavioral variables Quantity and Acceptance. The two ensemble methods, i.e., MV and AV, are then used to combine the results of the SVMs. Unlike the RF and Adaboost, the MV and AV can be used to combine different feature sets from heterogeneous data.

The results of the improved H-MK-SVM and of the other eight methods on the ten randomly selected categories are presented in Table 5. The improved H-MK-SVM obtained the highest MP, AUC and PCC. As shown in Table 5, the improved H-MK-SVM improved the MP by 0.76 to 1.61 and the AUC by 1.58 to 11.87 percent over the other eight methods.
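The aggregation applied to the longitudinal data for the baseline methods can be illustrated with a short sketch. It assumes that the behavioral data are stored as an array of shape (users, variables, days) and that aggregating means summing within each block of days; the text does not state the aggregation function, so the sum, the function names `aggregate_time` and `composite_inputs`, and the toy data are all assumptions made for illustration only.

```python
import numpy as np


def aggregate_time(tensor, scale):
    """Sum a (users, variables, days) array over consecutive blocks of `scale` days.

    Summation is an assumed aggregation function, not one confirmed by the paper.
    """
    n, m, t = tensor.shape
    if t % scale != 0:
        raise ValueError("the number of days must be divisible by scale")
    # Split the day axis into (blocks, days-per-block) and sum within each block.
    return tensor.reshape(n, m, t // scale, scale).sum(axis=3)


def composite_inputs(external, tensor):
    """Collapse all days into a matrix O (n x m3) and build x_i = [s_i, o_i]."""
    o = aggregate_time(tensor, tensor.shape[2])[:, :, 0]
    return np.hstack([external, o])


# Toy data (hypothetical): 5 users, 2 behavioral variables, T = 28 days.
rng = np.random.default_rng(0)
behavior = rng.poisson(2.0, size=(5, 2, 28)).astype(float)
external = rng.normal(size=(5, 3))

weekly = aggregate_time(behavior, 7)       # Scale = 7, shape (5, 2, 4)
x = composite_inputs(external, behavior)   # composite vectors, shape (5, 5)
print(weekly.shape, x.shape)
```

With Scale = 1 the function returns the data unchanged, and with Scale = T it yields the fully aggregated matrix that flat classifiers such as a standard SVM can take as input.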
These results show that the improved H-MK-SVM using longitudinal individual behavioral data outperforms the more traditional methods using aggregated individual behavioral data. Therefore, using the longitudinal individual behavioral data in the hierarchical ensemble learning framework improves the performance of user response modeling.

5.4. Effects of incorporating the engagement behavioral data

In the following, the effects of using the engagement behavioral data on prediction performance are examined. The external, the tag and keyword, the longitudinal individual behavioral and the longitudinal engagement behavioral data are input into the improved H-MK-SVM to obtain the classification results. The performances of the improved H-MK-SVM, as well as the other eight competitive methods, incorporating the engagement behavioral data are reported in Table 6. For the SVM, FFANN, RBFNN, DT, RF and Adaboost, both the longitudinal individual and the engagement behavioral attributes were aggregated as $O = \{o_{ij} \mid i = 1, \ldots, n;\ j = 1, \ldots, m_3 + m_4\}$, and the external attributes and the aggregated behavioral attributes, i.e., the composite vector $x_i = [s_i, o_i]$, were used as input. For the ensemble methods, one SVM is used to model the external and tag attributes, two SVMs are used to model the longitudinal individual variables Quantity and Acceptance, and two SVMs are used to model the engagement behavioral variables Quantity and Acceptance. The MV and AV are then used to combine the results of the SVMs.

For the results reported in Table 6, only users with followees are selected into the datasets to investigate the effect of incorporating the engagement behavioral data on the classification performance. As shown in Table 6, the improved H-MK-SVM using the longitudinal engagement behavioral data obtained the highest MP and AUC. The improved H-MK-SVM using the engagement behavioral data improved the MP by 0.32 and the AUC by 1.61 percent over the improved H-MK-SVM using only the external and individual behavioral data.
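The way MV and AV combine the outputs of several SVMs can be sketched as follows. This is a minimal illustration, assuming that each SVM returns a real-valued decision score per user, that a score above zero is a positive vote, and that AV takes the plain mean of the scores; the function names `average_voting` and `majority_voting` and the example scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def average_voting(scores):
    """AV: mean of the sub-classifier outputs; scores has shape (classifiers, users)."""
    return np.asarray(scores, dtype=float).mean(axis=0)


def majority_voting(scores, threshold=0.0):
    """MV: label +1 when a strict majority of sub-classifiers score above threshold."""
    votes = np.asarray(scores, dtype=float) > threshold
    share = votes.mean(axis=0)                 # fraction of positive votes per user
    labels = np.where(share > 0.5, 1, -1)
    return labels, share


# Hypothetical decision scores of three SVMs (rows) for three users (columns),
# e.g. one SVM per feature set: external/tag, Quantity and Acceptance.
scores = np.array([
    [0.8, -0.2, -1.0],
    [0.1, 0.4, -0.3],
    [-0.5, 0.3, -0.6],
])

av = average_voting(scores)
mv_labels, mv_share = majority_voting(scores)
print(av, mv_labels)
```

Because each row can come from a classifier trained on a different feature set, the same two functions apply whether three or five SVMs are combined.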
These results show that the use of the engagement behavioral data in the improved H-MK-SVM can improve the user response modeling performance. Table 6 also shows that the improved H-MK-SVM using the engagement behavioral data improved the MP by 1.81 to 2.46 and the AUC by 1.85 to 15.44 percent over the other eight methods. Therefore, the improved H-MK-SVM clearly outperforms both the other methods using the aggregated behavioral data and the other ensemble methods.

6. Conclusions

In this study, a hierarchical ensemble learning framework is developed for behavior-aware user response modeling in social media using diverse heterogeneous data. In the framework, a general-purpose data transformation and feature extraction strategy is proposed and an improved H-MK-SVM is developed. In comparison with the work of Chen, Fan, et al. (2012) on customer churn prediction, the major contributions of this study are (1) for the original data from social media, a data transformation and feature extraction strategy is proposed to transform heterogeneous high-dimensional multi-relational data into customer-centered high-order tensors and to extract prediction attributes; (2) four types of data, i.e., external, tag and keyword, longitudinal individual behavioral and longitudinal engagement behavioral data, are simultaneously used for the first time as input of response models to improve prediction performance; (3) for classification using heterogeneous data, an improved H-MK-SVM is developed to hierarchically integrate feature selection from multiple correlated attributes and ensemble learning for user response modeling; (4) for large imbalanced datasets, the subagging strategy is adopted to integrate the local results of the improved H-MK-SVMs on multiple sample subsets to obtain better performance.

Computational experiments are conducted using a real-world microblog database. The experimental results show that (1) the improved H-MK-SVM with hierarchical ensemble learning exhibits superior performance over that without hierarchical ensemble learning; (2) the improved H-MK-SVM using the longitudinal individual behavioral data demonstrates considerable improvements over the other eight competitive methods; (3) the improved H-MK-SVM using the longitudinal engagement behavioral data demonstrates substantial improvements over the improved H-MK-SVM using only the external and individual behavioral data; and (4) the improved H-MK-SVM using the longitudinal engagement behavioral data demonstrates considerable improvements over the other competitive methods using the aggregated engagement behavioral data. Furthermore, this study investigates the usefulness of selecting a suitable time length and aggregation scale of the longitudinal behavioral data for user response modeling. The hierarchical ensemble learning framework offers guidance on how to integrate diverse heterogeneous user data available in the databases of electronic commerce and social media.

Integrating multi-channel, multi-network, multi-media (text, video and audio) and unlabeled data into customer relationship management and direct marketing models, so as to improve prediction performance and allocate marketing resources effectively, will be a direction for future research. Selecting the best combination of local classifiers and assigning the best weights to them to improve the performance and stability of the ensemble of the individual H-MK-SVMs deserves further study. Multi-task and multi-class multi-label multiple kernel learning methods may also be investigated as future work to simultaneously obtain the results of multiple categories. Furthermore, the effectiveness of the models for specific user groups, such as those defined by age, gender and RFM, may deserve further study.

Acknowledgments

The authors greatly appreciate the three anonymous reviewers for their constructive suggestions.
This work was partially supported by the National Natural Science Foundation of China (Project Nos. 71101023, 71471035 and 71271051) and the Fundamental Research Funds for the Central Universities, NEU, China (Project No. N120406001).

Supplementary Materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.ejor.2014.09.008.

References

Baecke, P., & Van den Poel, D. (2011). Data augmentation by predicting spending pleasure using commercially available external data. Journal of Intelligent Information Systems, 36(3), 367–383.
Baesens, B., Viaene, S., Van den Poel, D., Vanthienen, J., & Dedene, G. (2002). Bayesian neural network learning for repeat purchase modelling in direct marketing. European Journal of Operational Research, 138(1), 191–211.
Ballings, M., & Van den Poel, D. (2012). Customer event history for churn prediction: How long is long enough? Expert Systems with Applications, 39(18), 13517–13522.
Bijmolt, T. H. A., Leeflang, P. S. H., Block, F., Eisenbeiss, M., Hardie, B. G. S., Lemmens, A., et al. (2010). Analytics for customer engagement. Journal of Service Research, 13(3), 341–356.
Bose, I., & Chen, X. (2009). Quantitative models for direct marketing: A review from systems perspective. European Journal of Operational Research, 195(1), 1–16.
Buckinx, W., Moons, E., Van den Poel, D., & Wets, G. (2004). Customer-adapted coupon targeting using feature selection. Expert Systems with Applications, 26(4), 509–518.
Burez, J., & Van den Poel, D. (2009). Handling class imbalance in customer churn prediction. Expert Systems with Applications, 36(3), 4626–4636.
Cao, L., Ou, Y., & Yu, P. S. (2012). Coupled behavior analysis with applications. IEEE Transactions on Knowledge and Data Engineering, 24(8), 1378–1392.
Chau, M., & Xu, J. J. (2012). Business intelligence in blogs: Understanding consumer interactions and communities. MIS Quarterly, 36(4), 1189–1216.
Chen, T., Tang, L., Liu, Q., Yang, D., Xie, S., Cao, X., et al. (2012). Combining factorization model and additive forest for collaborative followee recommendation. Beijing, China: KDD-Cup Workshop. https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/SJTU.pdf.
Chen, W.-C., Hsu, C.-C., & Hsu, J.-N. (2011). Optimal selection of potential customer range through the union sequential pattern by using a response model. Expert Systems with Applications, 38(6), 7451–7461.
Chen, Y., Liu, Z., Ji, D., Xin, Y., Wang, W., Yao, L., et al. (2012). Context-aware ensemble of multifaceted factorization models for recommendation prediction in social networks. Beijing, China: KDD-Cup Workshop. https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/Shanda3.pdf.
Chen, Z. Y., Fan, Z. P., & Sun, M. (2012). A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. European Journal of Operational Research, 223(2), 461–472.
Cheung, C. M. K., & Thadani, D. R. (2012). The impact of electronic word-of-mouth communication: A literature analysis and integrative model. Decision Support Systems, 54(1), 461–470.
Cho, Y. B., Cho, Y. H., & Kim, S. H. (2005). Mining changes in customer buying behavior for collaborative recommendations. Expert Systems with Applications, 28(2), 359–369.
Choi, K., Yoo, D., Kim, G., & Suh, Y. (2012). A hybrid online-product recommendation system: Combining implicit rating-based collaborative filtering and sequential pattern analysis. Electronic Commerce Research and Applications, 11(4), 309–317.
Crone, S. F., Lessmann, S., & Stahlbock, R. (2006). The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing. European Journal of Operational Research, 173(3), 781–800.
Cui, G., Wong, M. L., & Lui, H.-K. (2006). Machine learning for direct marketing response models: Bayesian networks with evolutionary programming. Management Science, 52(4), 597–612.
Cui, G., Wong, M. L., & Zhang, G. (2010). Bayesian variable selection for binary response models and direct marketing forecasting. Expert Systems with Applications, 37(12), 7656–7662.
Dellarocas, C. (2003). The digitization of word of mouth: Promise and challenges of online feedback mechanisms. Management Science, 49(10), 1407–1424.
Ha, K., Cho, S., & MacLachlan, D. (2005). Response models based on bagging neural networks. Journal of Interactive Marketing, 19(1), 17–30.
Hill, S., Provost, F., & Volinsky, C. (2006). Network-based marketing: Identifying likely adopters via consumer networks. Statistical Science, 21(2), 256–276.
Huang, C.-L., & Huang, W.-L. (2009). Handling sequential pattern decay: Developing a two-stage collaborative recommender system. Electronic Commerce Research and Applications, 8(3), 117–129.
Kang, P., Cho, S., & MacLachlan, D. L. (2012). Improved response modeling based on clustering, under-sampling, and ensemble. Expert Systems with Applications, 39(8), 6738–6753.
Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! the challenges and opportunities of social media. Business Horizons, 53(1), 59–68.
Kim, Y., Street, W. N., Russell, G. J., & Menczer, F. (2005). Customer targeting: A neural network approach guided by genetic algorithms. Management Science, 51(2), 264–276.
Kolda, T., & Bader, B. (2009). Tensor decompositions and applications. SIAM Review, 51(3), 455–500.
Lahbib, D., Boulle, M., & Laurent, D. (2014). Supervised pre-processing of numerical variables for multi-relational data mining. Studies in Computational Intelligence, 527, 95–109.
Lee, H., Shin, H., Hwang, S., Cho, S., & MacLachlan, D. (2010). Semi-supervised response modeling. Journal of Interactive Marketing, 24(1), 42–54.
Lessmann, S., & Voß, S. (2008). Supervised classification for decision support in customer relationship management. In A. Bortfeldt, J. Homberger, H. Kopfer, G. Pankratz, & R. Stangmeier (Eds.), Intelligent decision support (pp. 231–253). Wiesbaden: Gabler.
Lessmann, S., & Voß, S. (2009). A reference model for customer-centric data mining with support vector machines. European Journal of Operational Research, 199(2), 520–530.
Li, Y.-M., & Li, T.-Y. (2013). Deriving market intelligence from Microblogs. Decision Support Systems, 55(1), 206–217.
Liu, D.-R., Lai, C.-H., & Lee, W.-J. (2009). A hybrid of sequential rules and collaborative filtering for product recommendation. Information Sciences, 179(20), 3505–3519.
Macskassy, S. A., & Provost, F. (2007). Classification in networked data: A toolkit and a univariate case study. Journal of Machine Learning Research, 8, 935–983.
Mangold, W. G., & Faulds, D. J. (2009). Social media: The new hybrid element of the promotion mix. Business Horizons, 52(4), 357–365.
Min, S. H., & Han, I. (2005). Detection of the customer time-variant pattern for improving recommender systems. Expert Systems with Applications, 28(2), 189–199.
Paleologo, G., Elisseeff, A., & Antonini, G. (2010). Subagging for credit scoring models. European Journal of Operational Research, 201(2), 490–499.
Park, Y.-J., & Chang, K.-N. (2009). Individual and group behavior-based customer profile model for personalized product recommendation. Expert Systems with Applications, 36(2), 1932–1939.
Polikar, P. (2006). Ensemble based systems in decision making. IEEE Circuits and System Magazine, 6(3), 21–45.
Power, D. J., & Phillips-Wren, G. (2011). Impact of social media and Web 2.0 on decisionmaking. Journal of Decision Systems, 20(3), 249–261.
Prinzie, A., & Van den Poel, D. (2006). Investigating purchasing sequence patterns for financial services using Markov, MTD and MTDg models. European Journal of Operational Research, 170(3), 710–734.
Prinzie, A., & Van den Poel, D. (2007). Predicting home-appliance acquisition sequences: Markov/Markov for discrimination and survival analysis for modelling sequential information in NPTB models. Decision Support Systems, 44(1), 28–45.


Prinzie, A., & Van den Poel, D. (2011). Modeling complex longitudinal consumer behavior with dynamic Bayesian networks: An acquisition pattern analysis application. Journal of Intelligent Information System, 36(3), 283–304.
Sarawagi, S. (2007). Information extraction. Foundations and Trends in Databases, 1, 261–377.
van Doorn, J., Lemon, K., Mittal, V., Nass, S., Pick, D., Pimer, P., et al. (2010). Customer engagement behavior: Theoretical foundations and research directions. Journal of Service Research, 13(3), 253–266.
Vapnik, V. N. (1998). Statistic learning theory. New York: Wiley.


Verbeke, W., Dejaeger, K., Martens, D., Hur, J., & Baesens, B. (2012). New insights into churn prediction in the telecommunication sector: A profit driven data mining approach. European Journal of Operational Research, 218(1), 211–229.
Zhang, X., Zhu, J., Xu, S., & Wan, Y. (2012). Predicting customer churn through interpersonal influence. Knowledge-Based Systems, 28, 97–104.
Zhao, X. (2012). Scorecard with latent factor models for user follow prediction problem. Beijing, China: KDD-Cup Workshop. https://kaggle2.blob.core.windows.net/competitions/kddcup2012/2748/media/FICO.pdf.
