Exploiting concept drift to predict popularity of social multimedia in microblogs


Information Sciences 339 (2016) 310–331 Contents lists available at ScienceDirect Information Sciences journal homepage: www.elsevier.com/locate/ins...



Cheng-Te Li a,∗, Man-Kwan Shan b, Shih-Hong Jheng b, Kuan-Ching Chou b

a Research Center for IT Innovation, Academia Sinica, Taipei 115, Taiwan
b Department of Computer Science, National Chengchi University, Taipei 106, Taiwan

Article info

Article history: Received 10 December 2013; Revised 28 November 2015; Accepted 1 January 2016; Available online 7 January 2016

Keywords: Popularity prediction; Social multimedia; Concept drift; Information diffusion; Microblog social network

Abstract

Microblogging services such as Twitter and Plurk allow users to easily access and share different types of social multimedia (e.g. images and videos) in the cyber world. However, the massive amount of information available causes information overload, which prevents users from quickly accessing popular and important digital content. This paper studies the problem of predicting the popularity of social multimedia content embedded in short microblog messages. A property of social multimedia is that it can be continuously re-shared, thus its popularity may revive or evolve over time. We exploit the idea of concept drift to capture this property. We formulate the problem using a classification-based approach and propose to tackle two tasks, re-share classification and popularity score classification. Two categories of features are devised and extracted, including information diffusion and explicit multimedia meta information. We develop a concept drift-based popularity predictor by ensembling multiple trained classifiers from social multimedia instances in different time intervals. The key idea lies in dynamically determining the ensemble weights of classifiers. Experiments conducted on Plurk and Twitter datasets show the high accuracy of the popularity classification and the results on detecting popular social multimedia are promising. © 2016 Elsevier Inc. All rights reserved.

1. Introduction

Microblogging services, such as Facebook, Twitter, and Plurk, are platforms that allow users to share quick, short messages with friends. Such information can spread quickly in the cyber world. Various types of digital content can be easily embedded in microblog messages and shared between users. Social multimedia, such as images and videos, is among the most accessible and widespread digital content available in microblogs. With interactive functionalities on microblogging services, including endorse, re-share, comment, and rate, social multimedia can gain exposure and popularity as more users positively acknowledge it. These functionalities facilitate communication between users and help users access novel information in a timely manner. However, these social functionalities can also lead to information overload. With large volumes of messages appearing on the personal pages of users in a short time period, users tend to overlook important messages. This information overload may prevent users from receiving popular and important social multimedia quickly, which



Corresponding author. Tel.: +886-2-2787-2370. E-mail addresses: [email protected], reliefl[email protected] (C.-T. Li), [email protected] (M.-K. Shan), [email protected] (S.-H. Jheng), [email protected] (K.-C. Chou). http://dx.doi.org/10.1016/j.ins.2016.01.009 0020-0255/© 2016 Elsevier Inc. All rights reserved.


Fig. 1. Sequences of popularity for six YouTube videos.

especially affects users with lots of friends. Therefore, it is useful and appreciated to recommend popular social multimedia to such users so that they can quickly catch on to current trends and follow new users. To discover popular social multimedia for recommendation, we propose to predict the popularity and spread of social multimedia embedded in a microblog social network. We resort to the following features of microblogs to formulate the problem and the solution. First, microblogs provide friendly user interfaces for real-time message posting. This real-time property enhances the visibility of information and provides the potential to quickly accumulate view counts for social multimedia embedded in messages. Second, the content of microblog posts tends to be conversation-based, with a sequence of responses. Conversations can boost the vitality of social multimedia through interactions between users. Third, users of a microblog service are connected by an underlying social network. Social multimedia embedded in messages posted by users can be viewed by their friends. Therefore, social multimedia can have an implicit or explicit impact on more users as they re-share such content. Last but not least, messages containing the same social multimedia can be regenerated and shared by users over time under different uncertain contexts. Each social multimedia content can (1) be instantly and widely spread in a very short time period (as shown in Fig. 1(b) and (d)), (2) remain stable in terms of popularity for a long period (as shown in Fig. 1(e) and (f)), (3) accumulate or dissipate its popularity gradually (as shown in Fig. 1(a) and (b)), and/or (4) alternate between being popular and unpopular (as shown in Fig. 1(c)). We will elaborate on the details of each scenario in the following sections. The importance of understanding the popularity of online social multimedia is four-fold. 
First, for network and cloud service providers, accurate prediction of popularity at an early stage can help with planning sufficient storage and computation resources and reserving adequate bandwidth in advance to handle real-time streaming requests. The service providers can save resources while delivering smooth and high-quality multimedia contents. Second, for content producers, understanding currently trending topics is essential to developing content and diffusion strategies of new social multimedia. A successful social multimedia campaign attracts a large number of viewers, which directly affects profit, fame, and social influence. Third, for content consumers, they can receive accurate recommendations on trending multimedia items at early stages. Fourth, for advertisers, high popularity of social multimedia contents corresponds to high revenue. Finding popular multimedia contents in an accurate and prompt manner leads to more effective and efficient targeted marketing that improves profitability with lower costs. Several studies have tried to predict the popularity of messages (i.e., meme-styled short texts) in microblogs. Diverse features have been explored, including LDA topical features and social network features [18], the effect of early adopters [3], the subjectivity of the message language [5], sentiment features [32], and the temporal patterns of the popularity evolution [1]. However, there exists a fundamental difference between predicting the popularity of messages and social multimedia. The life cycle of short messages on microblogs (e.g. retweeted messages in Twitter) usually lasts for 1–2 weeks [49] because the real-time property of short messaging leads to a huge volume of information, while novel events are being created daily. However, social multimedia (e.g. images and videos) embedded in microblog messages exhibit a longer life cycle since these digital contents are usually hosted on external media-sharing platforms (e.g. 
YouTube and Flickr), where media contents can be repeatedly searched and shared over time. This implies that, instead of bursting onto the scene and causing an instant revolution, the popularity of social multimedia may be revived a long time after its initial publication, which makes popularity prediction of social multimedia challenging.

Our goal is to predict the popularity of social multimedia in a microblog social network. Images and YouTube videos embedded in messages are considered as the targeted social multimedia in this study. Given a social multimedia, which is embedded in a message posted in a certain time interval t, with all the relevant information before t, we aim to exploit machine learning techniques to predict the popularity of the targeted social multimedia at time interval t + 1. To facilitate the prediction task, we define the popularity of a social multimedia in terms of time intervals. We treat the popularity prediction as a classification problem, which can be divided into two prediction tasks, re-share classification and popularity


score classification. The former is a binary classification and aims to predict whether or not a social multimedia will be re-shared in the future. The latter is a multi-class classification and aims to predict the popularity score of the targeted social multimedia in the future.

The most intuitive approach to popularity prediction is to predict the exact popularity score using a regression model. Existing methods [1,33,36] build regression models by exploiting the historical view count of an item for popularity prediction. However, such information is only available to the multimedia service providers (e.g. YouTube and Flickr). Microblogging services (e.g. Twitter and Facebook) do not have direct access to the historical view counts of the embedded social multimedia. To get around this problem, one could crawl the multimedia sharing sites and record the history of view counts of multimedia made available on YouTube and Flickr. However, such a method is expensive and fails for newly published multimedia items, which have very limited history. Hence, in this paper, we take advantage of the propagation of multimedia items in microblog social networks to predict their popularity by formulating the task as a classification problem. Another reason to adopt a classification formulation is that end users only care about whether or not a multimedia content will become hot and trendy, not about the exact popularity score. Therefore, we believe that categorizing the popularity for prediction can satisfy users' needs.

The central idea of our solution for the proposed prediction tasks is that concept drift happens throughout the evolution of multimedia popularity. In other words, the popularity of a social multimedia may depend on some hidden contexts. We elaborate on this with the following two points.
First, for a social multimedia m, the explicit predictive features of m remain the same as time passes, but m's popularity could fluctuate with different evolution patterns (e.g. sudden burst or drop, periodic or gradual growth or recession, and so on). This phenomenon may be caused by changes of the hidden context, and such changes lead to variation of popularity. Second, in a certain time interval, some social multimedia might be affected by the hidden context while others might not. For example, holiday songs on YouTube are dramatically more popular during Christmas holidays, while many other songs do not show such a pattern. Therefore, instead of building one classifier with all the training instances at a time [3,5,18,48], which does not consider the concept drift of popularity, we devise a concept drift-based predictor. The proposed predictor aggregates multiple classifiers trained with instances from different time intervals. As an unseen social multimedia instance arrives, we dynamically determine the ensemble weights for the trained classifiers to predict the popularity of the unseen instance. In Section 4, we take advantage of the idea of concept drift [41] to develop our predictor, following the problem defined in Section 3. Sections 5 and 6 present the experimental results and conclude this paper, respectively.

We summarize the contributions of this paper as follows:

• Conceptually, we propose a framework to predict the popularity of social multimedia and point out the difficulties of this problem. We consider the relationships among social media instances, microblog messages, and users to define the problem. Two classification problems, Re-share and Popularity Score, are tackled to fulfill the popularity prediction.

• Technically, we develop a concept drift-based popularity prediction mechanism.
By first training diverse classifiers with social multimedia instances in different time intervals and then dynamically determining the weights of classifiers, we are able to build an ensemble classifier that effectively captures the potential concept drift of each unseen instance.

• Empirically, we conduct experiments on the social network data of Plurk and Twitter. The results demonstrate promising performance of the proposed concept drift-based popularity predictor, compared to other well-regarded baseline methods. Such results demonstrate the effectiveness of leveraging the idea of concept drift to capture the popularity evolution and encourage the consideration of such information for future popularity analytics on social media.

2. Related work

Existing studies on the popularity analytics of online social media can be divided into two categories, popularity prediction in microblogs and popularity analysis in media-sharing services. The former aims at measuring and predicting the popularity of text messages and topics in microblogs, such as Twitter and Plurk. The latter aims to analyze the popularity of different media contents, such as photos in Flickr, bookmarks in Digg, and videos in YouTube.

2.1. Message popularity prediction in microblogs

Hong et al. [18] are the first to predict the popularity of text messages in microblogs. They define popularity as the number of retweets of a message (i.e., the number of messages that quote a particular message) in Twitter. They tackle the prediction problem by first casting it into a classification problem, then categorizing the popularity values into different levels and considering each message as an independent instance for prediction. However, they ignore an important fact: the messages retweeted from one time stamp to another are temporally correlated.
In other words, the popularity of a message instance retweeted today might be significantly correlated to the popularity of the same message retweeted yesterday, but it may not be related to the popularity of the same message retweeted last month. Therefore, simply using a single classifier for prediction leads to a loss of temporal correlation between messages. In this work, we instead propose to train multiple classifiers for the prediction of the popularity of social multimedia re-shared at different time stamps. On the other hand, Yang and Counts [48] and Ho et al. [17] assume that information (i.e., topics) can be diffused from one user to another in a microblog social network, and propose to predict the speed, scale, and range of diffused topics.


Note that a topic in their studies could include several messages from a certain category. Using regression models, they predict the aforementioned three values for eight pre-selected topics. Moreover, diverse kinds of features are explored to predict the popularity of each message, including the effect of early adopters [3], the subjectivity of the message language [5], the sentiment features [32], and the temporal patterns of popularity evolution [1]. As for predicting social multimedia popularity, we are the first to attempt such a task.

2.2. Popularity in media-sharing services

Existing studies on media-sharing services consider the video view count (for YouTube), the voting count (for Digg), and the number of people who like a photo (for Flickr) as the popularity measure. Szabo and Huberman [38] found that the popularity of videos and news in the early periods after posting correlates strongly with that in later periods. They devise a linear regression model for the popularity prediction. However, the information that describes the social multimedia is not considered. In other words, different kinds of social multimedia are regarded as the same instances for prediction. Zhou et al. [53] analyze how the video popularity changes over time for different types of videos in video-on-demand services. Lerman and Hogg [27] study the interactions between users and news contents to devise a stochastic model that predicts the popularity of news in Digg. By considering the sharing patterns of uploaded and downloaded social multimedia as the implicit vote for (or against) the subject, along with multimedia content features, Jin et al. [21] forecast the popularity of photos in Flickr. Figueiredo et al. [12] analyze the growth patterns of photo popularity by characterizing a series of meta information in Flickr. Cha et al. [8] conduct a trace-driven analysis on the popularity distributions of videos on YouTube. Vallet et al.
[42] analyze the correlation between textual features of tweets and view counts for YouTube videos, and make a cross-system popularity prediction. Xu et al. [47] alternatively consider the contextual and situational information, which could be the social network structure among users who share the same video, to make the popularity prediction. In addition, Vasconcelos et al. [43] extract geographical and user rating features to predict the popularity of micro-reviews (e.g. Foursquare tips). Though existing studies have successfully analyzed the popularity of online multimedia, we have not seen research that considers both information diffusion and social connections when predicting the popularity of social multimedia in microblogs. This may be due to the difficulty of obtaining diffusion records and the social network in media-sharing services.

2.3. Approaches to predict popularity

We categorize the approaches of popularity prediction into three types: (1) regression-based, (2) classification-based, and (3) model-based. In the following, we elaborate on the relevant studies and discuss the scenarios and limitations of each type.

2.3.1. Regression-based approach

The regression-based methods depend heavily on the historical popularity of an item to predict its future popularity. If the evolution of the popularity values over time is available, the regression-based approach is very effective. Szabo and Huberman [36] propose a univariate linear regression model to find the correlation between the historical popularity, in logarithmic form, and the current popularity. Pinto et al. [33] further extend the univariate model to a multivariate version that significantly boosts the prediction performance. Lerman and Hogg [27] also extend the univariate regression model by considering how people vote and rate news articles. Li et al.
[26] examine the performance of autoregressive integrated moving average, multiple linear regression, and k-nearest neighbor regression for popularity prediction. In addition, Radinsky et al. [34] exploit the state space model from the system control field to capture the temporal patterns of popularity (including smoothness, local trend, and periodicity), and predict popularity through a series of first-order differential equations. Zadeh and Sharda [51] model the evolution of the brand post popularity values using multi-dimensional Hawkes point process models. Ding et al. [9] further develop a Dual Sentimental Hawkes Process (DSHP) to deal with non-linear correlation between view counts of videos. Given a tweet post's resharing history so far, Zhao et al. [52] and Bao et al. [4] take advantage of a self-exciting point process, which models the probability of future resharing among users, to predict its final popularity. Xu et al. [46] exploit the mutually exciting point processes with a hierarchical Bayesian method to model various types of advertisement clicks and purchases over time. Considering that retweeting dynamics may vary for different topics, Gao et al. [14] propose a reinforced Poisson model with Power-law relation, exponential reinforcement, and time mapping (PETM) to achieve higher accuracy of popularity prediction. Moreover, Khosla et al. [22] apply support vector regression (SVR) with low-level visual features of images to predict their popularity. Ahmed et al. [1] cluster the evolution of popularity into several stages based on a set of pre-defined time windows and features, and model the growth patterns of popularity as transitions between stages. They classify the popularity of an item by finding a path of stages that maximizes the likelihood of its popularity history. These regression-based methods take the historical popularity values of a content, which can only be accessed by the original media service providers (e.g.
YouTube and Flickr), as the input. However, in practice, third parties (e.g. Twitter and Facebook, where the microblog messages are posted) cannot directly obtain the historical value of social multimedia. 2.3.2. Classification-based approach This approach casts the popularity prediction problem as a classification problem by dividing popularity into several discrete degrees, and regarding these popularity degrees as class labels. Hong et al. [18] consider popularity prediction


Table 1 Summary and comparison between this paper and the previous relevant studies.

Approach: Related work
Regression-based: Univariate [27,36]; Multivariate [26,33]; kNN [26]; State/patterns [1,34]; Point process [46,51]; SVR [22]
Model-based: Ranking [16,50]; Learning to rank [39]; SoVP [26]; Survival analysis [25]; HMM [20]
Classification-based: Logistic regression [18]; Trend discovery [11]; Linguistic classifier [44]; Attention predictor [24]; K-spectral centroid [49]; This work

Kinds of information compared (table columns): Historical popularity; Content info.; Viewer info.; Popularity patterns; Social network; Information diffusion; Comments/discussion



as a binary classification problem that predicts whether or not a message will be retweeted, and a multi-class classification problem that predicts the level of popularity a message will obtain. Wang et al. [44] explore the purely textual and linguistic features of tweet contents, and apply decision tree and support vector machine (SVM) models to predict the number of retweets. No social and diffusion effects of a tweet are considered. Lakkaraju and Ajmera [24] also use SVM to classify the level of a post's attention (i.e., very less, less, mediocre, high, and very high), which is defined by the number of comments for a certain post in a brand's page. Their settings focus on the characteristics of a brand and do not incorporate the spread of a post within a social network. On the other hand, Figueiredo [11] aims at predicting the trends/patterns of a post, which are defined by the historical view counts. The trends/patterns are discovered by time-series clustering algorithms using extremely randomized ensemble trees. Their goal clearly differs from ours. Similarly, Yang and Leskovec [49] develop the K-Spectral Centroid (K-SC) clustering algorithm with logistic regression models that classify the given time series of popularity into diverse shapes of temporal patterns. However, the method proposed by Yang and Leskovec [49] categorizes the time series of popularity values; it does not predict the future popularity values in a microblogging platform. In short, because the goals of these classification-based approaches are different from ours (we aim at predicting future popularity under the setting of information diffusion), we are unable to consider them as the compared methods. Nevertheless, there are still some studies that use conventional classification methods, SVM [2,5,30,44], decision tree [2,5,44], and support vector regression [29,40], for predicting the future popularity of an item.
In our experiments, we will consider these methods as the competitors (the details are described in Section 5.3), and compare our proposed method with theirs.

2.3.3. Model-based approaches

This approach designs and exploits sophisticated models, such as survival analysis, content propagation, hidden Markov model, and Bayesian learning, to obtain deeper knowledge on tackling the popularity prediction problem. Lee et al. [25] take advantage of the technique of survival analysis to estimate the probability that a given Web content will get more than a certain number of hits. Li et al. [26] devise a propagation-based prediction framework, which models how friends affect the willingness to view and share video items in the social network, to predict the number of future views of a video. Jiang et al. [20] aim to predict the date on which a video reaches its peak view count based on its daily view pattern. A modified hidden Markov model (HMM) is employed to realize the prediction. Yin et al. [50] transform the popularity prediction problem into a ranking problem, and consider the voting behaviors of users to develop a Bayesian model for the ranking-based prediction. He et al. [16] further propose a bipartite user-item ranking (BUIR) model to predict the relative ranks of items based on the features extracted from user comments of the items. Also following a similar ranking setting, Tatar et al. [39] predict the popularity of news articles based on user comments. In short, the model-based approach for the task of popularity prediction is usually designed to capture various human behaviors in the presence of media contents, and thus needs specific kinds of data. For example, He et al. [16] and Tatar et al. [39] need user comments for the items, Li et al. [26] need the information of video views, and Lee et al. [25] need textual discussion threads.
However, in real-world usage, accessing these specific kinds of data could be either infeasible or incomplete for various media contents. Therefore, we resort to casting the popularity prediction problem into a classification problem, in which the popularity can be derived from the spread of media contents in a social network. In other words, our work leverages the diffusion features of social multimedia to build classifiers to predict its popularity with no external data resources required, while model-based approaches need external information for modeling specific kinds of human behaviors on the evolution of popularity.

We summarize the relevant studies described above in Table 1. In this table, we compare three approaches, regression-based, model-based, and classification-based, and present various kinds of information utilized to predict the popularity. To


Fig. 2. Four typical types of concept drift.

understand the position of this work relative to previous studies, we also highlight this work, as shown in the last row. In summary, (a) what we adopt is the classification-based approach, (b) our model is content-independent, and (c) we are the first to exploit information diffusion features to predict item popularity.

2.4. Concept drift-based classification

The idea of concept drift [19] was formally introduced by Tsymbal [37] to capture the fact that concepts might change over time in different real-world applications. For example, in daily weather prediction, we cannot use the same rules to predict temperatures in summer as we do for winter. Also, for predicting the profits of product sales, certain models lead to very low accuracy since the purchasing behavior of users can be affected by holidays and seasons. Even though we know the rules can vary as time passes, we do not know the exact time at which a certain concept changes. Žliobaitė [54] proposes an incremental learning framework based on concept drift. Assume a sequence of instances $(X_1, \dots, X_t)$ is observed, with instance $X_i$ labeled as $y_i$. Each instance $X_i$ is generated by a source $S_i$. As the new instance $X_{t+1}$ arrives, we aim to train a classifier $L_t$ to predict its label $y_{t+1}$ using the training data, i.e., the entire or partial set of $(X_1, \dots, X_t)$. Moreover, as $y_{t+1}$ is predicted and $X_{t+1}$ becomes available as training data, we proceed to build a new classifier $L_{t+1}$ to predict the label of the next incoming instance $X_{t+2}$. Under such circumstances, if all instances are sampled from the same source, i.e., $S_1 = S_2 = \dots = S_{t+1} = S$, the concept is said to be stable. If $S_i \neq S_j$ for any two time points $i$ and $j$, there is a concept drift. Consequently, the problem lies in how to devise a learning mechanism to accurately predict the label of the unseen instance $X_{t+1}$ considering that the concepts in the training data $(X_1, \dots, X_t)$ might change over time.
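The incremental framework described above can be illustrated with a small test-then-train loop. The following is only a sketch under stated assumptions: the nearest-centroid learner, the function names, the `window` parameter, and the synthetic stream with one sudden drift are all hypothetical choices made for illustration and are not taken from the paper or from [54]. It contrasts training $L_t$ on the entire history with training on a partial (most recent) set.

```python
from collections import defaultdict

def train_centroids(instances):
    """Fit a toy nearest-centroid classifier; returns {label: centroid}."""
    sums, counts = {}, defaultdict(int)
    for x, y in instances:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )

def prequential_accuracy(stream, window=None):
    """Test-then-train: L_t is trained on X_1..X_t (or only the last
    `window` instances) and evaluated on X_{t+1} before it is added."""
    history, correct, total = [], 0, 0
    for x, y in stream:
        if history:
            train = history if window is None else history[-window:]
            model = train_centroids(train)
            correct += predict(model, x) == y
            total += 1
        history.append((x, y))  # y_{t+1} is now available for training
    return correct / total

# Hypothetical stream: source S1 labels x > 0.5 as 1; after a sudden
# drift, source S2 uses the opposite rule.
xs = [0.1, 0.9, 0.2, 0.8]
stream = [((xs[i % 4],), int(xs[i % 4] > 0.5)) for i in range(20)]
stream += [((xs[i % 4],), int(xs[i % 4] <= 0.5)) for i in range(40)]

acc_full = prequential_accuracy(stream)              # entire history
acc_window = prequential_accuracy(stream, window=8)  # partial history
print(acc_full, acc_window)
```

On this toy stream the windowed learner recovers from the drift sooner than the learner trained on the whole history, which is the motivation for treating instances from different periods differently.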
To effectively consider concept drift in the training model, Bifet et al. [6] and Gama et al. [19] summarize four typical types of drift. The four drift types are shown in Fig. 2, in which a cylinder represents an instance and colors indicate sources. Sudden drift means that a source $S_t$ is suddenly changed to $S_{t+1} \neq S_t$. Gradual drift indicates that there exists a time period during which both sources, say $S_1$ and $S_2$, coexist. Incremental drift is similar to gradual drift but it takes longer to evolve and involves more than two sources, in which the differences between the sources are very small. Reoccurring drift refers to a scenario where a previously active concept reappears after some time. Bifet et al. [6] also point out that a specific prediction mechanism is needed for each drift type. For example, a single classifier is suitable to deal with sudden drift. As shown in Fig. 2, if we know the timing of sudden drift, we can train a new classifier Classifier2 to predict the class labels of the next unseen instances, instead of using the old Classifier1 trained for instances before sudden drift occurs. On the other hand, incremental drift requires training and ensembling multiple classifiers. For example, in Fig. 2, we can train a classifier for all three instances during the drift. Then we can predict the labels of unseen instances by ensembling classifiers, in which newer classifiers are given higher weights and older classifiers lower weights. Note that the goal of using these four types of popularity evolution is to illustrate the idea of possible changes of popularity incurred by concept drift. Real concept drift goes beyond these four types of popularity evolution; therefore, we use a supervised learning approach to derive a model that best describes the concept drift that affects the change of popularity of the targeted multimedia content.
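The recency-weighted ensembling mentioned above for incremental drift can be sketched as a weighted vote. This is a minimal illustration, not the weighting scheme of this paper: the exponential `decay` factor, the function names, and the stand-in classifiers (plain callables returning a fixed label) are assumptions introduced only for the example.

```python
from collections import defaultdict

def recency_weights(n_classifiers, decay=0.5):
    """Weights for classifiers ordered oldest -> newest.
    The newest classifier gets weight 1; each older one is scaled by
    `decay` (an assumed scheme, chosen only for illustration)."""
    return [decay ** (n_classifiers - 1 - i) for i in range(n_classifiers)]

def weighted_vote(classifiers, weights, x):
    """Return the label with the largest total weight among the votes."""
    scores = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)

# Hypothetical classifiers trained on three successive periods:
# the two older ones say "unpopular", the newest says "popular".
classifiers = [
    lambda x: "unpopular",
    lambda x: "unpopular",
    lambda x: "popular",
]

recent_first = weighted_vote(classifiers, recency_weights(3, decay=0.5), None)
uniform = weighted_vote(classifiers, recency_weights(3, decay=1.0), None)
print(recent_first, uniform)  # popular unpopular
```

With decay 0.5 the weights are 0.25, 0.5, and 1.0, so the newest classifier outvotes the two older ones (1.0 against 0.75); with uniform weights the majority of older classifiers wins.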
Specifically, in our model, each classifier is trained using instances that are grouped based on whether or not they belong to the same time interval, instead of instances that are grouped by types of popularity evolution. To build the final ensemble classifier, we weigh the classifiers based on the prediction accuracy of test instances that are most similar to the unseen instance.

3. Problem definition

Before presenting the popularity prediction model for microblogging services, we first describe the relationship between social multimedia and microblog messages. Then, we formally define the social multimedia popularity prediction problem. We denote the universal set of social multimedia as $M$. Two categories of social multimedia, video and image, denoted by $M_V$ and $M_I$, are considered in this study. Both $M_V$ and $M_I$ are extracted from the messages posted in a microblog service. We use $M(p)$ to represent the set of social multimedia contained in a microblog message $p$ posted by a user $u$. A set of messages $P$ is posted by a set of users $U$. Eventually, for a social multimedia $m$, we can find the corresponding set $P(m)$


Fig. 3. Illustration of social multimedia, posted messages, and users in a microblog service.

of messages containing m, P(m) = {p | m ∈ M(p)}. In addition, we can also find the set U(m) of users with respect to a social multimedia m, U(m) = {u | u posts p, p ∈ P(m)}. A social multimedia content can be shared from one user to another. As time proceeds, for a social multimedia m that is shared among users, more messages containing m will be generated, i.e., added into P(m). To facilitate the prediction of social multimedia popularity, we partition the sets P(m) and U(m) into several disjoint subsets based on time interval. Specifically, we assume that the time period of the entire collected messages ranges from t = 1 to t = T. For each social multimedia m, we segment the entire time period of its existence into T time intervals, denoted by TR, in which the tth time interval is represented by TR_t. A social multimedia m which appears in TR_t is denoted by m_t. Each social multimedia m_t appearing in time interval TR_t has a corresponding message set P(m_t) and user set U(m_t). Fig. 3 illustrates the aforementioned notations. We now define the popularity of social multimedia in a microblog. The popularity is defined for a social multimedia m with respect to a time interval TR_t. Given TR_t, we define the popularity of a social multimedia m_t as the number of users who post messages that contain m_t, i.e., |U(m_t)|. In other words, if social multimedia m_t is shared and embedded in many messages posted by different users in a certain time interval TR_t, it will be regarded as more popular than its instances in other time intervals TR_k, 1 ≤ k ≤ T and k ≠ t. For example, in Fig. 3, the popularity scores of social multimedia m for time intervals TR_1 and TR_2 are |U(m_1)| = 2 and |U(m_2)| = 3, respectively. Equipped with these definitions, we can formally describe the proposed social multimedia popularity prediction.
We aim to predict the popularity of a given social multimedia by considering the following two classification problems. The first is re-share classification and the second is popularity score classification. Given a social multimedia mt , representing m in time interval T Rt , re-share classification is a binary classification which aims to predict whether or not m will be re-shared in time interval T Rt+1 . Popularity score classification is a multi-class classification that aims at predicting the score of social multimedia popularity |U (mt+1 )| in time interval T Rt+1 .
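The popularity definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(user, timestamp, media_ids)` message layout, the function names, and the toy messages are hypothetical and not part of the original datasets. The toy data is chosen so that the counts match the Fig. 3 example, |U(m_1)| = 2 and |U(m_2)| = 3.

```python
from collections import defaultdict


def users_per_interval(messages, interval_length):
    """Collect U(m_t): the users who post each media item in each interval.

    messages: iterable of (user, timestamp, media_ids) tuples; this layout
    is an assumption made for illustration, not the datasets' format.
    Returns {media_id: {t: set_of_users}} with intervals numbered from 1.
    """
    table = defaultdict(lambda: defaultdict(set))
    for user, timestamp, media_ids in messages:
        t = int(timestamp // interval_length) + 1  # index t of TR_t
        for media_id in media_ids:
            table[media_id][t].add(user)
    return table


def popularity(table, media_id, t):
    """Popularity |U(m_t)|: distinct users posting media_id in TR_t."""
    if media_id not in table or t not in table[media_id]:
        return 0
    return len(table[media_id][t])


# Hypothetical messages; timestamps in hours, one-day intervals.
messages = [
    ("u1", 1, ["m"]), ("u2", 5, ["m"]), ("u2", 6, ["m"]),
    ("u3", 25, ["m"]), ("u4", 30, ["m"]), ("u5", 47, ["m"]),
]
table = users_per_interval(messages, interval_length=24)
print(popularity(table, "m", 1), popularity(table, "m", 2))
```

Because users are stored in a set, a user who posts the same item twice within one interval (u2 above) is counted once, which matches the definition of popularity as a number of users rather than a number of messages.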

4. Proposed method

We aim to predict the popularity of social multimedia by tackling the re-share classification problem and the popularity score classification problem. Since we are facing classification problems, we need to clearly answer the following three questions: (a) What are the class labels? These are described together with the datasets in Section 5. (b) What are the features we intend to extract for prediction? We give the extracted features in Section 4.1. (c) What is the prediction/classification model? We elaborate the details in Section 4.3.

4.1. Feature extraction

The features used in the prediction task consist of two main categories, the information diffusion features and the explicit multimedia meta information features. Fig. 4 shows an overview of both categories. For an input social multimedia m in time interval TR_t, we can extract its explicit meta information as features from the media-sharing services (e.g. a video's explicit meta information from YouTube). The set of messages P(m) containing m in TR_t, the set of users U(m) who post P(m), and the underlying social network together constitute the category of information diffusion features.


Fig. 4. Overview of extracted features.

4.1.1. Information diffusion features

The purpose of including information diffusion features is to capture the sharing behaviors of the input social multimedia between messages, between users, and between users and messages in the microblog social network. Thus we further divide this feature category into three subsets: user features, message features, and user-message interaction features.

4.1.1.1. User features: social authority. User features aim to capture how a social multimedia is propagated between users within a social network. We consider that if a social multimedia m_t is posted or re-shared by an authoritative user (there might exist some important users in the user set U(m_t)), this social multimedia will have a higher potential of being re-shared and becoming popular in the near future. To derive the authority score of a user, we compute the PageRank score for all users in the entire social network. Each user u ∈ U(m_t) has a PageRank score, denoted as PR(u). We calculate four social-authority feature values for social multimedia m_t, as follows:
• Maximum PageRank in U(m_t): MAX_{u∈U(m_t)} PR(u)
• Minimum PageRank in U(m_t): MIN_{u∈U(m_t)} PR(u)
• Average PageRank in U(m_t): (Σ_{u∈U(m_t)} PR(u))/|U(m_t)|
• Top-k average PageRank in U(m_t): (Σ_{u∈U^k(m_t)} PR(u))/|U^k(m_t)|, where U^k(m_t) represents the users in U(m_t) with the top-k PageRank scores. We set k = 10 in this study.

4.1.1.2. User features: propagation capability. In addition to social authority, we consider that a user who has a strong capability of spreading information to others is more influential in the process of social multimedia propagation. If a social multimedia is diffused to more of these influential users, it tends to become more popular. To model this characteristic of information propagation as features, we first need to define the propagation between users.
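The four social-authority features of Section 4.1.1.1 can be sketched as follows. This is a minimal illustration, assuming the PageRank scores have already been computed over the whole social network and are available as a dictionary; the function name, the dictionary, and the toy scores are hypothetical.

```python
def social_authority_features(pagerank, users, k=10):
    """Max, min, average and top-k average PageRank over the user set U(m_t).

    pagerank: {user: PR(user)}, assumed precomputed on the full network.
    users: non-empty collection of users in U(m_t).
    """
    scores = sorted((pagerank[u] for u in users), reverse=True)
    top_k = scores[:k]  # fewer than k users simply means averaging all of them
    return {
        "max_pr": scores[0],
        "min_pr": scores[-1],
        "avg_pr": sum(scores) / len(scores),
        "topk_avg_pr": sum(top_k) / len(top_k),
    }


# Hypothetical scores for a four-user network; U(m_t) = {a, c, d}.
pagerank = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
features = social_authority_features(pagerank, {"a", "c", "d"}, k=2)
```

When |U(m_t)| is smaller than k, the top-k average coincides with the plain average; the text does not say how this case is handled, so this fallback is an assumption of the sketch.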
Given a social multimedia m_t shared in time interval TR_t and the corresponding user set U(m_t), we say that m_t is propagated from user u ∈ U(m_t) to user v ∈ U(m_t), u ≠ v, denoted as d(u, m_t, v), if u and v are friends in the social network and v re-shares or comments on the message containing m_t, which had been shared by u in time interval TR_t. Based on this definition of propagation, we can further define the propagation graph to represent the diffusion of social multimedia m_t over the user set U(m_t). The propagation graph is defined as a directed graph H(m_t) = (U(m_t), D(m_t)), where D(m_t) is the set of directed propagation links d(u, m_t, v), u, v ∈ U(m_t) and u ≠ v. In addition, if a user comments on multiple messages containing m_t, only the message with the earliest time stamp in TR_t is used to construct the propagation graph. Therefore, the propagation graph is an acyclic graph. We use a simple example in Figs. 5 and 6 to illustrate the construction of a propagation graph. Assume that U(m_t) = {A, B, C, D, E, F}. In Fig. 5, user A shares a message containing m_t, which is commented on by user B and user C. User B re-shares this message, which is then commented on by users C, D, E, and F. After all these actions, m_t is propagated from user A to B, from A to C, from B to D, from B to E, and from B to F. Consequently, the propagation graph H(m_t) is constructed, as shown in Fig. 6. Note that we do not construct a propagation link from B to C, since C has already commented on, and been influenced by, the message containing m_t from A. To represent the propagation capability of users in U(m_t), we propose three measures, propagation volume, propagation strength, and propagation rate, each of which captures a different aspect of propagation. To facilitate the definition and calculation of the three measures, we extract the egocentric propagation graph for each user in U(m_t).
Specifically, the egocentric propagation graph for u ∈ U (mt ), denoted by H (mt , u ) = (U (mt , u ), D(mt , u )), is defined as the subgraph of H (mt ), where U (mt , u ) and D(mt , u ) are the set of users and edges that messages containing mt flow through


Fig. 5. A simple example of information propagation.

Fig. 6. The propagation graph constructed from Fig. 5.

starting from user u. For example, in Fig. 6, the entire propagation graph H(m_t) is the egocentric propagation graph of user A, i.e., H(m_t, A). H(m_t, B) is the subgraph induced by nodes {B, D, E, F}. H(m_t, C) contains only node C since user C does not re-share the message containing m_t from user A. The following three measures are defined on the egocentric propagation graph:

• Propagation volume estimates the number of users who are influenced by user u, directly or indirectly, with social multimedia m_t. We define the propagation volume of user u, denoted by PVol(m_t, u), as the size of the node set of u's egocentric propagation graph H(m_t, u), i.e., PVol(m_t, u) = |U(m_t, u)|. For example, in Fig. 6, PVol(m_t, A) = 6, PVol(m_t, B) = 4, and PVol(m_t, D) = 1.

• Propagation strength measures how far social multimedia m_t can reach starting from user u. We define the propagation strength of user u, denoted by PStr(m_t, u), as the maximum length of a propagation path starting from u to other nodes in the egocentric propagation graph H(m_t, u). The propagation strength can be computed by the formula

\[ \mathrm{PStr}(m_t, u) = \max_{v \in U(m_t, u)} \mathrm{dist}(u, v), \]

where dist(u, v) is the length of the shortest path between nodes u and v in H(m_t, u). For example, in Fig. 6, PStr(m_t, A) = 2, PStr(m_t, B) = 1, and PStr(m_t, C) = 0.

• Propagation rate captures the growth speed of the number of users who are influenced by user u, directly or indirectly, with social multimedia m_t. We define the propagation rate of user u, denoted by PRate(m_t, u), as the volume of influenced users per level traversed from u in its egocentric propagation graph H(m_t, u). Specifically, we can represent the propagation rate using the formula



\[ \mathrm{PRate}(m_t, u) = \frac{\sum_{l=1}^{\mathrm{PStr}(m_t, u)} \mathrm{PVol}_l(m_t, u)}{\mathrm{PStr}(m_t, u)}, \]

where PVol_l(m_t, u) is the number of users influenced by user u at the lth level traversed from u in H(m_t, u). Note that if PStr(m_t, u) = 0, we set PRate(m_t, u) = 0. For example, in Fig. 6, PRate(m_t, A) = (2 + 3)/2 = 2.5, PRate(m_t, B) = 3/1 = 3, and PRate(m_t, D) = 0.
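The three propagation measures can be checked against the Fig. 6 example with a short breadth-first traversal. The sketch below is illustrative only: the adjacency-list representation and the function names are assumptions, and the "level" of a user is taken to be its shortest-path distance from u, which is how dist(u, v) is defined for the propagation strength.

```python
from collections import deque


def levels_from(graph, u):
    """Breadth-first search from u; returns {node: shortest-path length from u}."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def propagation_measures(graph, u):
    """Return (PVol, PStr, PRate) of user u on its egocentric propagation graph."""
    dist = levels_from(graph, u)
    p_vol = len(dist)            # size of the node set, u included
    p_str = max(dist.values())   # longest shortest path starting at u
    # Sum of PVol_l over levels l = 1..PStr: every reached user except u.
    influenced = sum(1 for d in dist.values() if d >= 1)
    p_rate = influenced / p_str if p_str > 0 else 0.0
    return p_vol, p_str, p_rate


# Propagation graph of Fig. 6: A -> B, A -> C, B -> D, B -> E, B -> F.
graph = {"A": ["B", "C"], "B": ["D", "E", "F"]}
for user in "ABCD":
    print(user, propagation_measures(graph, user))
```

The maximum, minimum, and average of each measure over U(m_t) then give the nine propagation-capability feature values.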


Given social multimedia m_t and the corresponding user set U(m_t) in TR_t, we derive the propagation-capability feature values by computing the maximum, minimum, and average values of the aforementioned three measures. Specifically, nine feature values are obtained: MAX_{u∈U(m_t)} f(u), MIN_{u∈U(m_t)} f(u), and (Σ_{u∈U(m_t)} f(u))/|U(m_t)|, where f ∈ {PVol, PStr, PRate}.

4.1.1.3. Message features: text contents. Since each social multimedia is usually accompanied by message text written by users or automatically generated by the microblog platform, we design message features to capture the description content of the shared social multimedia. We consider the text contents and the temporal information of messages as features. We believe messages can reveal users' sentiments, the cause of such sentiments, and general discussion of the social multimedia. We first perform word segmentation and exclude stop words for messages in P(m_t) to obtain a set of terms. Then we compute two feature values. The first is the average number of terms of a message in P(m_t). The second is the content cohesiveness, modified from [31]. We represent each message p ∈ P(m_t) as a TF-IDF term vector v(p), where the IDF value is derived with respect to all messages in the microblog dataset. Then we derive the centroid vector v_c(m_t) over all terms in P(m_t) by averaging the TF-IDF scores. Finally, the content cohesiveness of m_t, denoted by CCoh(m_t), is defined by



\[ \mathrm{CCoh}(m_t) = \frac{\sum_{p \in P(m_t)} \mathrm{CosSim}\big(v_c(m_t), v(p)\big)}{|P(m_t)|}, \]

where CosSim() is the cosine similarity.

4.1.1.4. Message features: temporal information. As users post messages containing social multimedia, the microblog platform records the corresponding time stamps. We think such temporal information reveals whether or not a social multimedia will become popular. In other words, the shorter the time difference between two messages containing m_t, the more popular m_t is believed to be. By sorting the messages in P(m_t) in ascending order of time stamp, we compute the following values as features: (1) the time difference between the last and the second-to-last messages, (2) the time difference between the last and the first messages, (3) the average time difference between all pairs of messages, (4) the time difference between the first message and the starting time of TR_t, and (5) the time difference between the last message and the end time of TR_t.

4.1.1.5. User-message interaction features. One of the most frequent behaviors in microblogs is that users interact with messages through social actions, including comment, re-share, mention, and endorse. We aim to characterize the statistics of these actions as features for prediction. For a social multimedia m_t with the corresponding message set P(m_t), we derive the following user-message interaction features, denoted by UMI. UMI is derived by computing the proportion of each of the four social actions over all messages in P(m_t), as given by the following formula,

\[ \mathrm{UMI}_{action}(m_t) = \frac{\#action(P(m_t))}{|P(m_t)|}, \]

where action ∈ {comment, reshare, mention, endorsement}, and #action(P(m_t)) represents the number of messages with action in message set P(m_t).

4.1.2. Explicit multimedia meta information features

Social multimedia that is shared and spread in microblogs usually comes from external media-sharing services, such as videos from YouTube and images from Flickr. When videos and images are uploaded to such services, some information is attached manually or automatically. Such information is regarded as the explicit multimedia meta features. We extract the following explicit meta information features: upload time, media type (e.g. entertainment, education, music video, etc.), user rating, number of responses, and number of users saving it as a favorite.

4.2. Instance composition

We describe the composition of a social multimedia instance in time interval TR_t. An instance consists of three parts: the current information diffusion features (in TR_t), the past information diffusion features (time intervals from TR_{t−n} to TR_{t−1}), and the explicit multimedia meta information features, as illustrated in Fig. 7. It is worth noting that since information diffusion is dynamic in essence, how the social multimedia behaved in the past could reflect its popularity tendency at the current time. Thus, we consider the n time intervals preceding the current time interval t to capture the past diffusion behaviors of a social multimedia.

4.3. Popularity prediction

Different instances of social multimedia have different popularity patterns. We leverage the four typical types of concept drift, introduced in Section 2.4, to represent diverse kinds of popularity evolution in social multimedia. First, a social multimedia might gain sudden popularity, such as the videos in Fig. 1(b) and (e). The sudden drift type can model this case. Second, the popularity of a social multimedia may increase gradually with minor fluctuation, such as the video in Fig. 1(a). We can


Fig. 7. The illustration of features for a social multimedia instance.

Fig. 8. Flowchart of the proposed concept drift-based predictor.

use gradual drift to capture such cases. Third, most social multimedia instances go through time periods when popularity increases or decreases incrementally, such as many fine-grained periods of the videos in Fig. 1. Incremental drift can deal with this case. Fourth, a social multimedia can become quite popular and then lose its popularity periodically, such as the video in Fig. 1(c). Reoccurring drift describes this case. Since there are diverse kinds of concept drift in the evolution of popularity, using a single classifier for popularity prediction is not a good strategy. Even within a single time interval, some instances may be affected by the hidden context of concept drift while others may not. That is, concept drift may happen in some instances only. For example, the popularity classifier for a video of a Christmas song should differ between Christmas holidays and workdays. Even during the Christmas holidays, different classifiers should be applied to predict the popularity values of videos relevant and irrelevant to Christmas. Therefore, instances in a time interval should be modeled by diverse classifiers. We exploit the idea of local concept drift [41], which deals with the scenario in which only some instances are affected by each concept drift, to develop a novel prediction mechanism that tackles the aforementioned issues. We leverage the ensemble approach, which has been used for mining concept-drifting data streams [23,35,45]. The key to our method is to dynamically determine the weights of the trained classifiers at the prediction stage to build an ensemble predictor, instead of fixing the weights at the training and testing stages. The benefit of dynamically allocating the ensemble weights is that, by considering the features of the unseen instance at the ensemble stage, we can use classification rules that are more suitable for that instance to predict its popularity.
In other words, for each unseen instance, we will build a customized popularity predictor by finding the set of ensemble weights which best describe the upcoming instance. We use Fig. 8 to elaborate the technical details of the proposed concept drift-based popularity predictor. We divide all the social multimedia instances into three subsets: training, testing, and unseen. At the training stage, we split the training instances into |T R| groups according to the pre-defined length of time interval, |T R| = 5 in this example. We train a classifier for instances in each group. At the prediction stage, as an unseen instance mt arrives, we find top-K similar test instances with respect to mt , K = 3 in this example, based on the extracted feature vectors. These K selected test instances are employed to determine the weights of classifiers to build the ensemble predictor. In this example, classifier C3 will get


the highest weight because it achieves the highest accuracy when predicting the labels of the three selected test instances. The second highest weight can be allocated to classifier C1 or classifier C4. Consequently, the final popularity predictor is composed by ensembling these k_grp classifiers with their corresponding weights, and it is used to predict the class label of the unseen instance m_t. We believe the most accurate predictor for m_t can be built with the weights determined during the prediction stage. What follows describes how the weights of the classifiers are derived. First, as an unseen instance m_t arrives, we select the top-K similar test instances with respect to m_t. Since the proposed feature values can be either numerical or categorical, we adopt the Heterogeneous Euclidean/Overlap Metric (HEOM) [38] as the distance measure dist(m_x, m_y) between instances m_x and m_y, as given in the following equations. In HEOM, the Euclidean distance is used for numerical values while the overlap distance is used for categorical values.



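\[ \mathrm{dist}(m_x, m_y) = \sqrt{\sum_{i=1}^{|v^f|} \mathrm{heom}\big(v_i^f(m_x), v_i^f(m_y)\big)^2}, \]

\[ \mathrm{heom}\big(v_i^f(m_x), v_i^f(m_y)\big) =
\begin{cases}
0, & \text{if categorical and } v_i^f(m_x) = v_i^f(m_y), \\
1, & \text{if categorical and } v_i^f(m_x) \neq v_i^f(m_y), \\
\dfrac{\big| v_i^f(m_x) - v_i^f(m_y) \big|}{\mathrm{range}(v_i^f)}, & \text{if numerical,}
\end{cases} \]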

where v_i^f(m_x) is the ith element of m_x's feature vector, |v^f| is the length of the feature vector, and range(v_i^f) refers to the maximum difference between all pairs of feature values v_i^f(·). We then define the weights of classifiers for the ensemble. If a classifier C_i can accurately predict more class labels for the K selected test instances, C_i will be allocated a higher weight. Therefore, given an unseen social multimedia instance m_t, we define the weight of the classifier C_i by the following formula,

\[ \mathrm{weight}_{C_i}(m_t) = \frac{\sum_{j=1}^{K} \mathrm{HIT}_{C_i}(m_j) \times \sigma(m_t, m_j)}{\sum_{j=1}^{K} \sigma(m_t, m_j)}, \]

where m_j (1 ≤ j ≤ K) stands for the jth of the K test instances most similar to m_t. The function HIT_{C_i}(m_j) indicates whether or not the classifier C_i correctly predicts the class label of m_j: HIT_{C_i}(m_j) = 1 if ŷ_{m_j} = y_{m_j}, while HIT_{C_i}(m_j) = −1 if ŷ_{m_j} ≠ y_{m_j}, where y_{m_j} is the real class label of m_j and ŷ_{m_j} is the predicted label. In addition, σ(m_t, m_j) = 1/dist(m_t, m_j). Note that we use Random Forest [7] as the classifier.
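The weighting scheme above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: classifiers are modeled as plain callables instead of trained Random Forests, the function names and toy data are hypothetical, the square root in the distance follows the usual form of HEOM, and a small constant `eps` is added to the distance (an assumption not in the text) so that σ stays finite when an unseen instance coincides with a test instance.

```python
import math


def heom_distance(x, y, is_categorical, ranges):
    """HEOM distance between two feature vectors of equal length.

    is_categorical[i] flags categorical features; ranges[i] is the value
    range of numerical feature i (ignored for categorical features).
    """
    total = 0.0
    for a, b, categorical, value_range in zip(x, y, is_categorical, ranges):
        if categorical:
            d = 0.0 if a == b else 1.0
        elif value_range:
            d = abs(a - b) / value_range
        else:
            d = 0.0  # assumption: a constant feature contributes nothing
        total += d * d
    return math.sqrt(total)


def classifier_weights(unseen, test_set, classifiers, is_categorical, ranges, k=3):
    """Weight each classifier by its hits on the k test instances nearest to `unseen`.

    test_set: list of (feature_vector, true_label) pairs.
    classifiers: list of callables mapping a feature vector to a label.
    """
    nearest = sorted(
        ((heom_distance(unseen, feats, is_categorical, ranges), feats, label)
         for feats, label in test_set),
        key=lambda item: item[0],
    )[:k]
    eps = 1e-9  # assumption: avoids division by zero when dist = 0
    sigmas = [1.0 / (d + eps) for d, _, _ in nearest]
    weights = []
    for clf in classifiers:
        # HIT is +1 for a correct prediction and -1 for a wrong one.
        hits = [1.0 if clf(feats) == label else -1.0 for _, feats, label in nearest]
        weights.append(sum(h * s for h, s in zip(hits, sigmas)) / sum(sigmas))
    return weights


# Toy data: one numerical feature (range 10) and one categorical feature.
is_categorical = [False, True]
ranges = [10.0, None]
test_set = [([1.0, "a"], "Yes"), ([2.0, "a"], "No"),
            ([9.0, "b"], "Yes"), ([3.0, "a"], "Yes")]
classifiers = [lambda f: "Yes", lambda f: "No",
               lambda f: "No" if f[0] == 2.0 else "Yes"]
weights = classifier_weights([1.5, "a"], test_set, classifiers, is_categorical, ranges)
```

In this toy setting the third callable labels all three nearest test instances correctly and therefore receives weight 1, while a classifier that is wrong on every neighbor would receive weight −1. How the weighted classifiers are finally combined (for example, by weighted voting) is not spelled out in this section, so the sketch stops at the weights.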

5. Experiments

We conduct experiments to evaluate the performance of the proposed concept drift-based predictor for re-share classification and popularity score classification. We use social multimedia data as well as the corresponding social networks from two popular microblog services, Plurk (www.plurk.com) and Twitter (www.twitter.com). The experiments are conducted on a Microsoft Windows 7 64-bit system with an Intel(R) Core(TM) 2 Duo E8400 3.00 GHz CPU and 4.00 GB of memory.

5.1. Data preparation

We use the data from two microblog social networks to conduct the experiments. The first is the Plurk microblog dataset. We use the Plurk API to collect social multimedia data as well as publicly shared messages. We collect messages containing videos from YouTube, because most of the videos embedded in messages on Plurk come from YouTube, as identified by their URLs. Images in Plurk messages, in contrast, can come from any source, because there are many popular image hosting services. In addition, we take a snapshot of the social network on June 30, 2012, with nodes representing users and edges referring to their friendships. In total we extract 1,106,622 nodes and 14,099,892 edges. To derive the information diffusion features, we also crawl the comments and the sharing records of each message. To obtain the explicit multimedia meta information features, we use YouTube IDs, which can be extracted from the shared URLs, and feed them into the YouTube API to collect the features described in Section 4.1. As for the images, we first need to tackle the problem that images from different services may be identical. To deal with this problem, we use AntiDupl.Net (antidupl.narod.ru) to detect duplicate images and count them as one shared image. Since some image-sharing sites (e.g. ptt.cc/cut/, imgur.com, minus.com) only provide image upload services and do not have explicit meta information, we do not consider the explicit multimedia meta information features for images. The second dataset we use for the experiments is the Twitter microblog data collected by Galuba et al. [13]. The original goal of this Twitter dataset was to characterize the propagation of URLs in the Twitter social network. The data is downloaded from http://lsir.epfl.ch/research/datasets/socialnetwork/. We use this data to predict the popularity of shared URLs, because shared URLs can lead users to diverse multimedia contents (e.g. texts, images, music, and videos) on external media services (e.g. Blogspot, Flickr, Last.fm, and YouTube). The dataset contains 15 million URLs exchanged among 2.7 million users over


Table 2
Statistics of the extracted social multimedia.

| Dataset | Type | No. message | No. user | No. social multimedia |
|---|---|---|---|---|
| Plurk | Video | 39,316 | 13,252 | 17,986 |
| Plurk | Image | 213,142 | 47,012 | 165,078 |
| Twitter | URL | 27,126,890 | 19,019,314 | 17,373,857 |

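Table 3
Statistics of the percentages of instances for videos (V) and images (I) in the Plurk dataset and for URLs in the Twitter dataset.

| Data | Instances | Re-share task: Yes | Re-share task: No | Re-share task: Never | Popularity score task: 0 | Popularity score task: 1 | Popularity score task: 2 | Popularity score task: Never |
|---|---|---|---|---|---|---|---|---|
| Plurk | Percentage (V) (%) | 9 | 28 | 61 | 37 | 1.4 | 0.6 | 61 |
| Plurk | Percentage (I) (%) | 4 | 12 | 84 | 15 | 0.6 | 0.4 | 84 |
| Twitter | Percentage (URL) (%) | 34 | 5 | 61 | 33 | 3 | 1 | 63 |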

a 300 h period (about 12 days). Table 2 summarizes the number of messages/users containing identified videos/images and URLs, and the numbers of unique videos/images and URLs in the Plurk and Twitter datasets. We set the length of the time interval to 1 day in order to have multiple instances of social multimedia and to perform a fine-grained popularity prediction. To predict how popularity evolves, only those social multimedia appearing in consecutive time intervals are considered. If a certain social multimedia m is posted and re-shared in d consecutive time intervals (i.e., d days), d ≥ 3, we construct instances for each time interval. Note that each message containing m is counted as an instance. To predict the evolution of m's popularity, since previous time intervals from TR_{t−n} to TR_{t−1} are also considered in the information diffusion features described in Section 4.1, we vary t from 2 to |TR| and vary n from 1 to |TR| − 1 in the experiments, where |TR| is the number of time intervals in the dataset. Since we cast popularity prediction as a classification problem, we need to determine the class labels of popularity. Each instance has two sets of class labels, used for the re-share and popularity score classifications. For the Re-share task, we consider three class labels, Yes, No, and Never: Yes means that the next time interval has at least one non-repeated user, No means that the next time interval has no users, and Never means that none of the following time intervals until TR_d has users who re-share the social multimedia. For the Popularity Score task, we consider four class labels, 0, 1, 2, and Never, referring to different levels of popularity. The settings for the class labels of the Re-share task, denoted by Label_RS(m_t), and the class labels of the Popularity Score task, denoted by Label_PS(m_t), are given in the following formulas,



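\[ \mathrm{Label}_{RS}(m_t) =
\begin{cases}
\text{Never}, & \text{if never being re-shared in the future}, \\
\text{Yes}, & \text{if } |U(m_t)| \geq 1, \\
\text{No}, & \text{if } |U(m_t)| = 0,
\end{cases} \]

\[ \mathrm{Label}_{PS}(m_t) =
\begin{cases}
\text{Never}, & \text{if never being re-shared in the future}, \\
0, & \text{if } 0.0 \leq |U(m_t)| / max_m < 0.2, \\
1, & \text{if } 0.2 \leq |U(m_t)| / max_m < 0.5, \\
2, & \text{if } 0.5 \leq |U(m_t)| / max_m \leq 1.0,
\end{cases} \]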

where max_m = MAX_{m∈M, 1≤i≤d} {|U(m_i)|}. Table 3 shows the distributions of instances over class labels. We can see that the instances suffer from a class imbalance problem, which is taken into account when designing the accuracy measures.

5.2. Evaluation setting

We employ the idea of progressive evaluation [15] as the basic setting of our experiments, since messages containing social multimedia in a microblog might be posted at any moment and shared continuously by users over time. We use Fig. 9, which illustrates a sequence of instances of a social multimedia m spanning several time intervals, to explain the procedure of the experiments. The first round of prediction aims to predict the class labels of the unseen instances in time interval TR_2, using the instances in time interval TR_1 as the training data to build the classifier C_1. In the second round, we aim to predict unseen instances in TR_3. The proposed concept drift-based predictor is built by ensembling the classifier C_2 (trained with instances in TR_2) and the previous classifier C_1, with the ensemble weights of the two classifiers dynamically determined as each unseen instance in TR_3 arrives. In the tth round, instances in TR_t are predicted by the concept-drift predictor that is ensembled from C_1, C_2, ..., C_{t−1}, with their weights determined for each newly arrived unseen instance in TR_t. We use two performance measures to compare the effectiveness of different methods. The first is accuracy, which is defined as the number of correctly predicted instances over the total number of unseen instances in the time interval TR_t. The accuracy measure is used for both prediction tasks. The second is F-Score = 2 · Precision · Recall/(Precision + Recall),


Fig. 9. An illustration to show the evaluation procedure for progressive prediction for instances of a social multimedia m.

in which precision and recall are designed to take into account the class imbalance problem of instances. Since the ultimate goal of this study is to find the popular social multimedia, we focus on whether or not the instances with Yes labels (i.e., instances with labels 0, 1, and 2 in the popularity score prediction task) are successfully detected in the Re-share prediction task. Thus precision and recall are defined as

\[ \mathrm{Precision} = \frac{\#\,\text{of predicted instances with label Yes in } TR_t}{\#\,\text{instances with label Yes in } TR_t}, \]

\[ \mathrm{Recall} = \frac{\#\,\text{of predicted instances with label Yes in } TR_t}{\#\,\text{instances in } TR_t}. \]
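The two performance measures can be computed with a couple of small helpers. The sketch below is only illustrative: the function names are hypothetical, precision and recall are taken as already-computed inputs following the definitions above, and returning 0 when both are zero is a convention assumed here, not something stated in the text.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of unseen instances in TR_t whose label is predicted correctly."""
    correct = sum(1 for y, y_hat in zip(true_labels, predicted_labels) if y == y_hat)
    return correct / len(true_labels)


def f_score(precision, recall):
    """F-Score = 2 * Precision * Recall / (Precision + Recall)."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the degenerate case
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for four unseen instances in one time interval.
y_true = ["Yes", "No", "Never", "Never"]
y_pred = ["Yes", "Never", "Never", "Never"]
acc = accuracy(y_true, y_pred)
```

Averaging these values over the instances of each time interval gives the per-interval performance used in the comparison.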

We compute the values of average accuracy and average F-Score over all instances in time interval TR_t, and use these values as the prediction performance of TR_t.

5.3. Comparative methods

5.3.1. Majority Predictor (Majority)

Considering that the dataset has a class imbalance problem, one intuitive baseline is to assign every unseen instance the label with the maximum number of instances, i.e., Never. This is because the number of instances with the label Never is the highest for videos and images in both tasks. We name this method Majority Predictor (Majority). Since the overall percentages of Never instances are more than 60%, Majority can be regarded as a strong baseline. Note that in the following, we do not evaluate Majority with F-Score because the values of Precision will be the same for all cases.

5.3.2. Basic Predictor (BP)

Another important comparative method, named Basic Predictor (BP), studies the scenario where concept drift is not considered. To this end, we train a single classifier on the instances of all time intervals TR_i (1 ≤ i ≤ t − 1) to predict the labels of instances in TR_t. As time approaches TR_{t+1}, the previous classifier is discarded and a new classifier is trained on TR_i (1 ≤ i ≤ t). Note that the Basic Predictor cannot exploit the potential correlation between each unseen instance and the trained classifier. Therefore, BP also provides an opportunity to see whether or not the ensemble method in our proposed predictor is useful.

5.3.3. Global Concept Drift Predictor (GCDP)

We further devise another competitor, Global Concept Drift Predictor (GCDP). In contrast to the idea of local concept drift, GCDP is built on an intuitive and common assumption that instances in more recent time intervals should have a higher impact on predicting the labels of unseen instances.
To evaluate such assumption, instead of using unseen instances to learn the weights of classifiers trained from T Ri (1 ≤ i ≤ t − 1 ), we allocate higher weights to classifiers trained from instances in recent time intervals and give lower weights to classifiers trained earlier. We design the weighting rule to be


Fig. 10. Accuracy on (a) video and (b) image of the Plurk dataset for re-share classification.

\[ \mathrm{weight}_{C_i}(m_t) = \frac{1}{t - i}, \]

where i is the index of the ith time interval, whose instances are used to train classifier C_i, and t is the index of the current time interval containing the unseen instances to be predicted.
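As a quick illustration of the GCDP weighting rule, the sketch below lists the weights of classifiers C_1, ..., C_{t−1} when predicting interval TR_t. The function name is hypothetical, and no normalization is applied because the text does not mention one.

```python
def gcdp_weights(t):
    """Weights 1/(t - i) for classifiers C_1..C_{t-1} when predicting TR_t."""
    return {i: 1.0 / (t - i) for i in range(1, t)}


# Predicting TR_4: the most recent classifier C_3 gets the largest weight.
weights_tr4 = gcdp_weights(4)
```

Unlike the proposed predictor, these weights depend only on the age of each classifier and are identical for every unseen instance in the same time interval.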

5.3.4. Conventional popularity predictors: support vector machine (SVM), J48 decision tree (DT), and support vector regression (SVR)

Support vector machines [2,5,44,30], J48 decision trees [2,5,44], and support vector regression [29,40] have been shown to be useful in predicting popularity. We consider these three conventional classification-based methods as benchmarks. To apply each of these methods, we train a single classifier on instances from every time interval TR_i (1 ≤ i ≤ t − 1) to predict the labels of instances in TR_t. As the time interval moves toward TR_{t+1}, the previously trained classifier is discarded and a new classifier trained on TR_i (1 ≤ i ≤ t) is built. The corresponding experiment results on the Twitter dataset are shown in Figs. 12 and 16 for re-share and popularity score classifications respectively.

5.4. Results on re-share classification

The prediction accuracy for video and image instances in the Plurk dataset is shown in Fig. 10, and that for URL instances in the Twitter dataset in Fig. 12(a). In Fig. 10, we can see that as the time interval TR_t moves forward, the proposed method generally outperforms the other three competitors (i.e., Majority, BP, and GCDP), especially on image instances. For video instances, the accuracy of our proposed method is 5% higher than BP on average. The accuracy of GCDP generally falls between BP and the proposed method. These results suggest that considering local concept drift has a substantial impact on the prediction accuracy. Majority significantly beats the others in the first time interval (t = 2) because the other methods lack training instances. The main difference between Fig. 10(a) and (b) is that Majority is very effective for image instances but still trails our method. The high accuracy achieved by Majority is due to the fact that the percentage of label Never is very high, 84% in Table 3, so it is easy for Majority to perform well. The experiments on the URL instances from the Twitter dataset demonstrate a similar trend, as shown in Fig. 12(a).
In summary, the proposed method outperforms Majority, BP, and GCDP with a 15% improvement on average, except for the first two time intervals. We believe this is because the instances from the first two time intervals either lack training instances or have too little historical information from which to extract effective features. In addition, the proposed method also outperforms the three conventional classification-based methods (i.e., SVM, DT, and SVR) on both evaluation metrics. Although SVR is very close to our method in terms of accuracy, it performs worse in terms of F-score, as elaborated in the next paragraph. In short, the proposed method consistently achieves the highest accuracy over time compared to the baseline and conventional classification-based methods.

Fig. 11. F-score on (a) video and (b) image of the Plurk dataset for re-share classification.

Fig. 12. (a) Accuracy and (b) F-score of URLs in Twitter dataset for re-share classification.

The results of detecting the re-shared instances (i.e., predicting the instances with label Yes) are shown in Figs. 11 and 12(b) for the Plurk and Twitter datasets, respectively. From the results on Plurk videos in Fig. 11(a), it can be observed that the three methods perform similarly. The average F-scores of Proposed, GCDP, and BP are 0.325, 0.324, and 0.308, respectively. As for the results on Plurk images in Fig. 11(b), the proposed method solidly beats the others. To find the factor that leads to such a large performance difference between videos and images, we conduct follow-up experiments, as explained in the next paragraph. On the other hand, for Twitter URLs, as shown in Fig. 12(b), the proposed method significantly outperforms the other competitors, including the three conventional classification-based methods (i.e., SVM, DT, and SVR). In short, these results show that the proposed method can detect instances that will be re-shared in the near future.

Recall that in Section 4 we pointed out that how a social multimedia item is re-shared and spread has a higher impact on predicting its popularity than its meta information. We therefore examine how the accuracy and the F-score on video instances change if we drop the explicit Multimedia meta Information Feature (MIF) and use only the information diffusion features for prediction.
The results are shown in Fig. 13. From Fig. 13(a), we can see that our method is still effective, outperforming BP by 10% on average. This is even better than the result obtained with MIF in Fig. 10(a) (5% better than BP). In Fig. 13(b), the result of our method is also much better than in Fig. 11(a): its F-score values are generally higher than those of BP and GCDP. We believe that MIF is the reason that BP and GCDP have similar performance. To verify this assumption, we conduct experiments with and without MIF for our method, GCDP, and BP. We aim to see whether MIF affects only GCDP and BP, and the results in Fig. 14 give the answer: without MIF, our method remains stable while the accuracy decreases for GCDP and BP.

Fig. 13. (a) Accuracy and (b) F-score without explicit Multimedia meta Info Feature (MIF) on videos of the Plurk dataset for re-share classification.

Fig. 14. Accuracy by methods of (a) Proposed, (b) GCDP, and (c) BP with and without explicit Multimedia meta Info Feature (MIF) on video instances of the Plurk dataset for re-share classification.

5.5. Results on popularity score classification

Figs. 15 and 16 show the accuracy of popularity score classification on the Plurk and Twitter datasets, respectively. Similar to the results of re-share classification, the proposed method clearly beats the other competitors on the Plurk dataset and outperforms the conventional classification-based methods (i.e., SVM, DT, and SVR) on the Twitter dataset. Our method outperforms BP by 8% on videos, 15% on images, and 12% on URLs on average. In addition, our method outperforms SVR by 12%, SVM by 15%, and DT by 20% on Twitter URLs. The much more stable accuracy curve further supports the advantage of our method. It is worth mentioning that for videos (Fig. 15(a)) and URLs (Fig. 16), the accuracy of GCDP dips abruptly at several time intervals, i.e., t = 4, 9, 16, 22, 25 for videos and t = 4, 7, 10 for URLs. We believe this is because sudden drift occurs during these time intervals. That is, some videos and URLs are suddenly widely circulated in Plurk and Twitter and become very popular in a short period. Since our method can deal with local concept drift while GCDP cannot, GCDP fails to predict the popularity labels of the instances in these time intervals. We do not include the complete experimental results as we did for re-share classification, but the F-score results and the results of removing MIF are similar to those of re-share classification.

5.6. Feature and parameter analysis

In this section, we investigate how different features and parameters affect the prediction performance of the proposed method.
We first report the accuracy of the five information diffusion features described in Section 4.1. The results are shown in Table 4, which presents the performance of using each single feature and of all features combined. We can see that for both re-share and popularity score classification, the performance of using only the propagation capability feature is better than that of the other single features in most time intervals. Such results indicate that the way social multimedia content propagates in a microblog social network plays an important role in its popularity. In addition, social authority has the least effect on performance, and we conclude that a user's authority is not necessarily the most decisive factor in making multimedia content popular. In other words, to generate a popular multimedia item, one should come up with an effective propagation strategy that maximizes the spread in the social network, rather than focusing on reaching authoritative users.

Fig. 15. Accuracy on (a) video and (b) image of the Plurk dataset for popularity score classification.

We next report how the performance changes with the value of K used to find the top-K similar test instances with respect to the unseen (targeted) instance in the proposed concept drift-based popularity predictor, as described in Section 4.3. The results are shown in Table 5. It can be observed that with K = 10, our method achieves the best accuracy and F-score in most time intervals. As the number of similar instances increases (e.g., K = 20), too much noise is included in the training, which leads to worse performance. On the other hand, using too few classifiers (e.g., K = 1) to build the ensemble predictor fails to capture the drifting concepts. Therefore, we suggest finding the top 10 similar instances for better results.

We also report the relationship between the prediction performance and the number of previous time intervals included (denoted by #PrevTimeIntv), which determines the volume of information diffusion features extracted. The results are shown in Table 6. We can observe that, in general, the proposed method with #PrevTimeIntv = 3 yields better accuracy and F-score. We think that if more previous time intervals are considered, the prediction model tends to include irrelevant diffusion records in the features; in contrast, fewer previous intervals might not capture the trend of popularity. Therefore, we suggest using about three previous time intervals to obtain a satisfactory prediction result.
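The role of K described above can be made concrete with a small sketch. The exact weighting scheme is defined in Section 4.3 and is not reproduced here, so everything below is an assumption for illustration: Euclidean distance stands in for the similarity measure, each historical instance is tagged with the time interval it came from, and the classifier of an interval is weighted by that interval's share of the K nearest instances. The names `interval_weights` and `ensemble_predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]
HistoricalInstance = Tuple[Features, int]  # (feature vector, time interval index)


def interval_weights(
    target: Features, history: List[HistoricalInstance], k: int
) -> Dict[int, float]:
    """Weight each interval by its share of the k instances nearest to the target."""
    nearest = sorted(history, key=lambda inst: math.dist(target, inst[0]))[:k]
    counts = Counter(interval for _, interval in nearest)
    return {interval: n / len(nearest) for interval, n in counts.items()}


def ensemble_predict(
    target: Features,
    classifiers: Dict[int, Classifier],
    history: List[HistoricalInstance],
    k: int,
) -> str:
    """Weighted vote of the per-interval classifiers selected by the neighbours."""
    votes: Dict[str, float] = defaultdict(float)
    for interval, weight in interval_weights(target, history, k).items():
        votes[classifiers[interval](target)] += weight
    return max(votes, key=votes.get)


# Toy setup (hypothetical): three intervals, one classifier per interval.
history = [((0.1,), 1), ((0.2,), 1), ((0.3,), 2), ((0.4,), 2),
           ((0.8,), 3), ((0.9,), 3), ((1.0,), 3)]
classifiers = {
    1: lambda x: "No",
    2: lambda x: "No",
    3: lambda x: "Yes" if x[0] > 0.5 else "No",
}
target = (0.85,)
prediction_k3 = ensemble_predict(target, classifiers, history, k=3)
prediction_k7 = ensemble_predict(target, classifiers, history, k=7)
```

In this toy setup, K = 3 selects only instances from interval 3, so the prediction follows that interval's classifier, whereas K = 7 pulls in dissimilar instances whose classifiers outvote it. K = 1 always reduces the ensemble to a single classifier, which matches the remark above about using too few classifiers.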
Finally, we compute the area under the ROC curve (AUC) for re-share classification on the Twitter dataset by varying the threshold τ that determines the class labels Yes and No. That is, we set Label_RS(m_t) = Yes if |U(m_t)| ≥ τ, and Label_RS(m_t) = No if |U(m_t)| < τ. The average AUC of re-share classification over all time intervals is 0.892. This result indicates that the proposed concept drift-based popularity predictor is likely to rank a randomly selected Yes instance higher than a randomly chosen No instance.
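The ranking interpretation of AUC used above can be checked directly with a pairwise count. The sketch below is not the evaluation code of the paper; the function name `pairwise_auc` and the example scores are hypothetical.

```python
from typing import Sequence


def pairwise_auc(scores_yes: Sequence[float], scores_no: Sequence[float]) -> float:
    """Fraction of (Yes, No) pairs in which the Yes instance receives the higher score.

    Ties count as half a win, which matches the usual definition of AUC.
    """
    wins = 0.0
    for s_yes in scores_yes:
        for s_no in scores_no:
            if s_yes > s_no:
                wins += 1.0
            elif s_yes == s_no:
                wins += 0.5
    return wins / (len(scores_yes) * len(scores_no))


# Hypothetical predictor scores for three Yes and three No instances.
example_auc = pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Here eight of the nine (Yes, No) pairs are ordered correctly, so the value is 8/9. An AUC of 0.5 would correspond to random ranking and 1.0 to a perfect one.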


Fig. 16. Accuracy on URLs of Twitter dataset for popularity score classification.

Table 4
The accuracy of each category of features used in the popularity prediction of both tasks using URL instances of the Twitter dataset. "+" indicates the single feature used for prediction while "Combined" indicates that all the features are used. The bold values refer to the best feature performance in the t-th time interval.

Re-share task (columns: the t-th time interval)

Feature                   1      2      3      4      5      6      7      8      9      10     11     12
+Social authority         0.599  0.583  0.567  0.577  0.575  0.575  0.571  0.580  0.576  0.603  0.599  0.631
+Propagation capability   0.779  0.747  0.731  0.753  0.723  0.741  0.747  0.753  0.683  0.777  0.829  0.779
+Text contents            0.731  0.715  0.696  0.730  0.764  0.730  0.734  0.731  0.742  0.758  0.801  0.772
+Temporal info            0.676  0.704  0.679  0.695  0.746  0.687  0.687  0.695  0.670  0.720  0.750  0.741
+Interaction features     0.620  0.606  0.592  0.626  0.354  0.625  0.633  0.628  0.372  0.644  0.754  0.620
Combined                  0.785  0.750  0.736  0.746  0.753  0.743  0.743  0.764  0.731  0.779  0.814  0.783

Popularity score task (columns: the t-th time interval)

Feature                   1      2      3      4      5      6      7      8      9      10     11     12
+Social authority         0.417  0.424  0.508  0.489  0.503  0.500  0.479  0.518  0.527  0.518  0.515  0.493
+Propagation capability   0.772  0.766  0.730  0.743  0.756  0.761  0.756  0.753  0.747  0.774  0.842  0.813
+Text contents            0.650  0.717  0.715  0.729  0.619  0.630  0.609  0.533  0.632  0.741  0.782  0.785
+Temporal info            0.579  0.667  0.643  0.638  0.741  0.605  0.629  0.663  0.714  0.651  0.751  0.746
+Interaction features     0.608  0.618  0.612  0.643  0.617  0.650  0.662  0.651  0.612  0.684  0.800  0.981
Combined                  0.776  0.750  0.747  0.746  0.768  0.764  0.743  0.762  0.754  0.778  0.848  0.799

Table 5
The accuracy and F-score by varying the K value used in the proposed method for re-share classification, using URL instances of the Twitter dataset. The bold values refer to the best feature performance in the t-th time interval.

Accuracy (columns: the t-th time interval)

K         1      2      3      4      5      6      7      8      9      10     11     12
K = 1     0.656  0.702  0.710  0.743  0.759  0.743  0.750  0.740  0.731  0.736  0.777  0.742
K = 5     0.573  0.682  0.674  0.682  0.748  0.743  0.757  0.774  0.734  0.767  0.811  0.790
K = 10    0.756  0.750  0.747  0.746  0.769  0.764  0.743  0.762  0.754  0.778  0.818  0.799
K = 15    0.675  0.660  0.713  0.700  0.730  0.721  0.721  0.753  0.746  0.762  0.804  0.790
K = 20    0.574  0.621  0.678  0.709  0.768  0.737  0.728  0.749  0.751  0.750  0.798  0.790

F-score (columns: the t-th time interval)

K         1      2      3      4      5      6      7      8      9      10     11     12
K = 1     0.813  0.831  0.819  0.835  0.853  0.836  0.833  0.823  0.823  0.817  0.865  0.852
K = 5     0.733  0.752  0.748  0.761  0.822  0.821  0.836  0.845  0.823  0.852  0.892  0.883
K = 10    0.837  0.839  0.836  0.842  0.859  0.843  0.830  0.849  0.839  0.856  0.893  0.888
K = 15    0.749  0.739  0.799  0.797  0.822  0.814  0.815  0.835  0.838  0.849  0.888  0.883
K = 20    0.718  0.701  0.759  0.800  0.859  0.823  0.816  0.831  0.842  0.836  0.882  0.883
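The recommendation of K = 10 can be cross-checked against the accuracy values of Table 5 with a short computation. The numbers below are copied from the table; the averaging and the per-interval win count are an illustrative summary that the paper itself does not report, and the variable names are hypothetical.

```python
from statistics import mean

# Accuracy values from Table 5 (re-share classification, Twitter URLs),
# one list per K, ordered by time interval 1..12.
ACCURACY_BY_K = {
    1:  [0.656, 0.702, 0.710, 0.743, 0.759, 0.743, 0.750, 0.740, 0.731, 0.736, 0.777, 0.742],
    5:  [0.573, 0.682, 0.674, 0.682, 0.748, 0.743, 0.757, 0.774, 0.734, 0.767, 0.811, 0.790],
    10: [0.756, 0.750, 0.747, 0.746, 0.769, 0.764, 0.743, 0.762, 0.754, 0.778, 0.818, 0.799],
    15: [0.675, 0.660, 0.713, 0.700, 0.730, 0.721, 0.721, 0.753, 0.746, 0.762, 0.804, 0.790],
    20: [0.574, 0.621, 0.678, 0.709, 0.768, 0.737, 0.728, 0.749, 0.751, 0.750, 0.798, 0.790],
}

# Mean accuracy of each K over all time intervals.
mean_accuracy = {k: mean(values) for k, values in ACCURACY_BY_K.items()}
best_k = max(mean_accuracy, key=mean_accuracy.get)

# Number of time intervals in which each K achieves the highest accuracy.
num_intervals = len(ACCURACY_BY_K[1])
wins_by_k = {k: 0 for k in ACCURACY_BY_K}
for t in range(num_intervals):
    winner = max(ACCURACY_BY_K, key=lambda k: ACCURACY_BY_K[k][t])
    wins_by_k[winner] += 1
```

On these values, K = 10 has the highest mean accuracy and is the best setting in 10 of the 12 intervals, with K = 5 ahead in the remaining two. The same procedure applied to a held-out validation set is one simple way to choose K in practice.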


Table 6
The accuracy and F-score with a varying number of previous time intervals (#PrevTimeIntv) used to extract the information diffusion features in the proposed method for re-share classification, using URL instances of the Twitter dataset. The bold values refer to the best feature performance in the t-th time interval.

Accuracy (columns: the t-th time interval)

Setting             1      2      3      4      5      6      7      8      9      10     11     12
#PrevTimeIntv = 1   0.656  0.702  0.710  0.743  0.759  0.743  0.750  0.740  0.731  0.736  0.777  0.742
#PrevTimeIntv = 2   0.690  0.746  0.684  0.741  0.713  0.736  0.767  0.771  0.754  0.775  0.803  0.778
#PrevTimeIntv = 3   0.756  0.750  0.747  0.746  0.768  0.764  0.743  0.762  0.754  0.778  0.818  0.799
#PrevTimeIntv = 4   0.581  0.690  0.722  0.744  0.773  0.759  0.767  0.770  0.742  0.759  0.808  0.789
#PrevTimeIntv = 5   0.573  0.682  0.674  0.682  0.748  0.743  0.757  0.774  0.734  0.767  0.811  0.790

F-score (columns: the t-th time interval)

Setting             1      2      3      4      5      6      7      8      9      10     11     12
#PrevTimeIntv = 1   0.813  0.831  0.819  0.835  0.853  0.836  0.833  0.823  0.823  0.817  0.865  0.852
#PrevTimeIntv = 2   0.784  0.821  0.746  0.819  0.791  0.813  0.841  0.844  0.840  0.853  0.883  0.875
#PrevTimeIntv = 3   0.837  0.839  0.836  0.842  0.850  0.843  0.830  0.839  0.849  0.856  0.893  0.888
#PrevTimeIntv = 4   0.728  0.816  0.823  0.832  0.850  0.841  0.845  0.844  0.830  0.841  0.888  0.882
#PrevTimeIntv = 5   0.733  0.752  0.748  0.761  0.822  0.821  0.836  0.845  0.823  0.852  0.892  0.883

6. Discussion

In this section, we discuss the implications and limitations of the proposed method compared to existing approaches for predicting the popularity of social multimedia shared on microblogging platforms.

Implications. We draw four implications from this study. First, the future popularity of social multimedia content can be correlated with previous time intervals whose instances possess similar concept drift. Therefore, the proposed modeling of local concept drift plays an important role in enhancing popularity prediction and leads to promising and more stable results. Second, conventional classification-based methods weight the instances in every historical time interval equally for model training and therefore completely ignore the effect of concept drift on testing instances. The experimental results show that, without modeling concept drift, the prediction performance suffers and is unstable. Third, the propagation capability of users in the social graph affects whether certain media content will become popular, in addition to conventional features such as textual contents, social graph structures, and time series features. Fourth, while the conventional methods require the features of explicit multimedia meta information to remain relatively stable and accurate, our method can achieve even better results without the explicit multimedia meta information. This indicates that the proposed method can reach high accuracy even when the explicit multimedia meta information cannot be obtained from the multimedia service providers.

Limitations. There are four limitations of the proposed concept drift-based popularity predictor. First, to perform well, the proposed method needs a large volume of historical instances from which the concept drift can be learned. If there are no or too few instances from previous time intervals, our method may not be highly accurate.
Second, even with many past instances, if the popularity evolution tends to be homogeneous, i.e., exhibits nearly the same concept drift, the proposed method essentially reduces to a conventional classification-based method. In other words, more diverse concept drift behaviors can lead to better performance and enable our model to deal with upcoming unknown instances. Third, since the propagation capability scores of users are computed from the spread of social multimedia items in a social graph, our method can only be applied to social networking platforms. For general-purpose popularity time series data, our method might be unsuitable. Fourth, our method requires two parameters to be specified: K, the number of most similar instances, and #PrevTimeIntv, the number of previous time intervals to be considered. Nevertheless, a validation set can be used to learn the best values of these parameters.

7. Conclusion

This paper exploits the idea of concept drift to devise a novel popularity predictor for social multimedia. We believe a predictor should account for the fact that popularity may be affected by hidden contexts: even given the same explicit predictive features, popularity can change over time. With a series of features that capture how social multimedia propagates over a social network, together with the explicit media information, we build an ensemble of multiple classifiers trained on instances from a set of time intervals and dynamically determine their weights according to the unseen instances. Experiments conducted on the Plurk and Twitter datasets demonstrate promising results on both re-share and popularity score classification. We summarize the discussions on future directions in the following three aspects, followed by two ongoing application developments. First, we observe that there is a tradeoff between accurate predictions and early predictions for social multimedia contents.
Using more early diffusion data of social multimedia contents can lead to more accurate results, while using less might degrade the performance. Second, after closely examining the evolution of popularity, we find that the popularity of some social multimedia contents is very unpredictable. These items usually have very low view counts at first but suddenly burst onto the scene. It would be interesting and useful to detect such multimedia outliers with anomalous popularity values. Third, from a technical perspective, since the current model is trained offline, it would be more effective and adaptive to devise an online concept drift-based learning method that can dynamically build the classifiers as new diffusion data is generated.

Future research lies in two directions. First, we aim to model the global distribution of social multimedia popularity (e.g., popularity vs. the number of social multimedia items) under the proposed method and to compare it with benchmark models such as Poisson, power-law, and log-normal distributions [28]. Second, we will develop a scalable version of the proposed method so that it can handle the diffusion of social multimedia in a real-world large-scale social network.

In addition, we would like to point out two of our ongoing applications of popularity prediction for social multimedia. First, we are developing network caching strategies [10] for multimedia contents based on their predicted popularity. A well-devised content caching strategy will not only consume less network bandwidth but also improve the usage of storage and computation resources. Second, for cost-effective advertising, advertising strategies can be adaptively and dynamically altered based on the current popularity of multimedia content [11]. We are designing an automated method for advertisers to place an ad on the page of multimedia content that is predicted to become popular soon, such that the eventual revenue can be maximized.

Acknowledgments

This work was sponsored by the Ministry of Science and Technology (MOST) of Taiwan under grant 104-2221-E-001-027MY2. This work is also supported by the Multidisciplinary Health Cloud Research Program: Technology Development and Application of Big Health Data, Academia Sinica, Taipei, Taiwan, under grant MP10212-0318.

References

[1] M. Ahmed, S. Spagna, F. Huici, S. Niccolini, A peek into the future: predicting the evolution of popularity in user generated content, in: Proceedings of ACM International Conference on Web Search and Data Mining (WSDM), 2013.
[2] I. Arapakis, B.B. Cambazoglu, M. Lalmas, On the feasibility of predicting news popularity at cold start, in: Proceedings of the Sixth International Conference on Social Informatics (SocInfo), 2014, pp. 290–299.
[3] P. Bao, H.-W. Shen, J. Huang, X.-Q. Cheng, Popularity prediction in microblogging network: a case study on Sina Weibo, in: Proceedings of ACM International World Wide Web Conference (WWW), 2013.
[4] P. Bao, H.-W. Shen, X. Jin, X.-Q. Cheng, Modeling and predicting popularity dynamics of microblogs using self-excited Hawkes processes, in: Proceedings of ACM International World Wide Web Conference (WWW) Companion, 2015.
[5] R. Bandari, S. Asur, B.A. Huberman, The pulse of news in social media: forecasting popularity, in: Proceedings of AAAI International Conference on Weblogs and Social Media (ICWSM), 2012.
[6] A. Bifet, J. Gama, M. Pechenizkiy, I. Žliobaitė, Handling concept drift: importance, challenges & solutions. Tutorial, in: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2011.
[7] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[8] M. Cha, H. Kwak, P. Rodriguez, Y.Y. Ahn, S. Moon, Analyzing the video popularity characteristics of large-scale user generated content systems, IEEE/ACM Trans. Netw. 17 (5) (2009) 1357–1370.
[9] W. Ding, Y. Shang, L. Guo, X. Hu, R. Yan, T. He, Video popularity prediction by sentiment propagation via implicit network, in: Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), 2015.
[10] J. Famaey, T. Wauters, F.D. Turck, On the merits of popularity prediction in multimedia content caching, Integrated Network Management (2011).
[11] F. Figueiredo, On the prediction of popularity of trends and hits for user generated videos, in: Proceedings of ACM International Conference on Web Search and Data Mining (WSDM), 2013.
[12] F. Figueiredo, F. Benevenuto, J.M. Almeida, The tube over time: characterizing popularity growth of YouTube videos, in: Proceedings of ACM International Conference on Web Search and Data Mining (WSDM), 2011.
[13] W. Galuba, K. Aberer, D. Chakraborty, Z. Despotovic, W. Kellerer, Outtweeting the twitterers: predicting information cascades in microblogs, in: Proceedings of Third Conference on Online Social Networks (WOSN), 2010.
[14] S. Gao, J. Ma, Z. Chen, Modeling and predicting retweeting dynamics on microblogging platforms, in: Proceedings of ACM International Conference on Web Search and Data Mining (WSDM), 2015.
[15] M. Harries, K. Horn, Detecting concept drift in financial time series prediction using symbolic machine learning, in: Proceedings of Australian Joint Conference on Artificial Intelligence, 1995.
[16] X. He, M. Gao, M.-Y. Kan, Y. Lin, K. Sugiyama, Predicting the popularity of Web 2.0 items based on user comments, in: Proceedings of ACM International Conference on Research and Development in Information Retrieval (SIGIR), 2014.
[17] C.-T. Ho, C.-T. Li, S.-D. Lin, Modeling and visualizing information propagation in a micro-blogging platform, in: Proceedings of IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), 2011.
[18] L. Hong, O. Dan, B.D. Davison, Predicting popular messages in Twitter, in: Proceedings of ACM International World Wide Web Conference (WWW), 2011.
[19] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM Comput. Surv. 46 (4) (2014) Article 44.
[20] L. Jiang, Y. Miao, Y. Yang, Z. Lan, A.G. Hauptmann, Viral video style: a closer look at viral videos on YouTube, in: Proceedings of ACM International Conference on Multimedia Retrieval (ICMR), 2014.
[21] X. Jin, A. Gallagher, L. Cao, J. Luo, J. Han, The wisdom of social multimedia: using Flickr for prediction and forecast, in: Proceedings of ACM International Conference on Multimedia (MM), 2010.
[22] A. Khosla, A.D. Sarma, R. Hamid, What makes an image popular? in: Proceedings of ACM International World Wide Web Conference (WWW), 2014.
[23] J.Z. Kolter, M.A. Maloof, Dynamic weighted majority: a new ensemble method for tracking concept drift, in: Proceedings of IEEE International Conference on Data Mining (ICDM), 2003.
[24] H. Lakkaraju, J. Ajmera, Attention prediction on social media brand pages, in: Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), 2011.
[25] J.G. Lee, S. Moon, K. Salamatian, Modeling and predicting the popularity of online contents with Cox proportional hazard regression model, J. Neurocomput. 76 (1) (2012) 134–145.
[26] H. Li, X. Ma, F. Wang, J. Liu, K. Xu, On popularity prediction of videos shared in online social networks, in: Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), 2013.


[27] K. Lerman, T. Hogg, Using a model of social dynamics to predict popularity of news, in: Proceedings of ACM International World Wide Web Conference, 2010.
[28] K. Lerman, R. Ghosh, Information contagion: an empirical study of the spread of news on Digg and Twitter social networks, in: Proceedings of AAAI International Conference on Weblogs and Social Media (ICWSM), 2010.
[29] S.K. Maity, A. Gupta, P. Goyal, A. Mukherjee, A stratified learning approach for predicting the popularity of twitter idioms, in: Proceedings of AAAI International Conference on Web and Social Media (ICWSM), 2015.
[30] P.J. McParlane, Y. Moshfeghi, J.M. Jose, “Nobody Comes Here Anymore, it’s too crowded”: predicting image popularity on Flickr, in: Proceedings of ACM International Conference on Multimedia Retrieval (ICMR), 2014.
[31] M. Naaman, H. Becker, L. Gravano, Hip and trendy: characterizing emerging trends on Twitter, J. Am. Soc. Inf. Sci. Technol. (JASIST) 62 (5) (2011) 902–918.
[32] N. Naveed, T. Gottron, J. Kunegis, A.C. Alhadi, Bad news travel fast: a content-based analysis of interestingness on Twitter, in: Proceedings of ACM International Conference on Web Science, 2011.
[33] H. Pinto, J.M. Almeida, M.A. Goncalves, Using early view patterns to predict the popularity of YouTube videos, in: Proceedings of ACM International Conference on Web Search and Data Mining (WSDM), 2013.
[34] K. Radinsky, K. Svore, S. Dumais, J. Teevan, A. Bocharov, E. Horvitz, Modeling and predicting behavioral dynamics on the Web, in: Proceedings of ACM International World Wide Web Conference (WWW), 2012.
[35] W.N. Street, Y. Kim, A Streaming Ensemble Algorithm (SEA) for large-scale classification, in: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2001.
[36] G. Szabo, B.A. Huberman, Predicting the popularity of online content, Commun. ACM 53 (8) (2010) 80–88.
[37] A. Tsymbal, The Problem of Concept Drift: Definitions and Related Work, Technical Report, Trinity College, 2004.
[38] D.R. Wilson, T.R. Martinez, Improved heterogeneous distance functions, J. Artif. Intell. Res. 6 (1) (1997) 1–34.
[39] A. Tatar, P. Antoniadis, M.D. de Amorim, S. Fdida, From popularity prediction to ranking online news, Social Netw. Anal. Mining 4 (1) (2014) Article 174.
[40] O. Tsur, A. Rappoport, What’s in a hashtag? Content based prediction of the spread of ideas in microblogging communities, in: Proceedings of ACM International Conference on Web Search and Data Mining (WSDM), 2012.
[41] A. Tsymbal, M. Pechenizkiy, P. Cunningham, S. Puuronen, Dynamic integration of classifiers for handling concept drift, Int. J. Multi-Sens. Multi-Sour. Inf. Fusion 9 (1) (2008) 56–68.
[42] D. Vallet, S. Berkovsky, S. Ardon, A. Mahanti, M.A. Kaafar, Characterizing and predicting viral-and-popular video content, in: Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), 2015.
[43] M. Vasconcelos, J.M. Almeida, M.A. Goncalves, Predicting the popularity of micro-reviews: a foursquare case study, Inf. Sci. 325 (2015) 335–374.
[44] A. Wang, T. Chen, M.-Y. Kan, Re-tweeting from a linguistic perspective, in: Proceedings of North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2012.
[45] H. Wang, W. Fan, P.S. Yu, J. Han, Mining concept-drifting data streams using ensemble classifiers, in: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2003.
[46] L. Xu, J.A. Duan, A. Whinston, Path to purchase: a mutually exciting point process model for online advertising and conversion, Manage. Sci. 60 (6) (2014) 1392–1412.
[47] J. Xu, M. V.d. Schaar, J. Liu, H. Li., Forecasting, dia, IEEE J. Sel. Top. Signal Process. 9 (2) (2014) 330–343.
[48] J. Yang, S. Counts, Predicting the speed, scale, and range of information diffusion in Twitter, in: Proceedings of AAAI International Conference on Weblogs and Social Media (ICWSM), 2010.
[49] J. Yang, J. Leskovec, Patterns of temporal variation in online media, in: Proceedings of ACM International Conference on Web Search and Data Mining (WSDM), 2011.
[50] P. Yin, P. Luo, M. Wang, W.-C. Lee, A straw shows which way the wind blows: ranking potentially popular items from early votes, in: Proceedings of ACM International Conference on Web Search and Data Mining (WSDM), 2012.
[51] A.H. Zadeh, R. Sharda, Modeling brand post popularity dynamics in online social networks, Decis. Support Syst. 65 (2014) 59–68.
[52] Q. Zhao, M.A. Erdogdu, H.Y. He, A. Rajaraman, J. Leskovec, SEISMIC: a self-exciting point process model for predicting Tweet popularity, in: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2015.
[53] Y. Zhou, L. Chen, C. Yang, D.M. Chiu, Video popularity dynamics and its implication for replication, IEEE Trans. Multimedia 17 (8) (2015) 1273–1285.
[54] I. Žliobaitė, Learning under Concept Drift: An Overview, Technical Report, Vilnius University, 2009.