Predicting emotional reactions to news articles in social networks

Available online at www.sciencedirect.com
Computer Speech & Language xxx (2019) xxx-xxx
www.elsevier.com/locate/csl

Omar Juárez Gambino a,b, Hiram Calvo a,*

a Centro de Investigación en Computación, Instituto Politécnico Nacional, CIC-IPN, Mexico
b Escuela Superior de Cómputo, Instituto Politécnico Nacional, ESCOM-IPN, J.D. Bátiz e/ M.O. de Mendizábal s/n, Mexico City 07738, Mexico

Received 25 August 2017; received in revised form 13 March 2019; accepted 15 March 2019
Available online xxx

Abstract

After reading a news article, some readers post their opinion to social networks, particularly as tweets. These opinions (responses) have an important emotional content. By analyzing users' responses in context, it is possible to find a set of emotions expressed in these tweets. In this work we propose a method to predict the emotional reactions that Twitter users would have after reading a news article. We consider the prediction of emotions as a classification problem and follow a supervised approach. For this purpose, we collected a corpus of Spanish news articles and their associated tweet responses. Then, a group of annotators tagged the emotions expressed in them. Twitter users can express more than one emotion in their responses, so in this work we deal with this characteristic by using a multi-target classification strategy. This strategy allows an instance (a news article) to have more than one associated class (emotions expressed by users). In addition, the multi-target strategy permits predicting not only the emotional reactions but also the intensity of these emotions, considering how often each specific emotion was triggered by users. By measuring the deviation of the predicted emotional reactions with regard to the annotated ones, we obtain an emotional reactions similarity of 89%.

© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Social media have changed the way people learn about current events. Traditional media like television, radio and newspapers have active accounts in social networks to disseminate their news effectively to a large population (Bandari et al., 2012). Twitter is a notable case among social networks for publishing news. This network has become the media's favorite for posting news; that is why almost 85% of trending topics are headlines or persistent news (Kwak et al., 2010).

Posts in social media can be about anything; there are few restrictions concerning the topic or the way users express themselves. Because of this, users feel free to talk without concern and express their opinions. Furthermore, the content of posts usually has a high emotional content (Li et al., 2017), where it is easy to identify positive and negative points of view (sentiment polarity) and even to determine specific emotions like happiness or anger (emotional reaction).

Emotions are an interesting field of study because they are a substantial part of human interaction. Every day we interpret our own and other people's emotional reactions. We are also able to predict reactions from antecedent events (Shaver et al., 1987).

☆ This research was funded by CONACyT-SNI and Instituto Politécnico Nacional (IPN), particularly through grants SIP20195402, SIP20195886, EDI, and COFAA-SIBE.
* Corresponding author.
E-mail addresses: [email protected] (O.J. Gambino), [email protected] (H. Calvo).

https://doi.org/10.1016/j.csl.2019.03.004
0885-2308/© 2019 Elsevier Ltd. All rights reserved.

Please cite this article as: O. Gambino, H. Calvo, Predicting emotional reactions to news articles in social networks, Computer Speech & Language (2019), https://doi.org/10.1016/j.csl.2019.03.004


In this work we investigate the emotional reactions of Twitter users to news articles and the intensity of these emotions. We also aim to propose different methods to predict emotional reactions to future news articles. To achieve this, we collected a corpus of news articles and their corresponding annotated Twitter users' emotional reactions from three Mexican newspapers; determined the features that characterize news articles; and finally, we used Machine Learning methods to predict the emotional reactions of Twitter users to news articles. Even though linguistic resources like lexicons and POS taggers have proved to be useful in sentiment analysis tasks (Gambino and Calvo, 2016; Pak and Paroubek, 2010), we want to avoid the use of hand-crafted resources, because they are language dependent and could make it difficult to apply our proposal in future work on languages where such resources are scarce.

Related work is presented in Section 2, describing the different objectives and methods that other authors have proposed. In Section 3 the general description of our proposal is explained, outlining the considered stages. In Section 4 we describe in detail the strategy we followed to implement multi-target classification. Experiments and results are shown in Section 5; finally, in Section 6 we present conclusions and future work.


2. Determining emotions from reader’s perspective


Social media have led to a new interaction between writers and readers. The writer of a post can receive several responses from readers as soon as it is available, and then a direct interchange of opinions starts. This particular form of communication has attracted the attention of the Sentiment Analysis community because both writers and readers express emotions during their conversation. There have been several efforts to automatically determine emotions from the readers' perspective, and the following works present important similarities. We are mainly interested in how the works in this section deal with multi-label data; the way the authors face the multi-label classification problem varies. Each of the following subsections explains how these works deal with multi-label data and what they were able to predict.


2.1. Prediction of predominant emotion


In Lin et al. (2007), a method to classify news articles based on the emotions they evoke in readers was proposed. The authors collected 17,743 news articles from Yahoo! China and the reactions expressed by its users. Yahoo! China has a functionality that allows users to express their feelings after reading an article by indicating one or more of the following emotions: happy, angry, sad, surprised, heartwarming, awesome, bored and useful. The authors extracted features like unigrams, bigrams and metadata, and also used a lexicon to get the emotion categories of words. Then, an SVM was trained with these features using 12,079 articles. The model was tested with 5,664 articles. Their results show an accuracy of 87.9% in predicting the predominant emotion of each article.
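As a rough sketch, this kind of pipeline can be reproduced with n-gram features feeding an SVM that predicts a single predominant emotion per article. This is a hedged illustration with toy English documents and only two of the eight emotion labels; the authors' actual feature set also included metadata and lexicon-based emotion categories.

```python
# Minimal sketch of a predominant-emotion classifier: TF-IDF unigrams
# and bigrams feeding a linear SVM. The documents and labels below are
# illustrative toy data, not the Yahoo! China corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = [
    "lottery winner celebrates with family",
    "team wins championship in final minute",
    "flood destroys homes and crops",
    "earthquake leaves thousands homeless",
]
train_labels = ["happy", "happy", "sad", "sad"]

# Unigrams and bigrams, echoing the n-gram features of the original work.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(train_docs, train_labels)

print(model.predict(["storm destroys village homes"])[0])
```

The pipeline vectorizes unseen text with the fitted vocabulary and returns the single most likely emotion label.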


2.2. Prediction of ranking of emotions


Readers can express different emotions toward news articles, so an interesting task is to determine the ranking of emotions. In Lin et al. (2008) the same authors of the previous paper increased the size of the corpus (from the same source), collecting in total 25,975 articles for training and 11,441 for testing. They also extracted a new feature called affix similarities, which quantifies the number and lengths of common substrings between news articles in the training set and news articles in the test set. To rank emotions, the first step was to use a Support Vector Regression (SVR) algorithm on each emotion to predict its percentage of votes in a news article. Then the emotions were sorted according to the estimated percentages of votes. To evaluate the results, the measure ACC@n was used, which considers a proposed emotion list to be correct if its first n emotions are all the same and in the same order as the first n emotions in the true emotion list. Their results are shown in Fig. 1. The sharp decrease in accuracy as n increases reflects the hardness of the ranking task. The authors concluded that generating a completely correct ranked list is a difficult task, because this is equivalent to classifying news articles into one out of 8!/(8 - n)! classes, which for the complete ranking (n = 8) gives a total of 8! = 40,320 classes.
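The ACC@n measure described above can be made concrete with a short sketch; the rankings below are toy examples, not the authors' data.

```python
# Sketch of ACC@n: a predicted emotion ranking counts as correct at n
# if its first n emotions match the true ranking exactly and in order.
def acc_at_n(predicted_rankings, true_rankings, n):
    """Fraction of articles whose top-n predicted emotions equal the
    true top-n emotions, in the same order."""
    correct = sum(
        pred[:n] == true[:n]
        for pred, true in zip(predicted_rankings, true_rankings)
    )
    return correct / len(true_rankings)

pred = [["joy", "sad", "anger"], ["anger", "joy", "sad"]]
true = [["joy", "anger", "sad"], ["anger", "joy", "sad"]]

print(acc_at_n(pred, true, 1))  # both articles match at n = 1
print(acc_at_n(pred, true, 2))  # only the second article matches at n = 2
```

As n grows, the exact-prefix requirement becomes much harder to satisfy, which is the effect visible in Fig. 1.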



Fig. 1. Emotion ranking performance, taken from Lin et al. (2008).

2.3. Prediction of set of emotions


Considering that readers can express more than one emotion, some authors have defined this task as a multi-label classification problem. The idea of this strategy is that each instance can be associated with more than one class. By following this strategy it is possible to train a model with a set of classes per instance and then predict a set of classes as well.

A method for classifying news sentences into multiple emotion categories using a multi-label classifier was presented in Bhowmick (2009). In this work, 1,305 sentences were extracted from the Times of India newspaper archive and then manually annotated. Annotators assigned a value of 1 when an emotion was triggered, and 0 otherwise. The emotion set consisted of four emotions: disgust, fear, happiness and sadness. Features like word frequency, polarity of words, and information from a semantic frame were extracted. The authors performed multi-label classification with the RAkEL algorithm (Tsoumakas and Vlahavas, 2007). They pointed out that, due to the particularity of multi-label classification, special metrics such as Hamming Loss, Partial Match Accuracy and Subset Accuracy, which consider partial matches on predicted classes (a further explanation of multi-label metrics is given in Section 5.4), are needed. The best results they reported were 0.112, 0.764 and 0.666, respectively¹, when predicting the set of emotions evoked in users.

Xu et al. (2013) argue that similar events usually lead to similar reader emotions, so they proposed to associate emotions with event/topic words instead of individual words. For this purpose, they selected a weighted variation of Latent Dirichlet Allocation (Blei et al., 2003) to extract topic distributions and create a feature vector. This feature vector was used in a Multi-label k-Nearest Neighbor classifier (Zhang and Zhou, 2005). To test their model, the authors extracted 8,802 news articles from the Sina website².
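The multi-label metrics mentioned above can be sketched on binary indicator vectors (one column per emotion): Hamming Loss counts wrongly predicted label slots, while Subset Accuracy requires an exact match of the whole label set. The vectors below are illustrative only.

```python
# Sketch of two multi-label metrics on binary indicator matrices.
import numpy as np

def hamming_loss(y_true, y_pred):
    """Fraction of label slots predicted incorrectly (lower is better)."""
    return np.mean(y_true != y_pred)

def subset_accuracy(y_true, y_pred):
    """Fraction of instances whose full label set is predicted exactly."""
    return np.mean(np.all(y_true == y_pred, axis=1))

# Rows: sentences; columns: disgust, fear, happiness, sadness.
y_true = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0]])
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0]])

print(hamming_loss(y_true, y_pred))     # 1 wrong slot out of 8
print(subset_accuracy(y_true, y_pred))  # only the second row matches exactly
```

Hamming Loss rewards partially correct label sets, which is why it is preferred over strict subset accuracy when emotions frequently co-occur.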
This website allows users to select one or more emotions to indicate how they feel after reading an article. Eight emotions are provided: touched, empathy, boredom, anger, amusement, sadness, surprise, and warmness. Their model obtained 0.1323 using the Hamming Loss metric.

Zhang et al. (2015b) followed a similar strategy, collecting online news from the same source; a multi-label supervised emotion-topic model was then used to predict the eight emotions. Even though the authors argue that this is the first work focused on modeling multi-label emotion tagging for online news from the readers' perspective, the real novelty is the use of a supervised topic model based on LDA. The best results they reported using the Hamming Loss metric were between 0.1572 and 0.01.

Another work that uses a similar news source and strategy was presented in Rao (2016). The main contribution of this research was considering context during the topic modeling phase. The proposed model can distinguish topics belonging to a background theme, which are context-independent, from topics belonging to a contextual theme, which are context-dependent. The authors determine context by calculating the probability of using certain topics when generating


¹ For Hamming Loss, closer to 0 is better; for Partial Match Accuracy and Subset Accuracy, closer to 1 is better.
² http://news.sina.com.cn/society/


Table 1
Sample of reactions of users to a news article. The Total row counts the responses (R1-R4) in which each emotion was tagged.

Responses | Love | Joy | Surprise | Anger | Sadness | Fear
Total     |  0   |  3  |    0     |   2   |    3    |  2


words in the documents. The authors report results using different metrics, but for our proposal the most interesting one is the averaged Pearson's correlation coefficient (AP). This coefficient considers emotional distributions by measuring the correlation between the predicted probabilities and the annotated votes over all emotion labels. With this measure the best result was 0.523.

In a more recent work, Li et al. (2017) created an opinion network in which nodes indicate social opinions and edges indicate relations between them. To create the network, the authors trained word vectors on the most recent Wikipedia corpus. They then calculated the semantic distance between news articles via word vectors. Prediction is made by neighbor analysis. The best result was 0.62 with the AP measure mentioned above.
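The AP measure can be sketched as a per-article Pearson correlation between predicted and annotated emotion distributions, averaged over articles. The numbers below are toy data, not results from the cited works.

```python
# Sketch of the averaged Pearson correlation (AP): correlate each
# article's predicted emotion probabilities with its annotated vote
# distribution, then average across articles.
import numpy as np

def averaged_pearson(predicted, annotated):
    """Mean per-article Pearson correlation across emotion labels."""
    coeffs = [np.corrcoef(p, a)[0, 1] for p, a in zip(predicted, annotated)]
    return float(np.mean(coeffs))

# Two toy articles, four emotion labels each (distributions sum to 1).
predicted = [[0.5, 0.3, 0.1, 0.1], [0.1, 0.2, 0.4, 0.3]]
annotated = [[0.6, 0.2, 0.1, 0.1], [0.0, 0.3, 0.5, 0.2]]

print(averaged_pearson(predicted, annotated))
```

Values range from -1 to 1; a value near 1 means the predicted distribution closely tracks how readers actually voted.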


3. Proposal


After reviewing the state of the art, we found that the prediction of readers' emotions has been studied and promising results have been obtained. Researchers have pointed out that readers can express not only one emotion, but a set of them. To deal with multiple emotions, researchers have proposed different strategies: predicting the most frequent emotion, predicting the ranking of emotions, predicting a set of emotions by treating the task as a multi-label classification problem, and predicting the distribution of emotions using topic modeling. Despite the advances in this task, we have identified that no attempts have been made to predict the intensity of the emotional reactions of readers to a news article. This prediction would provide important information about the users' sentiment, and to our knowledge there is no previous work dealing specifically with this subject. To illustrate this idea, let us consider the following news article and its corresponding Twitter users' responses. News article headline published by La Jornada (@lajornadaonline, originally in Spanish): "Guillermo Padrés⁴ is given a formal prison sentence." A sample of its corresponding Twitter users' responses (translated from Spanish) follows (with added explanations in parentheses):


- R1: May this serve as an example for others, but I ask @EPN (former president): where can his "anti-corruption" plan be seen? When will the thief Deschamps (senator with an opulent and lavish lifestyle) be processed?
- R2: Well done! Now they must auction off the goods and return them to the people!⁵
- R3: OK... (justice) (applause) And the goods, the money, those who benefited, and all that was stolen: when and how will it be returned to the nation?
- R4: The PRIAN (former ruling party) will take him out of jail again after a while...

We propose that responses can be analyzed by a group of human annotators, who label the emotions they identify from a defined set of six emotions (i.e., Love, Joy, Surprise, Anger, Sadness, Fear). In Table 1 we show an example with some tagged emotions indicated with a checkmark. The last row of the table shows the count of tagged emotions. It is important to mention that emotional reactions to news articles can include contrasting emotions. For instance, the emotions Joy and Sadness were both tagged despite being opposite. This situation can be explained considering that a news article has several responses by different users, and these users could express

³ This value ranges from -1 to 1, where 1 indicates a perfect positive correlation.
⁴ Padrés was wanted by the Federal Police and Interpol on multiple charges of corruption, embezzlement, and extortion. In total, he transferred $8.9 million USD from Mexico to US bank accounts. After a long manhunt, Padrés gave himself up to the Federal authorities claiming innocence. He remains in prison.
⁵ Twitter allows the inclusion of special characters like emojis. The treatment of these characters is explained in Section 5.1.


Fig. 2. Example of emotional reactions.


contrasting emotions, as in responses R2 and R4; even a single tweet can present this case, as in R3. There could also be tweets in which the emotions contrast due to the use of sarcasm, but this interesting and hard-to-solve problem is out of the scope of this work.

Our proposal considers normalizing the count values by dividing them by the number of responses. In Fig. 2 we show a chart with the normalized count values expressed as percentages. By analyzing this chart we can identify not only the most tagged emotions (Joy and Sadness), cf. Lin et al. (2007); the ranking of emotions (Joy, Sadness, Anger, Fear, Love, Surprise), cf. Lin et al. (2008); and the set of tagged emotions (Joy, Sadness, Anger, Fear), cf. Bhowmick (2009); but also the intensity of each emotion. The higher the percentage of responses that express a specific emotion, the higher the intensity of that emotion in the emotional reactions to the news article. Although numerical values can give a good idea of the intensity of an emotion, in our proposal we consider that discretization and the use of emotion intensity tags (e.g., None, Very Low, Low, High, and Very High) can facilitate the interpretation of the emotional reactions. A discretization process and the use of tags have been useful for creating other resources; for instance, Ramírez et al. (2019) assigned four intensity values to personality identification traits, switching from numerical values to tags. Further details of the discretization process are presented in Section 4.
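The normalization and discretization steps just described can be sketched as follows. The counts correspond to the example in Table 1, but the intensity thresholds below are illustrative assumptions; the actual discretization scheme is defined in Section 4.

```python
# Sketch of normalizing emotion tag counts by the number of responses
# and mapping the resulting shares to coarse intensity tags.
# NOTE: the threshold values are assumptions for illustration only.
def intensity_tag(share):
    """Map a normalized emotion share (0..1) to an intensity label."""
    if share == 0.0:
        return "None"
    if share <= 0.25:
        return "Very Low"
    if share <= 0.50:
        return "Low"
    if share <= 0.75:
        return "High"
    return "Very High"

# Tag counts from the Table 1 example, over 4 responses (R1..R4).
counts = {"Love": 0, "Joy": 3, "Surprise": 0, "Anger": 2, "Sadness": 3, "Fear": 2}
n_responses = 4

reactions = {emo: intensity_tag(c / n_responses) for emo, c in counts.items()}
print(reactions)
```

With these assumed thresholds, Joy and Sadness (3/4 of responses) come out as "High", Anger and Fear (2/4) as "Low", and untagged emotions as "None".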


3.1. Gathering reactions to newspaper articles



Some of the emotions users express as reactions to news articles described in Section 2 are not clearly related to an affective attachment; for example, "useful" in Lin et al. (2007). Platforms like Yahoo! and Sina, which were used by Lin et al. (2007) and Xu et al. (2013) to obtain their datasets, provide a set of alleged emotions that no psychological theory supports. Therefore, we have chosen to build our own resource by gathering reactions to news tweets from three different Mexican newspapers and tagging emotions following a psychological theory of emotion (see Section 3.2). News and responses were collected using the algorithm described in Calvo and Gambino (2017). Due to the nature of the streaming process, it is not possible to apply explicit filters to select specific articles or newspaper sections, so no categorization of the collected news articles was performed.


3.2. Theory of emotion


There is no unified theory of emotions common to all human beings, so we had to select the one we consider most suitable for our work. The emotions proposed by Ekman et al. (1987) are amongst the most used in Emotion Analysis, but we have opted for the ones studied by Shaver et al. (1987). In their proposal, Shaver et al. developed an experiment in which 112 Psychology students were asked to rate 213 words used to express emotions. By applying a mean prototypicality rating, 135 words were selected as good emotion descriptors.



Table 2
Hierarchy of emotions (part 1). Adapted from Shaver et al. (1987). Each secondary emotion is followed by its tertiary emotions.

Group 1. Primary: Amor (Love)
- Afecto (Affection): Adoración (Adoration), Amor (Love), Cariño (Fondness), Afición (Liking), Atracción (Attraction), Ternura (Tenderness), Compasión (Compassion), Sentimentalismo (Sentimentality)
- Deseo (Lust): Deseo (Lust), Pasión (Passion), Encaprichamiento (Infatuation)

Group 2. Primary: Alegría (Joy)
- Alegría (Joy): Diversión (Amusement), Dicha (Bliss), Alegría (Joy), Regocijo (Glee), Jovialidad (Joviality), Deleite (Delight), Felicidad (Happiness), Júbilo (Jubilation), Euforia (Elation), Satisfacción (Satisfaction), Éxtasis (Ecstasy)
- Entusiasmo (Enthusiasm): Entusiasmo (Enthusiasm), Fervor (Zeal), Emoción (Thrill)
- Contentamiento (Contentment): Contentamiento (Contentment), Placer (Pleasure)
- Orgullo (Pride): Orgullo (Pride), Victoria (Triumph)
- Optimismo (Optimism): Esperanza (Hope), Optimismo (Optimism)
- Encanto (Enthrallment): Encanto (Enthrallment)
- Alivio (Relief): Alivio (Relief)

Group 3. Primary: Sorpresa (Surprise)
- Sorpresa (Surprise): Asombro (Amazement), Sorpresa (Surprise), Estupor (Astonishment)

Group 4. Primary: Enojo (Anger)
- Irritación (Irritation): Irritación (Irritation), Agitación (Agitation), Molestia (Annoyance), Mal humor (Grouchiness)
- Exasperación (Exasperation): Exasperación (Exasperation)
- Rabia (Rage): Enojo (Anger), Rabia (Rage), Furia (Fury), Ira (Wrath), Hostilidad (Hostility), Agresividad (Ferocity), Rencor (Bitterness), Odio (Hate), Aversión (Loathing), Desprecio (Scorn), Desagrado (Dislike), Resentimiento (Resentment)
- Disgusto (Disgust): Disgusto (Disgust)
- Envidia (Envy): Envidia (Envy), Celos (Jealousy)
- Tormento (Torment): Tormento (Torment)


After that, another 100 Psychology students were asked to perform a similarity sorting of the 135 words, resulting in a hierarchy with six main groups, every group having two internal groups. As a result, the authors proposed a three-level hierarchy: the first level contains basic emotions, while the second and third levels represent more specific emotions. The hierarchy model was especially useful when the annotators were asked to select the emotions expressed by Twitter users in their responses: according to their experience, it was much easier to choose from an extensive list of emotions, because a general emotion like sadness may not accurately express an identified emotion like shame. Another advantage of this approach is that the hierarchy allows us to work not only at a coarse-grained level with the primary (basic) emotions, but also at fine-grained levels, if we consider the secondary or even the tertiary levels. In Tables 2 and 3 we show the hierarchy of emotions translated into Spanish from its original English version. Resources related to emotions are scarce in Spanish. Although translating words related to sentiments does not create a highly accurate resource, due to the subtle meanings of words (Pérez-Rosas et al., 2012), some authors have followed this approach (Rangel et al., 2014; Balahur and Perea-Ortega, 2013), arguing that the created resources are useful and good enough. In this work we follow this latter approach, leaving the study of how this translation affects the hierarchy for future work. Finally, it is important to mention that sometimes emotions defined in one level can also appear in another level. This is because, when the test subjects of Shaver's experiment grouped the emotions, they were allowed to repeat them in different levels.


3.3. Annotation process


Studies in semantic networks (Richens, 1958; Figueroa et al., 1976) suggest that humans are consistent when describing concrete and abstract concepts (Perez-Corona et al., 2012). A similar consistency has been observed in emotion identification (Ekman et al., 1987). Based on these ideas, humans with no deep expertise in emotion theory can in principle be asked to identify the emotions Twitter users express in their responses by using context and common knowledge. Therefore, we performed an annotation process in which humans identified emotions in responses and associated them with the groups of provided emotions. It is important to mention that during the annotation process the annotators were not aware of any hierarchy in the emotions; they just had a list of emotions for each group, that is, emotions of the primary, secondary and tertiary levels combined⁶.

⁶ Even though annotators do not indicate the hierarchy of emotions, it is possible to identify the level where they belong in most cases.


Table 3
Hierarchy of emotions (part 2). Adapted from Shaver et al. (1987). Each secondary emotion is followed by its tertiary emotions.

Group 5. Primary: Tristeza (Sadness)
- Sufrimiento (Suffering): Agonía (Agony), Sufrimiento (Suffering), Dolor (Hurt), Angustia (Anguish)
- Tristeza (Sadness): Depresión (Depression), Desesperación (Despair), Desesperanza (Hopelessness), Pesadumbre (Gloom), Abatimiento (Glumness), Tristeza (Sadness), Infelicidad (Unhappiness), Aflicción (Grief), Pesar (Sorrow), Miseria (Misery), Melancolía (Melancholy)
- Decepción (Disappointment): Consternación (Dismay), Decepción (Disappointment)
- Pena (Shame): Culpa (Guilt), Pena (Shame), Arrepentimiento (Regret), Remordimiento (Remorse)
- Desamparo (Neglect): Aislamiento (Alienation), Desamparo (Neglect), Soledad (Loneliness), Rechazo (Rejection), Nostalgia (Homesickness), Derrota (Defeat), Inseguridad (Insecurity), Vergüenza (Embarrassment), Humillación (Humiliation), Insulto (Insult)
- Lástima (Sympathy): Lástima (Sympathy)

Group 6. Primary: Miedo (Fear)
- Horror (Horror): Sobresalto (Alarm), Conmoción (Shock), Miedo (Fear), Temor (Fright), Horror (Horror), Terror (Terror), Pánico (Panic), Histeria (Hysteria), Mortificación (Mortification)
- Nerviosismo (Nervousness): Ansiedad (Anxiety), Nerviosismo (Nervousness), Tensión (Tenseness), Inquietud (Uneasiness), Aprensión (Apprehension), Preocupación (Worry)


To annotate the collection of news articles and Twitter users' responses, we selected four undergraduate Computer Science students. They were asked to follow this procedure:

1. Read a news article.
2. Read each Twitter user's response to the article.
3. Identify the emotions expressed by users in each response, in correspondence with the article.
4. Mark all emotions identified in the responses according to the provided list of emotions.
5. Continue until finishing all newspaper articles.


It was emphasized to the annotators that the emotions to be annotated are those expressed in the responses, and not those that the news article provoked in them. No codebook of trigger words was provided, so the tagging process relies only on their contextual interpretation and common knowledge. All annotators received two weeks of training in which they became familiar with the task and practiced the procedure. After that, all four annotators were given 300 news articles (100 per newspaper), 3,542 Twitter users' responses (an average of 11 per article) and the list of emotions mentioned above. The task was performed over a three-month period. In Fig. 3 we show the distribution of emotions tagged by the annotators in the response tweets of all three newspapers. This distribution was calculated by adding up each tag selected by each annotator in each tweet. Considering that there are 4 annotators and 3,542 tweets, each emotion could be tagged at most 4 × 3,542 = 14,168 times. From the figure we can conclude that Disgust was the most tagged emotion, selected almost 6,000 times, while Infatuation was the least selected. The graph also shows a high tendency for annotators to select negative emotions. This trend could be due to the current socio-political situation of the country, with social networks enticing users to express this general discontent. In Figs. 4 and 5 we show the distributions obtained by grouping emotions at the secondary and primary levels, respectively. The trend toward negative emotions is also reflected in these graphs.
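The tag aggregation just described can be sketched as a simple count over every tag selected by every annotator on every tweet; the annotations below are toy data, not the actual corpus.

```python
# Sketch of the tag-distribution computation behind Fig. 3: each tag
# selected by each annotator in each tweet is counted once, so with
# 4 annotators and 3,542 tweets an emotion can appear at most 14,168 times.
from collections import Counter

def tag_distribution(annotations):
    """annotations: one list per annotator, holding per-tweet tag sets."""
    counts = Counter()
    for annotator_tags in annotations:
        for tweet_tags in annotator_tags:
            counts.update(tweet_tags)
    return counts

# Two annotators, three tweets (toy data).
annotations = [
    [{"Disgust"}, {"Joy"}, {"Disgust", "Worry"}],
    [{"Disgust"}, {"Joy", "Hope"}, {"Worry"}],
]
print(tag_distribution(annotations))
```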


3.4. Generalization to primary emotions


Due to the number of available emotions that annotators could use to tag tweets, we obtained a fine-grained annotation. Despite the valuable information this annotation provides, the sparsity of the tags is high. This situation, along with the size of the corpus, makes it difficult for Machine Learning methods to learn correctly and generate accurate predictions. We believe that the primary emotions described in Shaver's hierarchy provide a good idea of the general sentiment, so we decided to use the hierarchical structure of emotions to generalize each selected emotion to its corresponding primary emotion. Generalization reduces the original set of emotions to only six, and therefore sparsity is also reduced; additionally, some errors and disagreements made during the annotation process can be smoothed out. In Table 4 we show an example of the fine-grained annotation made by one of the annotators for the tweet "It is good that Germany helps refugees!! They are escaping from a terrible situation in their countries", with the selected emotions indicated by a checkmark. The last row of the table shows the resulting coarse-grained


Fig. 3. Distribution of emotions tagged by annotators.


Fig. 4. Distribution of tagged emotions grouped by secondary level.

Fig. 5. Distribution of tagged emotions grouped by primary level.


annotation after generalizing the selected emotions to their primary emotions (marked in bold). We applied this generalization process to all the annotations, so for the rest of this work only the primary emotions are considered⁷.
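The generalization step can be sketched as a lookup from each fine-grained tag to its primary emotion in the Shaver hierarchy. Only a few illustrative entries of the mapping are shown below; the full mapping follows Tables 2 and 3.

```python
# Sketch of generalizing fine-grained tags to primary emotions via the
# Shaver hierarchy. Only a handful of illustrative entries are included.
PRIMARY = {
    "Compassion": "Love",
    "Optimism": "Joy",
    "Amazement": "Surprise",
    "Disgust": "Anger",
    "Shame": "Sadness",
    "Worry": "Fear",
}

def generalize(fine_tags):
    """Collapse a set of fine-grained tags to the primary-emotion level."""
    return sorted({PRIMARY[tag] for tag in fine_tags})

print(generalize({"Compassion", "Optimism", "Worry"}))
```

Because several fine-grained tags collapse onto the same primary emotion, the resulting label sets are denser and less sparse than the original annotation.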


3.5. Tag selection


To perform the annotation task described above, no knowledge of a specific field is required, since the confidence of the annotations rests on common knowledge. Agreement or disagreement between annotators is a good indicator of how reliable the selected emotions are for training a Machine Learning algorithm.


7 We consider using secondary- and tertiary-level emotions in future work.


Table 4
Annotation example: the fine-grained emotions available in each group, with checkmarks in the original layout marking the emotions selected by the annotator; the last row generalizes each group to its primary emotion.

- Group 1 (Love): Adoration, Affect, Attraction, Compassion, Fondness, Infatuation, Liking, Love, Lust, Passion, Sentimentality, Tenderness
- Group 2 (Joy): Amusement, Bliss, Contentment, Delight, Ecstasy, Elation, Enthrallment, Enthusiasm, Glee, Happiness, Hope, Joviality, Joy, Jubilation, Optimism, Pleasure, Pride, Relief, Satisfaction, Thrill, Triumph, Zeal
- Group 3 (Surprise): Amazement, Astonishment, Surprise
- Group 4 (Anger): Agitation, Anger, Annoyance, Bitterness, Disgust, Dislike, Envy, Exasperation, Ferocity, Fury, Grouchiness, Hate, Hostility, Irritation, Jealousy, Loathing, Rage, Resentment, Scorn, Torment, Wrath
- Group 5 (Sadness): Agony, Alienation, Anguish, Defeat, Depression, Despair, Disappointment, Dismay, Embarrassment, Gloom, Glumness, Grief, Guilt, Homesickness, Hopelessness, Humiliation, Hurt, Insecurity, Insult, Loneliness, Melancholy, Misery, Neglect, Regret, Rejection, Remorse, Sadness, Shame, Sorrow, Suffering, Sympathy, Unhappiness
- Group 6 (Fear): Alarm, Anxiety, Apprehension, Fear, Fright, Horror, Hysteria, Mortification, Nervousness, Panic, Shock, Tenseness, Terror, Uneasiness, Worry

Coarse-grained emotions: Love, Joy, Surprise, Anger, Sadness, Fear.

There are several metrics to evaluate inter-annotator agreement, such as Cohen's kappa (Cohen, 1960), Scott's π index (Scott, 1955) and S (Bennett et al., 1954). All these metrics evaluate agreement between two annotators, but we have four. We therefore selected a generalized version of Cohen's kappa (multi-kappa) presented in Davies and Fleiss (1982) that can work with more than two annotators. This generalization calculates the expected agreement based on individual annotator marginals. The multi-kappa metric was applied to the annotated emotions in the 3,542 Twitter users' responses and the result was 0.48 when grouping emotions at the primary level8 (six emotions). Considering that Landis and Koch (1977) establish that a value between 0.4 and 0.6 indicates moderate inter-annotator agreement, we considered this value sufficient for our purpose. Once we know that the quality of the annotation is good, the next step is to select the tags that reflect common knowledge, in order to provide valuable examples to the Machine Learning stage. To start with, we have four annotations made by four different annotators. Since the calculated multi-kappa value is not 1, there is no unanimous set of tags for each response, hence a procedure to select them is required. The advantage of having four annotators is that we can leave out the one that generates the most disagreement. We therefore recalculated the multi-kappa metric considering three different annotators each time and selected the combination that maximized the agreement value. The best agreement value was 0.49 which, although not a significant increase, allows us to select a tag out of three annotations by majority voting. In Table 5 we show an example of the tag selection process for the same tweet mentioned above, "It is good that Germany helps refugees!! They are escaping from a terrible situation in their countries".
In this case, annotators identified different emotions, but the majority tagged Love, Joy and Sadness. It

8 A fine-grained inter-annotator agreement was also computed at the tertiary level; the result was 0.20, which may be perceived as low, so further experiments are left for future work.


Table 5
Tag selection in an annotated Twitter users' response. Checkmarks in the original layout indicate the emotions tagged by Annotator1, Annotator2 and Annotator3 over {Love, Joy, Surprise, Anger, Sadness, Fear}; the last row shows the selected set obtained by majority voting: Love, Joy and Sadness.

Table 6
Final corpus.

Newspaper      News articles   Responses
El Universal   90              1,000
Excelsior      100             1,136
La Jornada     98              1,406
Total          288             3,542


is interesting how two initially opposite emotions like Joy and Sadness can be expressed in the same tweet, but this is because the tweet conveys two different contents (help to refugees and a terrible situation) that lead to this annotation. This selection procedure was applied to each annotated Twitter users' response. A total of 300 news articles were collected, from which 12 were removed because they were reposts of previously posted articles. Each article had an average of 5 responses from different Twitter users. Table 6 shows the final number of news articles and Twitter users' responses per newspaper. More details on this resource can be found in Calvo and Gambino (2017).
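The majority-voting selection described above can be sketched as follows. This is a minimal illustration, not the authors' code, and the per-annotator tag sets shown are hypothetical:

```python
# Majority-vote tag selection over the three annotators kept after
# maximizing multi-kappa (Section 3.5). Tag sets below are illustrative.

PRIMARY_EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]

def select_tags(annotations):
    """Keep every emotion tagged by a strict majority of the annotators."""
    majority = len(annotations) // 2 + 1
    return {e for e in PRIMARY_EMOTIONS
            if sum(e in tags for tags in annotations) >= majority}

annotations = [
    {"Love", "Joy", "Surprise"},   # Annotator1 (hypothetical)
    {"Love", "Joy", "Sadness"},    # Annotator2 (hypothetical)
    {"Joy", "Sadness"},            # Annotator3 (hypothetical)
]
print(sorted(select_tags(annotations)))  # ['Joy', 'Love', 'Sadness']
```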


4. Multi-target strategy


Our proposal treats the prediction of emotional reactions as a supervised classification problem. Therefore, the first thing we need is a set of examples of news articles and their corresponding emotional reactions. The emotional reactions can be obtained from the previously selected sets of emotions, as briefly explained in the previous section and shown in Table 1. Emotional reactions can be represented as a tuple in which each value is the percentage of responses in which a specific emotion was identified. For instance, in Fig. 2 the emotional reactions' tuple is (0%, 75%, 0%, 50%, 75%, 50%), corresponding to the emotions Love, Joy, Surprise, Anger, Sadness and Fear. These values represent the intensity of each emotion. Sentiment analysis tasks, such as sentiment polarity, use tags to represent the expected values (e.g., Positive, Negative or Neutral), making it easy to interpret the results. There are other tasks where the expected values can be switched from numerical values to tags in the interest of clarity (Ramírez et al., 2019). We consider that, even though the specific numeric values in the tuples of emotional reactions provide valuable information, tags can give a more general and descriptive idea of the intensity of the emotions expressed in those tuples. Therefore, we propose a discretization process that represents the tuple values with intensity tags. In Table 7 we show the interval defined for each value in the scale and the corresponding tag used to identify each discretized value. If we apply the intervals to the tuple of emotional reactions (0%, 75%, 0%, 50%, 75%, 50%)9, we obtain the new tuple (None, High, None, Low, High, Low). There is some loss of precision when the distribution values are discretized (for example, 40% and 50% share the same tag Low), but a useful general idea of the intensity is preserved. Our model will be trained with these tuples and it will predict a tuple as well. The tags used to identify the values in the tuples can be easily handled during the training process of the Machine Learning algorithm.
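As a sketch (the function names are ours, not the authors'), the tuple construction and the discretization of Table 7 look like this:

```python
# Build the emotional reactions' tuple (fraction of responses expressing each
# emotion) and discretize it with the intervals of Table 7.

EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]

def reaction_tuple(response_tag_sets):
    """Fraction of responses in which each emotion was identified."""
    n = len(response_tag_sets)
    return tuple(sum(e in tags for tags in response_tag_sets) / n
                 for e in EMOTIONS)

def discretize(val):
    if val == 0:
        return "N"      # None
    if val <= 0.25:
        return "VL"     # Very Low
    if val <= 0.5:
        return "L"      # Low
    if val <= 0.75:
        return "H"      # High
    return "VH"         # Very High

raw = (0.0, 0.75, 0.0, 0.5, 0.75, 0.5)      # (0%, 75%, 0%, 50%, 75%, 50%)
print(tuple(discretize(v) for v in raw))    # ('N', 'H', 'N', 'L', 'H', 'L')
```

The printed tuple matches the example in the text: (None, High, None, Low, High, Low).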


4.1. Multi-target representation of the corpus


In a multi-target dataset each instance is associated with a set of classes and each class can take values from a predefined set of values. Thus, a news article has six different associated emotions, the aforementioned set {Love, Joy,


9 The percentage values can be treated as fractional values by dividing each value by 100, hence the intervals can be applied.


Table 7
Interval values with their corresponding tag.

Intensity    Interval            Tag
None         val = 0             N
Very Low     0 < val ≤ 0.25      VL
Low          0.25 < val ≤ 0.5    L
High         0.5 < val ≤ 0.75    H
Very High    0.75 < val ≤ 1      VH


Surprise, Anger, Sadness, Fear}. Each emotion can be associated with five different intensity values from the set {None, Very Low, Low, High, Very High}. From this representation it is easy to see that the data of the corpus are multi-dimensional. The number of possible combinations of emotions and intensity values for an instance is 5^6, which generates 15,625 different tuples of emotional reactions. Taking into account the number of possible tuples, a multi-target classification problem is harder than binary, multi-class and even multi-label classification problems.
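The size of this output space can be checked directly with a small sketch:

```python
from itertools import product

# Each of the 6 primary emotions takes one of 5 intensity tags, so the
# number of distinct emotional reactions' tuples is 5**6 = 15,625.
INTENSITIES = ["N", "VL", "L", "H", "VH"]
all_tuples = list(product(INTENSITIES, repeat=6))
print(len(all_tuples))  # 15625
```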


4.2. Information in multi-target data



We have selected some example news articles and, using the previously defined method, we have obtained the instances shown in Table 8. The order of emotions in the tuples is (Love, Joy, Surprise, Anger, Sadness, Fear) and the discretized intensity values of each element in the tuple are the ones defined in Table 7. By analyzing Table 8, interesting information about Twitter users' emotional reactions can be obtained. For instance, the first news article of the table provoked a highly negative reaction. This can be seen in its corresponding graphic of emotional reactions, where Sadness, Anger and Fear were marked as Very High or High, while Low intensity was indicated for Love and Joy. News article a2 also provoked a negative reaction, but to a lesser extent. In news article a3 we can see a mixed reaction, because the positive emotion Joy and the negative emotion Anger share the same intensity value High. Finally, news articles a4 and a5 provoked positive reactions, with high intensity values for some positive emotions and low intensity values for negative ones. From the emotional reactions' tuple we can also determine the predominant emotions in each news article: in a1 they were Sadness and Anger, while in a5 it was Joy (see Section 2.1). By ordering the emotion intensity values associated with each news article, a ranking of emotions can be generated (see Section 2.2). The set of emotions expressed as a reaction can also be determined by listing all the emotions with an intensity value different from None, because None represents an intensity value of 0%. For instance, in news article a5 the set of emotions was {Love, Joy, Anger, Sadness}, while Surprise and Fear were not expressed in the responses (see Section 2.3). Finally, the most relevant information that can be obtained according to our goal is the intensity of the emotional reactions. This information is represented by the intensity values {None, Very Low, Low, High, Very High} associated with each emotion in each news article. This characteristic makes the corpus data multi-target. Multi-target data allow achieving the objectives proposed by the different authors in Section 2, but additionally make it possible to predict the intensity of emotional reactions, for which, to our knowledge, there have been no previous attempts.


4.3. Class distribution


It is important to analyze how the set of emotions is distributed among the set of possible values in the corpus. Figs. 6-8 show the class distribution for each newspaper, and Fig. 9 shows the class distribution over all articles10. The graphics reflect a high imbalance in the classes. An example of this imbalance is shown in Fig. 6 for the emotion Love in news articles published by El Universal: 91% of responses had the intensity value None, and the remaining 9% is shared among the other four intensity values. Something similar happens with the emotion Surprise, and this imbalance is also reflected in the other two newspapers. The problem with this imbalance is that there are very few examples of these emotions with other values, hence it will be harder for the


10 Distribution values between 1% and 3% are shown in the bars but without their corresponding numbers, to avoid text overlapping.


Table 8
Example of instances for multi-target classification tagged by humans (N=None, VL=Very Low, L=Low, H=High, VH=Very High). The graphics of emotional reactions are omitted here.

News article   Headline                                                         Tuple of emotional reactions
a1             Construction of the wall, unacceptable measure.                  (N, VL, N, VH, VH, H)
a2             Homophobic, Sexist, Anti-Environmentalists Seek Trump Cabinet.   (VL, VL, N, H, H, L)
a3             Guillermo Padres jailed after formal arrest.                     (N, H, N, L, H, L)
a4             California Seeks Independence After Trump's Triumph.             (L, VH, VL, VL, VL, N)
a5             Ha*Ash receives four platinum certifications.                    (H, VH, N, VL, VL, N)

Machine Learning algorithm to learn from these "rare" cases and predict them. The effect of this imbalance is reflected in the results shown in Section 5. Another characteristic of the corpus shown in these graphics is the polarity of the emotions: most of the responses have high intensity values for negative emotions like Sadness and Anger, while positive emotions like Love and Joy have low intensity values.


Fig. 6. Emotions distribution in El Universal.

5. Experiments and results


In this section we explain the experiments performed to train and use a model that predicts the emotional reactions of Twitter users' responses to news articles; we describe how the corpus was processed, the multi-target methods applied to it, and the classifiers used to create a model. Additionally, the metric used to evaluate the model and the obtained results are presented, along with a discussion of them.


Fig. 7. Emotions distribution in Excelsior.


Fig. 8. Emotions distribution in La Jornada.

5.1. Preprocessing


The raw text collected from the electronic versions of the newspapers requires preprocessing in order to extract useful features to train the model. The first preprocessing task is to split the text into tokens, a procedure known as tokenization. During tokenization we can use the space character to separate tokens, but it is important to also consider characters like periods, commas and semicolons that can separate words. We implemented a script in Python that tokenizes the text of the news articles. As shown in the example of Section 3, Twitter


Fig. 9. Emotions distribution in all articles.


user’ responses can include special characters called emojis, which are images used to express concrete objects but also ideas and emotions. Due to the fact that new emojis are created constantly, it it is hard to have an up-to-date dictionary to translate the images to the words that define the object, idea or emotion expressed by them. Therefore, we decided to consider emojis as regular words and no special treatment was applied. As can be seen in the following section, words are represented in a vector space model, thus the emojis will also be part of it. The sparsity of tokens can affect the performance of the classifier. We counted the number of tokens (total number of words) and types (total number of distinct words) in the 288 news articles resulting in 107,428 tokens and 17,508 types. So that, if we use words as features for the classifier, we would have 17,508 different features. A way to reduce the sparsity is applying a procedure called lemmatization. This process reduces a word to its lemma. All the inflections of words are grouped together in order to have a unique item. We use the Spanish lemmatizer provided by the language analysis suite Freeling11 (Padr o and Stanilovsky, 2012). After using the lemmatizer, we recalculated the number of tokens and types and we obtained 118,282 tokens and 8D12X X245 types. The number of tokens increased because Freeling separates some contractions of Spanish words like “del” to “de el” before applying the lemmatizer. Despite the increase in tokens, the number of types was reduced to 47%, which is a significant reduction in sparsity.


5.2. Feature extraction


A classifier requires a good set of features to represent the instances. Considering those used by the authors in Section 2, we decided to try two different feature spaces: bag of words and word embeddings.


5.2.1. Bag of words
In this model a document is represented as a vector. The vector has the size of the number of different words in the text (types), and its values are the frequencies of these words. There is evidence that using the presence of a word instead of its frequency obtains better results in Sentiment Analysis tasks (Pang et al., 2008). Therefore, we decided to use a binarized bag of words. The size of the generated binary vectors in our experiments was approximately 4,000 for each newspaper and 8,000 when considering the articles of the three newspapers.

5.2.2. Word embeddings
The semantic relation between words is not considered by the previous encoding. One approximation to handle semantic relations is to use the context where words appear in text. By context we mean the group of words that surround a specific word (on its left and right). Word embedding techniques like Word2Vec (Mikolov et al., 2013) and GloVe (Pennington et al., 2014) capture semantics by using co-occurrence information, i.e., how frequently words appear together in large text corpora. These techniques generate a fixed-length vector of numerical values that represents a word. The generated vectors can be used as features for Machine Learning tasks (Zhang et al., 2015a; Levy et al., 2015). Since we are working with news articles, and these can be considered documents, we decided to use Doc2Vec (Le and Mikolov, 2014) to generate vectors that represent each article. Following the idea of training the model with large corpora, we used the Spanish version of Wikipedia for this purpose. We downloaded the Wikipedia dump of December 2016, which is 2.7 GB in size and has 5,233,349 articles, 457,260,830 tokens and 8,445,681 types. Some parameters need to be adjusted in Doc2Vec and, after experimenting with different settings, we chose the following values: window size = 20; vector length = 200; iterations = 45; model = distributed bag of words; optimization technique = Negative Sampling. A detailed explanation of each parameter can be found in Rong (2014). Once the Doc2Vec model is trained, a method called infer can be used to generate a vector for unseen documents. During the training process of the classifier, we infer the vector corresponding to each news article and the values of the vector are used as features. The same inference process is applied during testing, before the classifier model performs the prediction. Both feature spaces (binary and word embedding), along with their corresponding emotional reactions' tuple, were used to train the classifier.
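A minimal sketch of the binarized bag-of-words encoding follows (the helper names are ours; the Doc2Vec side, trained separately with the parameters listed above, is omitted here):

```python
# Binarized bag of words: vector entries record word presence, not frequency.

def build_vocabulary(tokenized_docs):
    return sorted({tok for doc in tokenized_docs for tok in doc})

def binary_bow(doc, vocab):
    present = set(doc)
    return [1 if word in present else 0 for word in vocab]

docs = [["germany", "helps", "refugees", "helps"],
        ["refugees", "escape", "war"]]
vocab = build_vocabulary(docs)
print(vocab)                        # ['escape', 'germany', 'helps', 'refugees', 'war']
print(binary_bow(docs[0], vocab))   # [0, 1, 1, 1, 0] -- "helps" appears twice, encoded once
```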

11 Freely available at http://nlp.lsi.upc.edu/freeling/


Table 9
BR transformation adapted to the multi-target dataset.

Classifier        Positive examples       Negative examples
LoveVeryHigh      {}                      {a1, a2, a3, a4, a5}
LoveHigh          {a5}                    {a1, a2, a3, a4}
LoveLow           {a4}                    {a1, a2, a3, a5}
LoveVeryLow       {a2}                    {a1, a3, a4, a5}
LoveNone          {a1, a3}                {a2, a4, a5}
JoyVeryHigh       {a4, a5}                {a1, a2, a3}
JoyHigh           {a3}                    {a1, a2, a4, a5}
JoyLow            {}                      {a1, a2, a3, a4, a5}
JoyVeryLow        {a1, a2}                {a3, a4, a5}
JoyNone           {}                      {a1, a2, a3, a4, a5}
SurpriseVeryHigh  {}                      {a1, a2, a3, a4, a5}
SurpriseHigh      {}                      {a1, a2, a3, a4, a5}
SurpriseLow       {}                      {a1, a2, a3, a4, a5}
SurpriseVeryLow   {a4}                    {a1, a2, a3, a5}
SurpriseNone      {a1, a2, a3, a5}        {a4}
AngerVeryHigh     {a1}                    {a2, a3, a4, a5}
AngerHigh         {a2}                    {a1, a3, a4, a5}
AngerLow          {a3}                    {a1, a2, a4, a5}
AngerVeryLow      {a4}                    {a1, a2, a3, a5}
AngerNone         {a5}                    {a1, a2, a3, a4}
SadnessVeryHigh   {a1}                    {a2, a3, a4, a5}
SadnessHigh       {a2, a3}                {a1, a4, a5}
SadnessLow        {}                      {a1, a2, a3, a4, a5}
SadnessVeryLow    {a4, a5}                {a1, a2, a3}
SadnessNone       {}                      {a1, a2, a3, a4, a5}
FearVeryHigh      {}                      {a1, a2, a3, a4, a5}
FearHigh          {a1}                    {a2, a3, a4, a5}
FearLow           {a2, a3}                {a1, a4, a5}
FearVeryLow       {}                      {a1, a2, a3, a4, a5}
FearNone          {a4, a5}                {a1, a2, a3}


5.3. The MEKA multi-label methods suite


As stated in previous sections, in order to deal with the multi-target characteristic of the corpus, we need algorithms like those mentioned in Section 2.3. Some multi-label methods (Boutell et al., 2004; Read et al., 2011; Tsoumakas and Vlahavas, 2007) have been implemented in MEKA, a suite of multi-label methods (Read et al., 2016). MEKA is an extension of the popular Machine Learning suite WEKA (Frank et al., 2016); hence, it can use all the features and resources available in WEKA, such as feature selection methods, binary and multi-class classifiers, and clustering methods. There are almost 40 available methods for multi-label classification in MEKA12; most of them are variations of the aforementioned algorithms13. The main importance of this tool is that its multi-label algorithms can be adapted to work with multi-target data14. The adaptation of these methods is possible because a multi-target problem can be cast to multi-label problems, just as multi-class problems can be cast to binary problems. An example of this is the adaptation of the Binary Relevance method (Boutell et al., 2004). In Table 9 we show the result of applying the Binary Relevance transformation to the multi-target instances of Table 8. As can be seen, this adapted method creates a binary classifier for each combination of emotion and intensity value; hence, 30 binary classifiers are generated. A news article is considered a positive example if its tuple includes the expected combination of emotion and emotion intensity, and a negative example otherwise. For instance, the classifier LoveNone has as positive examples the articles a1 and a3, which both appear in Table 8 with an intensity value of None for the emotion Love, while the negative examples are a2, a4 and a5, because the combination does not appear in these articles.
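The adapted Binary Relevance transformation can be sketched as follows; the instance tuples come from Table 8, and the function is our illustration, not MEKA's code:

```python
EMOTIONS = ["Love", "Joy", "Surprise", "Anger", "Sadness", "Fear"]
INTENSITIES = ["VeryHigh", "High", "Low", "VeryLow", "None"]
TAG = {"VeryHigh": "VH", "High": "H", "Low": "L", "VeryLow": "VL", "None": "N"}

# Instances from Table 8: article id -> discretized emotional reactions' tuple.
instances = {
    "a1": ("N", "VL", "N", "VH", "VH", "H"),
    "a2": ("VL", "VL", "N", "H", "H", "L"),
    "a3": ("N", "H", "N", "L", "H", "L"),
    "a4": ("L", "VH", "VL", "VL", "VL", "N"),
    "a5": ("H", "VH", "N", "VL", "VL", "N"),
}

def binary_relevance(instances):
    """One binary problem per (emotion, intensity) pair: 6 x 5 = 30 classifiers."""
    problems = {}
    for i, emotion in enumerate(EMOTIONS):
        for intensity in INTENSITIES:
            pos = {a for a, t in instances.items() if t[i] == TAG[intensity]}
            problems[emotion + intensity] = (pos, set(instances) - pos)
    return problems

problems = binary_relevance(instances)
print(len(problems))                    # 30
print(sorted(problems["LoveNone"][0]))  # ['a1', 'a3']
```

The output for LoveNone reproduces the corresponding row of Table 9: positives {a1, a3}, negatives {a2, a4, a5}.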
Taking advantage of the availability of multi-target algorithms implemented in MEKA, we decided to use this tool to perform experiments with the created corpus. All the available multi-target classifiers in MEKA

12 http://meka.sourceforge.net/methods.html
13 For more details, refer to Herrera et al. (2016).
14 http://meka.sourceforge.net/api-1.9/meka/classifiers/multitarget/MultiTargetClassifier.html


Table 10
Selected multi-target transformation methods.

Acronym    Method
MTBR       Multi-target version of the Binary Relevance method
MTCCh      Multi-target version of Classifier Chains
MTBCCh     Multi-target version of Bayesian Classifier Chains
MTPLPw     Multi-target version of Pruned Label Powerset
MTRAkEL    Multi-target version of RAkEL
EMTCCh     Ensembles of multi-target Classifier Chains
EMTPLPw    Ensembles of multi-target Pruned Label Powerset


follow the problem transformation approach. Each transformation considers different information during the process. Table 10 shows the selected multi-target transformation methods used in the experiments. Once the multi-target problem has been transformed into a binary or multi-class problem, according to the applied method, a proper classifier is used to train a model. There are plenty of binary and multi-class classifiers, but we decided to use an SVM, based on the results this classifier has obtained in the related works described in Section 2. WEKA has an implementation of the SMO optimization algorithm (Platt, 1998) and, based on empirical evidence, the radial basis function kernel was selected with a regularization parameter equal to 1. For the experiments we separated the instances of the corpus into a training set and a test set. The experiments were run ten times; each run randomly selected 90% of the instances for training and 10% for testing. The distribution of the tagged emotions in both sets follows the trend shown in Fig. 3. At the end of the iterations, the average result of the experiments was calculated.
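The evaluation protocol (ten runs with random 90/10 splits, averaging the results) can be sketched like this; `train_and_score` is a hypothetical stand-in for the MEKA transformation method plus the SMO classifier:

```python
import random

def random_splits(instances, runs=10, train_frac=0.9, seed=0):
    """Yield `runs` random (train, test) partitions of the instance list."""
    rng = random.Random(seed)
    for _ in range(runs):
        shuffled = list(instances)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        yield shuffled[:cut], shuffled[cut:]

def average_score(instances, train_and_score, runs=10):
    """Average the per-run score over all random splits."""
    scores = [train_and_score(train, test)
              for train, test in random_splits(instances, runs=runs)]
    return sum(scores) / len(scores)
```

With the 288 articles of the corpus, each run trains on 259 instances and tests on the remaining 29.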


5.4. Evaluation metrics


There are traditional metrics used to evaluate the performance of classifiers. Accuracy is one of the most used, because it determines how many instances were correctly classified out of all the predictions the classifier made. Precision, Recall and F-measure are other well-known metrics, but when it comes to multi-label and multi-target classification, the use of these metrics is not straightforward: the huge number of possible class predictions makes traditional metrics too strict, and the information they provide may ignore some good predictions the classifiers obtain. According to the calculation we made for our multi-target corpus, there are 15,625 different emotional reactions' tuples an instance can have (see Section 4.1). The trained model would have 1 out of 15,625 chances to generate a correct prediction, while a binary classifier has 1 out of 2 and a 5-class multi-class classifier would have 1 out of 5. Despite the difficulty of generating exact predictions, the model may still make partially good predictions. We implemented a metric that measures the emotional reactions similarity (ERS) of the predicted tuples. This metric takes values in the range [0, 1]. Similarity is 1 when all the predicted values of the tuple are the expected ones (an exact match). On the other hand, when the value is 0, the difference between the expected and predicted values is the largest possible (the value Very High was expected but None was predicted, or vice versa). To calculate the difference between the expected values e and the predicted ones p, we assigned the following numerical values to the tags used to indicate the emotion intensities: Very High = 1, High = 0.75, Low = 0.5, Very Low = 0.25 and None = 0. The ERS for a news article a with n emotions is calculated using Eq. (1):

ERS(a) = 1 - \frac{\sum_{j=1}^{n} |e_j - p_j|}{n}    (1)


The overall performance of the system for m news articles is calculated by Eq. (2):

ERS_{tot} = \frac{\sum_{i=1}^{m} ERS(a_i)}{m}    (2)

For an example of this metric, see Table 11. The graphics clearly show the differences between each expected value and the predicted one. For instance, in news article a1 both the expected and the predicted value for Love were the same value, None, so there is no difference at all. But for the emotion Joy the expected value was Very Low and the


Table 11
Example of instances for multi-target classification (N=None, VL=Very Low, L=Low, H=High, VH=Very High). The graphics of emotional reactions are omitted here.

News article a   Expected tuple           Predicted tuple          ERS(a)
a1               (N, VL, N, VH, VH, H)    (N, L, H, VH, VL, VH)    0.66
a2               (VL, VL, N, H, H, L)     (N, H, VL, VH, VH, L)    0.75
a3               (N, H, N, L, H, L)       (VL, L, L, N, H, VL)     0.79
a4               (L, VH, VL, VL, VL, N)   (H, H, N, H, VL, L)      0.87
a5               (H, VH, N, N, VL, N)     (H, VH, N, VL, VL, N)    1
ERStot                                                             0.814


predicted one was Low, so there is a difference of |0.25 - 0.5| = 0.25, with a final ERS for this article of 0.66 and an overall value of 0.814 for the five articles. Although our problem is multi-target, we have also decided to use the Hamming Loss metric, which has been widely used for multi-label problems (see Section 2.3). Hamming Loss calculates the fraction of labels that are incorrectly predicted. Hamming Loss takes values between 0 and 1, where 0 means a perfect prediction and 1 means that all predicted values were incorrect. Compared to our ERS metric, Hamming Loss does not consider the distance between the predicted and the real values; therefore, we consider Hamming Loss a coarse-grained metric.
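A direct implementation of ERS (Eqs. (1) and (2)) and of Hamming Loss over the intensity tags, as a sketch:

```python
# ERS of Eq. (1) over intensity tags, plus Hamming Loss for comparison.
VALUE = {"VH": 1.0, "H": 0.75, "L": 0.5, "VL": 0.25, "N": 0.0}

def ers(expected, predicted):
    n = len(expected)
    return 1 - sum(abs(VALUE[e] - VALUE[p])
                   for e, p in zip(expected, predicted)) / n

def ers_total(pairs):                    # Eq. (2): average over articles
    return sum(ers(e, p) for e, p in pairs) / len(pairs)

def hamming_loss(expected, predicted):   # fraction of incorrect tags
    return sum(e != p for e, p in zip(expected, predicted)) / len(expected)

e1 = ("N", "VL", "N", "VH", "VH", "H")   # article a1 of Table 11
p1 = ("N", "L", "H", "VH", "VL", "VH")
print(round(ers(e1, p1), 2))  # 0.67 (truncated to 0.66 in Table 11)
print(hamming_loss(e1, p1))   # 4 of the 6 tags differ
```

The comparison also shows why Hamming Loss is coarser: it counts the Joy mismatch (VL vs. L) exactly like the Sadness mismatch (VH vs. VL), while ERS weights them by distance.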


5.5. Results


To compare the results obtained by the multi-target methods, we defined a baseline. Due to the imbalance of the corpus (see Section 4.3), we know in advance that most of the tuples have high intensity values in emotions like Anger and Sadness, while low values are expected for Surprise and Love. Considering this information, the baseline is composed of the most frequent intensity value of each emotion in the different newspapers. The following list shows the baselines defined for each newspaper, considering that the order of emotions in the tuples is (Love, Joy, Surprise, Anger, Sadness, Fear).





- El Universal: (None, Very Low, None, Very High, High, Low).
- Excelsior: (None, Very Low, None, Very High, Very High, Low).
- La Jornada: (None, None, None, Very High, Very High, Low).
- All articles: (None, None, None, Very High, Very High, Low).
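The most-frequent-intensity baseline can be sketched as follows; `baseline_tuple` and the toy data are illustrative assumptions, not the corpus or the authors' code:

```python
from collections import Counter

EMOTIONS = ("Love", "Joy", "Surprise", "Anger", "Sadness", "Fear")

def baseline_tuple(article_tuples):
    """Baseline for one newspaper: the modal (most frequent) intensity
    of each emotion across that newspaper's annotated articles."""
    return tuple(
        Counter(t[i] for t in article_tuples).most_common(1)[0][0]
        for i in range(len(EMOTIONS))
    )

# Toy annotated tuples standing in for one newspaper's articles.
toy = [("N", "VL", "N", "VH", "H", "L"),
       ("N", "VL", "N", "VH", "VH", "L"),
       ("N", "N",  "N", "VH", "H", "L")]
print(baseline_tuple(toy))  # ('N', 'VL', 'N', 'VH', 'H', 'L')
```

Running this per newspaper, and once over the whole corpus, yields fixed tuples analogous to the four listed above.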

Although this baseline might seem like a kind of "emotional signature" of each newspaper, we consider that the number of news articles we collected is not a representative sample from which to state that a newspaper specifically targets a certain group of readers, seeking to provoke in them a particular emotional reaction. This analysis is worth pursuing as future work, together with a study of the main characteristics that promote a particular emotion.

Table 12. Emotional reactions similarity of the multi-target methods with the bag-of-words model.

Newspaper      MTBR   MTCCh  MTBCCh  MTPLPw  MTRAkEL  EMTCCh  EMTPLPw  Baseline
El Universal   0.871  0.873  0.875   0.863   0.869    0.870   0.872    0.876
Excelsior      0.899  0.901  0.899   0.891   0.898    0.903   0.901    0.902
La Jornada     0.911  0.912  0.911   0.907   0.915    0.910   0.910    0.916
All            0.881  0.882  0.882   0.871   0.871    0.882   0.882    0.872



Table 13. Emotional reactions similarity of the multi-target methods with word embeddings.

Newspaper      MTBR   MTCCh  MTBCCh  MTPLPw  MTRAkEL  EMTCCh  EMTPLPw  Baseline
El Universal   0.869  0.870  0.869   0.864   0.870    0.870   0.870    0.870
Excelsior      0.907  0.904  0.905   0.902   0.908    0.906   0.906    0.912
La Jornada     0.907  0.907  0.907   0.903   0.910    0.906   0.909    0.910
All            0.892  0.891  0.889   0.886   0.886    0.891   0.891    0.886


In Tables 12 and 13 we show the results of the emotional reactions similarity using bag-of-words and word embedding features, respectively. Tables 14 and 15 show the Hamming Loss results for the same two feature representations.


5.6. Discussion


In Tables 12 and 13 we can see, in general terms, that the multi-target methods obtained good results for all newspapers, considering that for the proposed metric (ERS) a higher value (closer to 1) means better performance. The results of these methods and the baseline were very close, and for some newspapers, such as El Universal and La Jornada, no multi-target method was able to surpass the baseline. It is important to mention that in the experiments performed with all the articles, the multi-target methods surpassed the baseline regardless of text representation. These results suggest that increasing the size of the corpus could improve classification performance. With respect to the Hamming Loss metric, Tables 14 and 15 show that for both text representations and for all the newspapers the multi-target methods surpass the baseline, although the best results were again very close to it. The small difference between the results obtained by the proposed methods and the baseline is mainly due to the class imbalance shown in Figs. 6-9: some emotion intensities occur much more frequently than others. This situation can be seen more clearly in Fig. 10, which shows the 36 most frequent emotional reactions' tuples over all the articles. Considering that there are 288 news articles in this corpus, the 10 most frequent tuples appear in half of the articles, while the other 15,615 possible tuples have very few or no occurrences in the remaining articles. The high frequency of some tuples caused the classifier (SVM) to learn from these examples and predict these tuples more often, and these predictions are very similar to the tuples defined in the baseline (see Section 5.5). There are techniques for balancing class distributions, such as over-sampling and under-sampling, that have been successfully applied in binary and multi-class classification (Chawla et al., 2002). However, multi-label and especially multi-target datasets have extreme imbalance levels, and the aforementioned techniques cannot be easily applied because there is not a single majority label and a single minority one, but several of each (Herrera et al., 2016). Further research is needed on balancing this kind of dataset, but this is out of the scope of this work. To sum up, the proposed methods were able to correctly predict most of the emotional reactions' tuples. Predicting these tuples is important because this information can be used to tackle the tasks defined for determining emotions from the reader's perspective (see Section 2), namely the most tagged emotion (cf. Lin et al., 2007), the ranking of emotions (cf. Lin et al., 2008) and the set of tagged emotions (cf. Bhowmick, 2009).
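The skew just described can be quantified per target. The sketch below computes, for each emotion, the ratio between its most and least frequent observed intensity; this simplified ratio is our illustration in the spirit of the imbalance measures discussed by Herrera et al. (2016), applied to toy tuples rather than the corpus:

```python
from collections import Counter

EMOTIONS = ("Love", "Joy", "Surprise", "Anger", "Sadness", "Fear")

def imbalance_ratios(article_tuples):
    """For each target (emotion), ratio of the most to the least frequent
    observed intensity level; large values mean a heavily skewed target."""
    ratios = {}
    for i, emotion in enumerate(EMOTIONS):
        counts = Counter(t[i] for t in article_tuples)
        ratios[emotion] = max(counts.values()) / min(counts.values())
    return ratios

# Toy corpus: one dominant tuple and one rare tuple (8:2 split per target).
toy = [("N", "VL", "N", "VH", "VH", "L")] * 8 + [("L", "H", "VL", "N", "N", "N")] * 2
print(imbalance_ratios(toy))  # ratio 4.0 for every emotion in this toy case
```

With ratios like these, resampling a whole tuple to boost one rare intensity inevitably inflates other intensities at the same time, which is the difficulty noted above for multi-target balancing.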



Table 14. Hamming loss of the multi-target methods with the bag-of-words model.

Newspaper      MTBR   MTCCh  MTBCCh  MTPLPw  MTRAkEL  EMTCCh  EMTPLPw  Baseline
El Universal   0.390  0.388  0.387   0.451   0.370    0.451   0.388    0.581
Excelsior      0.346  0.349  0.348   0.349   0.388    0.349   0.349    0.381
La Jornada     0.450  0.448  0.451   0.448   0.443    0.448   0.446    0.490
All            0.453  0.449  0.454   0.489   0.462    0.489   0.459    0.477



Table 15. Hamming loss of the multi-target methods with word embeddings.

Newspaper      MTBR   MTCCh  MTBCCh  MTPLPw  MTRAkEL  EMTCCh  EMTPLPw  Baseline
El Universal   0.381  0.381  0.381   0.450   0.385    0.450   0.381    0.581
Excelsior      0.349  0.349  0.349   0.378   0.349    0.378   0.349    0.381
La Jornada     0.458  0.458  0.461   0.453   0.446    0.453   0.446    0.490
All            0.453  0.466  0.453   0.482   0.453    0.482   0.452    0.477

Fig. 10. Frequency of emotional reactions’ tuples for tweets in all newspapers. (N=None, VL=Very Low, L=Low, H=High, VH=Very High).




6. Conclusions and future work



In this work we have proposed the task of predicting the emotional reactions of Twitter users to news articles. To accomplish this objective, a corpus of Spanish news articles and tweet responses was collected and annotated with the emotions that users expressed. These annotated emotions were transformed into emotional reactions' tuples through a discretization process. The prediction task was defined as a multi-target classification problem, since users can evoke more than one emotion after reading an article and these emotions can have different degrees of intensity. Multi-target methods adapted from multi-label methods were used to transform the problem into binary or multi-class classification problems, after which an SVM was used to predict the emotional reactions of Twitter users. A specific metric was created to evaluate the performance of the classifier. Results show that the multi-target methods obtained a high emotional reactions similarity, which seems promising for this novel task. Nevertheless, the results of the baseline were very close to, and sometimes better than, those obtained by our proposal. Due to the highly imbalanced distribution of emotions, the predictions of the classifier were biased and infrequent emotional intensities were ignored. To improve performance, rare examples need to be supplied to the classifier with deliberate care. Even so, we observed that under certain circumstances the multi-target methods are able to predict the emotional reactions of Twitter users' responses better than fixed reactions. To deal with class imbalance, we propose as future work to collect more news articles with information that may provoke positive emotions in readers (work in progress) and to increase the size of the corpus in general. Other multi-target methods and classifiers will also be tried to improve results. In addition, feature selection techniques and parameter adjustments will be further explored.


References


Balahur, A., Perea-Ortega, J.M., 2013. Experiments using varying sizes and machine translated data for sentiment analysis in Twitter. In: Proceedings of the TASS Workshop at SEPLN 2013.
Bandari, R., Asur, S., Huberman, B.A., 2012. The pulse of news in social media: forecasting popularity. In: Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media. The AAAI Press, pp. 26–33.
Bennett, E.M., Alpert, R., Goldstein, A.C., 1954. Communications through limited-response questioning. Public Opin. Q. 18 (3), 303. doi:10.1086/266520.
Bhowmick, P.K., 2009. Reader perspective emotion analysis in text through ensemble based multi-label classification framework. Comput. Inf. Sci. 2 (4), 64–74.
Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022.
Boutell, M.R., Luo, J., Shen, X., Brown, C.M., 2004. Learning multi-label scene classification. Pattern Recognit. 37 (9), 1757–1771.
Calvo, H., Gambino, O.J., 2017. News articles with annotated emotional reaction distribution from Twitter in Spanish. Data Brief (submitted for publication).
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20 (1), 37–46. doi:10.1177/001316446002000104.
Davies, M., Fleiss, J.L., 1982. Measuring agreement for multinomial data. Biometrics 38 (4), 1047–1051.
Ekman, P., Friesen, W.V., O'Sullivan, M., Chan, A., Diacoyanni-Tarlatzis, I., Heider, K., Krause, R., LeCompte, W.A., Pitcairn, T., Ricci-Bitti, P.E., Scherer, K., Tomita, M., Tzavaras, A., 1987. Universals and cultural differences in the judgments of facial expressions of emotion. J. Pers. Soc. Psychol. 53 (4), 712–717.
Figueroa, J.G., González, E.G., Solís, V.M., 1976. An approach to the problem of meaning: semantic networks. J. Psycholinguist. Res. 5 (2), 107–115.
Frank, E., Hall, M., Witten, I., 2016. The WEKA workbench. In: Data Mining: Practical Machine Learning Tools and Techniques, fourth ed.
Gambino, O.J., Calvo, H., 2016. A comparison between two Spanish sentiment lexicons in the Twitter sentiment analysis task. In: Advances in Artificial Intelligence - IBERAMIA 2016. Springer International Publishing, pp. 127–138.
Herrera, F., Charte, F., Rivera, A.J., Del Jesus, M.J., 2016. Multilabel Classification: Problem Analysis, Metrics and Techniques, first ed. Springer.
Kwak, H., Lee, C., Park, H., Moon, S., 2010. What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web. ACM, pp. 591–600.
Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 33 (1), 159–174.
Le, Q., Mikolov, T., 2014. Distributed representations of sentences and documents. In: Proceedings of the 31st International Conference on Machine Learning, pp. 1188–1196.
Levy, O., Goldberg, Y., Dagan, I., 2015. Improving distributional similarity with lessons learned from word embeddings. Trans. Assoc. Comput. Linguist. 3, 211–225.
Li, X., Peng, Q., Sun, Z., Chai, L., Wang, Y., 2017. Predicting social emotions from readers' perspective. IEEE Trans. Affect. Comput. PP (99).




Lin, K.H.-Y., Yang, C., Chen, H.-H., 2007. What emotions do news articles trigger in their readers? In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, pp. 733–734.
Lin, K.H.-Y., Yang, C., Chen, H.-H., 2008. Emotion classification of online news articles from the reader's perspective. In: Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology, 2008, vol. 1. IEEE, pp. 220–226.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J., 2013. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119.
Padró, L., Stanilovsky, E., 2012. FreeLing 3.0: towards wider multilinguality. In: Proceedings of the Language Resources and Evaluation Conference. ELRA, Istanbul, Turkey.
Pak, A., Paroubek, P., 2010. Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). European Language Resources Association (ELRA).
Pang, B., Lee, L., 2008. Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2 (1-2), 1–135.
Pennington, J., Socher, R., Manning, C., 2014. GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543.
Pérez-Corona, N., Hernández-Colín, D., Bustillo-Hernández, C., Figueroa-Nazuno, J., 2012. Model of natural semantic space for ontologies' construction. Int. J. Comb. Optim. Probl. Inform. 3 (2), 93–108.
Pérez-Rosas, V., Banea, C., Mihalcea, R., 2012. Learning sentiment lexicons in Spanish. In: Proceedings of LREC.
Platt, J., 1998. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Technical Report.
Ramírez, G., Villatoro, E., Ionescu, B., Escalante, H.J., Escalera, S., Larson, M., Henning, M., Guyon, I., 2019. Overview of the multimedia information processing for personality & social networks analysis contest. In: Proceedings of the ICPR 2018 International Workshops, Revised Selected Papers, Lecture Notes in Computer Science 11188. Springer.
Rangel, I.D., Guerra, S.S., Sidorov, G., 2014. Creación y evaluación de un diccionario marcado con emociones y ponderado para el español. Onomázein 29, 31–46. doi:10.7764/onomazein.29.5.
Rao, Y., 2016. Contextual sentiment topic model for adaptive social emotion classification. IEEE Intell. Syst. 31 (1), 41–47.
Read, J., Pfahringer, B., Holmes, G., Frank, E., 2011. Classifier chains for multi-label classification. Mach. Learn. 85 (3), 333.
Read, J., Reutemann, P., Pfahringer, B., Holmes, G., 2016. MEKA: a multi-label/multi-target extension to Weka. J. Mach. Learn. Res. 17 (21), 1–5.
Richens, R.H., 1958. Interlingual machine translation. Comput. J. 1 (3), 144–147.
Rong, X., 2014. word2vec parameter learning explained. arXiv:1411.2738.
Scott, W.A., 1955. Reliability of content analysis: the case of nominal scale coding. Public Opin. Q. 19 (3), 321–325.
Shaver, P., Schwartz, J., Kirson, D., O'Connor, C., 1987. Emotion knowledge: further exploration of a prototype approach. J. Pers. Soc. Psychol. 52 (6), 1061.
Tsoumakas, G., Vlahavas, I., 2007. Random k-labelsets: an ensemble method for multilabel classification. In: Proceedings of the European Conference on Machine Learning. Springer, pp. 406–417.
Xu, R., Ye, L., Xu, J., 2013. Reader's emotion prediction based on weighted latent Dirichlet allocation and multi-label k-nearest neighbor model. J. Comput. Inf. Syst. 9 (6), 2209–2216.
Zhang, D., Xu, H., Su, Z., Xu, Y., 2015. Chinese comments sentiment classification based on word2vec and SVMperf. Expert Syst. Appl. 42 (4), 1857–1863.
Zhang, M.L., Zhou, Z.H., 2005. A k-nearest neighbor based algorithm for multi-label classification. In: Proceedings of the 2005 IEEE International Conference on Granular Computing, vol. 2, pp. 718–721. doi:10.1109/GRC.2005.1547385.
Zhang, Y., Su, L., Yang, Z., Zhao, X., Yuan, X., 2015. Multi-label emotion tagging for online news by supervised topic model. In: Proceedings of the Asia-Pacific Web Conference. Springer, pp. 67–79.
