Accepted Manuscript
Title: SentiHealth-cancer: A sentiment analysis tool to help detecting mood of cancer patients in online social network
Authors: Ramon Gouveia Rodrigues, Rafael Marques das Dores, Celso G. Camilo-Junior, Thierson Couto Rosa
PII: S1386-5056(15)30042-3
DOI: http://dx.doi.org/doi:10.1016/j.ijmedinf.2015.09.007
Reference: IJB 3243
To appear in: International Journal of Medical Informatics
Received date: 12-12-2014
Revised date: 28-9-2015
Accepted date: 30-9-2015
Please cite this article as: Ramon Gouveia Rodrigues, Rafael Marques das Dores, Celso G. Camilo-Junior, Thierson Couto Rosa, SentiHealth-cancer: A sentiment analysis tool to help detecting mood of cancer patients in online social network, International Journal of Medical Informatics http://dx.doi.org/10.1016/j.ijmedinf.2015.09.007 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
R. G. Rodrigues and R. M. das Dores and C. G. Camilo-Junior and T. C. Rosa / International Journal of Medical Informatics 00 (2015) 1–35
SentiHealth-Cancer: a Sentiment Analysis tool to Help Detecting Mood of Cancer Patients in Online Social Network
Ramon Gouveia Rodrigues a,*, Rafael Marques das Dores a, Celso G. Camilo-Junior a, Thierson Couto Rosa a
a Instituto de Informática, Universidade Federal de Goiás, PO Box 131, CEP 74001-970, Brazil
* Corresponding author at: Instituto de Informática, Universidade Federal de Goiás, PO Box 131, CEP 74001-970, Brazil.
Highlights
• Hashtags and emoticons are helpful for the Sentiment Analysis (SA) of patients' messages.
• SA helps to identify the mood of authors when the authors themselves are the target.
• The proposed SentiHealth method identifies the mood of people in the disease context.
• The proposed SentiHealth-Cancer tool helps to monitor the mood of people dealing with cancer.
Abstract
Background: Cancer is a critical disease that affects millions of people and families around the world. In 2012, about 14.1 million new cases of cancer occurred globally. For many reasons, such as the severity of some cases, the side effects of some treatments and the death of other patients, cancer patients tend to be affected by serious emotional disorders, such as depression. Thus, monitoring the mood of patients is an important part of their treatment. Many cancer patients use online social networks, and many of them take part in virtual cancer communities, where they exchange messages about their treatment or give support to other patients in the community. Most of these communities are publicly accessible and are therefore useful sources of information about the mood of patients. Sentiment Analysis methods can thus be useful to automatically detect the positive or negative mood of cancer patients by analyzing their messages in these online communities. Objective: The objective of this work is to present a Sentiment Analysis tool, named SentiHealth-Cancer (SHC-pt), that improves the detection of the emotional state of patients in Brazilian online cancer communities by inspecting their posts written in Portuguese. SHC-pt is tailored specifically to detect positive, negative or neutral messages of patients in online communities of cancer patients. We conducted a comparative study of the proposed method against a set of general-purpose sentiment analysis tools adapted to this context. Methods: Different collections of posts were obtained from two cancer communities
on Facebook. The posts were analyzed by sentiment analysis tools that support Portuguese (Semantria and SentiStrength) and by SHC-pt, a tool developed from the method proposed in this paper, called SentiHealth. As a second alternative for analyzing the Portuguese texts, the collected posts were also automatically translated into English and submitted to sentiment analysis tools that do not support Portuguese (AlchemyAPI and Textalytics), as well as to Semantria and SentiStrength using their English option. Six experiments were conducted, with some variations and different origins of the collected posts. The results were measured using precision, recall, F1-measure and accuracy. Results: The proposed tool SHC-pt reached the best averages for accuracy and F1-measure (the harmonic mean of precision and recall) over the three sentiment classes addressed (positive, negative and neutral) in all experimental settings. Moreover, the worst accuracy achieved by SHC-pt in any experiment (58%) is 11.53% higher, in relative terms, than the best accuracy achieved by any of the other tools (52%). Likewise, the worst average F1 reached by SHC-pt in any experiment (48.46%) is 4.14% higher, in relative terms, than the best average F1 achieved by the other tools (46.53%). Thus, SHC-pt performs better even when its results in the most difficult scenario are compared with the other tools' results in the easiest one. Conclusions: This paper presents two contributions. First, it proposes SentiHealth, a method to detect the mood of cancer patients who are also members of patient communities in online social networks. Second, it presents a tool instantiated from this method, called SentiHealth-Cancer (SHC-pt), dedicated to automatically analyzing posts in communities of cancer patients. This context-tailored tool outperformed other general-purpose sentiment analysis tools, at least in the cancer
context. This suggests that the SentiHealth method could be instantiated as other disease-specific tools in future work, for instance SentiHealth-HIV, SentiHealth-Stroke and SentiHealth-Sclerosis. © 2015 Published by Elsevier Ltd.
Keywords: Sentiment Analysis; Opinion Mining; Online Social Networks; Facebook; Cancer
1. Introduction
Sentiment Analysis (SA) is widely used to analyze people's opinions about a target, for example a product or a service. Existing SA techniques can be divided into three categories, depending on the level at which the text is analyzed [1, 2, 3, 4]: document level, sentence level and entity/aspect level. At the document level, the opinion expressed in a whole document is classified as positive, negative or neutral. This type of analysis cannot properly classify a document that covers more than one entity, because each document is assumed to refer to a single entity [5]. In contrast, sentence-level analysis classifies opinions into the same three classes (positive, negative or neutral), but each sentence of the document is analyzed separately [5]. Both document-level and sentence-level analysis rely only on language constructs to classify an opinion. Entity/aspect-level analysis, in turn, assumes that every opinion has a target, and therefore seeks to identify the target of each opinion in the text. This makes it possible to analyze more than one opinion in the same sentence [5]. For example, the sentence “Although a bad service, I still like that restaurant.” is more positive than negative about the restaurant, but it actually evaluates two aspects: the service offered and the restaurant itself. These are the targets of the opinions. Several works have compared existing SA tools. The work presented in [6] compares nine SA tools: AlchemyAPI, Lymbix, MLAnalyzer, Repustate, Semantria, Sentigem, Skytle, Textalytics and Textprocessing. To calculate the accuracy of each tool, texts from different sources were collected (news, comments and
tweets). The tools with the greatest accuracy were Textalytics (75%), Skytle (73%) and Semantria (68%). In another work [7], twenty SA tools were compared: fifteen stand-alone SA tools (SentiStrength, Chatterbox, Sentiment140, Textalytics, Intridea, AiApplied, ViralHeat, Lymbix, SentimentAnalyzer, TextProcessing, Semantria, uClassify, MLAnalyzer, Repustate and one referred to by the authors as Anonymous¹) and five workbench tools (BPEF, Lightside, FRN, EWGA, RapidMiner). The texts used to evaluate the tools were tweets on the following themes: telecommunications, pharmaceuticals, security, technology and retail consumer products. Among the stand-alone tools, SentiStrength had the greatest average accuracy (67%). Among the workbench tools, BPEF presented the best average accuracy (71%). Another study [8] tested AlchemyAPI, OpenAmplify and Texterra. Texterra, a new tool proposed in that article, reached an accuracy of 79%, higher than AlchemyAPI (42%) and OpenAmplify (57%). The experiments in [8] used texts written in English and Russian: the English texts cover general affairs, politics and movie reviews, and the Russian texts are comments about movies, books and cameras. The tools AlchemyAPI, Semantria, SentiStrength and Textalytics are used in the experiments reported in this work because they presented good results in related work. They also provide a Java API (Application Programming Interface) [9], which facilitates their integration into a single application, and they can analyze arbitrary texts. The Texterra tool, which according to [8] is more accurate than the others, was not considered in this article because it presented access failures during the tests. Although there are several studies about SA, few scientific studies use SA to classify a
person's emotional state with the person him- or herself as the target of the analysis. However, it is possible to know whether a person has more positive or more negative thoughts by analyzing the texts he or she writes [10]. For example, if most of a person's texts within a time window are negative, this person is probably in a negative emotional state. Unlike [6, 7, 8], this article considers texts written in Portuguese, extracted from posts in Facebook communities of cancer patients. The article regards the authors of the texts as the targets of the analysis and uses SA solutions to classify the sentiment of the authors.
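The time-window reasoning above can be illustrated with a short Python sketch. It is not part of the authors' method: the 7-day window, the strict-majority rule, the "mixed" result and the function name `dominant_mood` are all assumptions, and the per-post labels are supposed to come from some SA tool.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Post = Tuple[datetime, str]  # (timestamp, label): "positive", "negative" or "neutral"


def dominant_mood(posts: List[Post], end: datetime,
                  window: timedelta = timedelta(days=7)) -> Optional[str]:
    """Hypothetical helper: dominant mood of one author's posts in (end - window, end]."""
    recent = [label for ts, label in posts if end - window < ts <= end]
    if not recent:
        return None  # no posts in the window, nothing to infer
    counts = Counter(recent)
    half = len(recent) / 2
    if counts["negative"] > half:  # most texts negative: probably a negative state
        return "negative"
    if counts["positive"] > half:
        return "positive"
    return "mixed"  # no strict majority of either polarity


if __name__ == "__main__":
    end = datetime(2014, 5, 31)
    posts = [
        (datetime(2014, 5, 30), "negative"),
        (datetime(2014, 5, 29), "negative"),
        (datetime(2014, 5, 28), "positive"),
        (datetime(2014, 4, 1), "positive"),  # outside the 7-day window, ignored
    ]
    print(dominant_mood(posts, end))  # negative
```

A sliding window keeps the estimate responsive to recent posts; the window length would have to be tuned to how often the monitored patients actually post.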
1.1. Rationale for the study
According to [11], SA techniques applied to posts in online cancer communities may be used not only to detect a pessimistic emotional state but also to detect changes in a person's mood as a consequence of his or her interactions with other patients in a community. People who write a text without the specific purpose of reporting their emotional state may still reveal, unintentionally, whether they are more positive or negative. This can be used to give emotional support to patients. A chronic patient may face various difficulties, such as physical pain, stress, extreme anxiety, anger, depression and frustration [12]. These difficulties can cause suffering during treatment and may even lead the patient to interrupt the treatment.
¹ The authors state that, due to terms-of-use restrictions, they cannot reveal the name of the software tool.
Consequently, many patients seek support in social networks to obtain information, encouragement, motivation, feedback, emotional support, tangible support and network support exchanged among peers [13, 14, 15, 16, 17, 18, 19], in order to have a better quality of life during their treatment. Thus, the automatic analysis of patients' mood can be very useful for caregivers, family members and the patients themselves. SA methods are good options for this analysis. However, most SA works assess opinions about a target that is different from the author of the opinion [20, 21, 22, 23, 24]. In addition, few studies focus on analyzing the sentiment of cancer patients and their families, who also go through intense experiences and are usually emotionally influenced by the context surrounding the patient [11, 25]. Finally, few works propose context-driven SA solutions aimed at improving the accuracy of the classification [26, 27, 28]. In most cases, the techniques are general-purpose and, as such, do not perform well in specific contexts.
Thus, the following research questions can be listed:
• How do the tools AlchemyAPI, Semantria, SentiStrength and Textalytics perform when analyzing the emotional state of the authors of messages in online cancer communities?
• Does the use of information specific to the cancer field and to forms of communication on the Internet improve the accuracy of a lexicon-based SA method?
• Does the origin of the analyzed groups' messages influence the accuracy of SA?
1.2. Objectives of the study
The objective of this work is to present an SA tool, named SentiHealth-Cancer (SHC-pt), that improves the detection of the emotional state of patients in Brazilian online cancer communities by inspecting their posts written in Portuguese.
2. Study context
Cancer, according to the American Cancer Society, is the name given to a set of more than 100 diseases that have in common the uncontrolled growth of (malignant) cells that invade tissues and organs and can spread (metastasize) to other parts of the body [29]. The World Health Organization (WHO) estimates that in 2012 there were 14.1 million new cancer cases and 8.2 million deaths due to this disease worldwide [30]. In Brazil specifically, according to the National Cancer Institute José Alencar Gomes da Silva (INCA), approximately 576,580 new cases were expected for 2014. By 2030, the number of new cancer cases is estimated to reach 21.4 million and the number of deaths due to cancer 13.2 million worldwide [31]. According to [32], among the most common side effects during cancer treatment are fatigue, sleep problems, depression and disorder. These symptoms cause significant disruption in the patient's quality of life and may have implications for treatment adherence. On the other hand, a social network can help establish positive links that contribute to the treatment of the disease and is useful for exchanging experiences between people with the same disease [33]. For this reason, many people undergoing treatment for chronic
diseases resort to groups or communities in social networks to feel better and to keep up with news about their diseases [34, 35, 36, 37, 38, 39]. Examples of the benefits that the Internet can bring to patients include the following: it is widely used by diabetics to seek information about the disease [40] and by women who want to learn more about breast health [41]. Some Web sites also help people who want to quit smoking [42]. Obese patients can get social support on the Internet to lose weight [15], and Facebook is widely used by parents and caregivers of children with Autism Spectrum Disorders (ASDs) to seek social support [43]. There are several social networks that people can use to interact with each other, and the most widely used of them is Facebook. In March 2015, the network had 936 million active users, where an active user is one who has logged in to the social network in the last 28 days and has at least one friend added to his or her network [44]. The world's population, also in 2015, was approximately 7.3 billion people [45]. Thus, the number of users of this social network corresponds to more than 12% of the world population, and the percentage is even higher if only people aged 13 and older are considered. One of the factors that contributes to this large number of Facebook users is the concept of “groups” or “communities”, where several people can interact to share content on a common theme. Because there are several “groups” or “communities” about chronic diseases, this social network was chosen for this work. The group concept is useful for this work because members of a group usually write about issues related to it; thus, posts in a group tend to be closely related to the group's subject. Another advantage of using Facebook is that it has many users and is very
active in terms of the content shared on it, which yields many texts to analyze. Therefore, this work applied SA to Facebook support groups for cancer patients. Groups about this specific disease were selected because they have a greater number of active users than most groups about other chronic diseases, which makes it easier to obtain content for the research. All groups surveyed are Brazilian and, consequently, their users write their posts in Portuguese. Many other ways of collecting data to analyze the emotional state of patients undergoing cancer treatment are possible, such as ethnographic observation, interviews and questionnaires, but these methods present certain difficulties [46]. One difficulty is that interviews and observations take a very long time to carry out, which makes them costly. In addition, happier patients tend to provide information more readily than unhappy ones, which can bias the results [46]. Another difficulty is related to temporal granularity: collecting data in real time with these methods is extremely complicated, since the collection always depends on the users' availability to provide information [46]. In contrast, data can be collected directly from a social network easily, efficiently and continuously, using the programming interface provided by the social network. Besides, collecting data this way does not involve bothering patients with interviews or forms to fill out.
3. Methods
This work is part of a project evaluated and approved by the Ethics Committee under number 31191214.7.0000.5083/UFG. Moreover, all data collected from the online social network had been published by their users as public texts in public groups. A Java application
was developed to connect to the social network's API and collect posts from the selected groups. Facebook was selected as the source of texts because it contains user groups on the theme of this work and because it is currently one of the online social networks most used by cancer patients and their families. Two Facebook groups were selected based on their volume of data and on the public access to their messages. The data were collected between March and May 2014. For SA of these texts, which belong to the cancer context and are written in Portuguese, two SA tools that support Portuguese were used (SentiStrength [47] and Semantria [48]). Since the number of accurate SA tools that support Portuguese is small, the collected posts were also automatically translated into English and analyzed by SA tools that support English. In addition, a new SA method, called SentiHealth, was developed, considering the author of the posts as the target of the analysis and taking into account the communication style of Portuguese-speaking Internet users. The proposed tool SentiHealth-Cancer (SHC-pt) was instantiated from the SentiHealth method and compared to the other tools. For that, the opinions of three of the authors of this study were used to manually label the posts as positive, negative or neutral.
3.1. SA Tools
We used the proposed tool SHC-pt and four other SA tools developed in Java to analyze the sentiments of members of Facebook groups about cancer: AlchemyAPI (version 1.1.4) [49], Semantria (version 3.0.67) [48], SentiStrength (version 0.1) [47] and Textalytics (version 1.2) [50].
3.1.1. AlchemyAPI
AlchemyAPI enables SA but is not restricted to it. It can also extract image information from URLs, identify languages, recognize entities and apply other Natural Language Processing techniques [51]. It supports the English and German languages and can be used by applications written in Java, C/C++, C#, Perl, PHP, Python, Ruby and JavaScript, as well as by Android OS applications [51]. Although its documentation does not fully describe how SA is performed, AlchemyAPI can, through parameters, perform SA at the document, entity or word level.
This analysis can be performed on text documents or on texts from publicly accessible websites, such as blog posts, forums, news articles, tweets, Facebook posts and product reviews. The results of the analysis may be output in XML, JSON or RDF format [49]. A free version, accessed with a limited access token, allows up to a thousand transactions per day and five concurrent requests [51].
3.1.2. SentiStrength
SentiStrength is an API written in Java that uses several different techniques simultaneously to extract the strength of positive and negative sentiment from texts in digital media. These techniques include: a list of words with their sentiment strengths, spelling correction, booster words that alter strength, negation words that invert emotions, repeated letters that boost sentiment, an emoticon list, repeated punctuation that increases or decreases sentiment, and ignoring negative emotion in questions [52].
SentiStrength supports the following languages: English, Finnish, German, Dutch, Spanish, Russian, Portuguese, French, Arabic, Polish, Persian, Swedish, Greek, Welsh, Italian and Turkish. It can be used with the Java, Python and Ruby programming languages [52]. When analyzing a text lexically, SentiStrength splits the text into words, and if a word matches a term in its term list, the corresponding score in the term list is assigned to that word. After all words of the text have been considered, the most positive score and the most negative score in the text are summed, and the text is classified according to the sign of the result: positive if it is greater than zero, negative if it is less than zero, and neutral if it equals zero. Its text-decoding procedures and the unconventional methods it uses to give strength to words lead SentiStrength to better results than standard machine learning methods (simple logistic regression, SVM, SMO, J48 classification tree, the rule-based classifier JRip, AdaBoost, Multilayer Perceptron and Naive Bayes) [53]. SentiStrength is free for academic research, and the files it uses for SA are provided separately, which makes it easy to customize: terms can be added, removed or translated in these files to obtain better analysis results. A desktop application for Windows and an online version are also provided free of charge [47].
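The lexical scoring rule just described (sum of the most positive and the most negative word scores, classified by its sign) can be sketched in Python. This is a simplified illustration, not SentiStrength's code: the toy lexicon values are invented, words without an entry score 0, and the booster, negation, emoticon and punctuation rules listed earlier are omitted.

```python
import re
from typing import Dict

# Toy lexicon with invented strengths (NOT SentiStrength's real term list).
TOY_LEXICON: Dict[str, int] = {"feliz": 3, "bom": 2, "triste": -3, "dor": -2}


def sentistrength_like_label(text: str, lexicon: Dict[str, int]) -> str:
    """Classify a text by the sign of (most positive score + most negative score)."""
    words = re.findall(r"\w+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    most_positive = max((s for s in scores if s > 0), default=0)
    most_negative = min((s for s in scores if s < 0), default=0)
    total = most_positive + most_negative
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(sentistrength_like_label("Estou feliz, mas com dor", TOY_LEXICON))  # positive (3 + -2)
print(sentistrength_like_label("Triste e com dor", TOY_LEXICON))          # negative (0 + -3)
```

Only the extreme scores count: in the second example, "dor" does not add to the negativity once "triste" is present.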
3.1.3. Semantria
Semantria's analysis tools can search texts, classify content, recognize entities, remove terms in a blacklist and identify sentiments in sentences
[54]. This tool supports the English, French, Portuguese, Spanish, German, Mandarin and Italian languages, and it provides libraries for several programming languages: C++, Java, PHP, .NET, Python, Ruby and JS. Its free version allows up to ten thousand transactions [55]. Semantria uses a logarithmic scale to score sentiment in texts: unlike a linear scale, where the difference between two consecutive values is constant, in a logarithmic scale each value grows exponentially relative to the previous one [55]. Semantria uses two different scores, depending on the target object of the SA: documents or components. Components are themes, topics and entities, and receive a score from -10 to 10, whereas documents receive a score from -2 to 2. A document is considered neutral if its score is higher than -0.45 and lower than 0.5, whereas a component is neutral if its score is between -0.05 and 0.22 [55]. Semantria detects the size of a text automatically and performs SA according to the document size. According to [55], Semantria can achieve an accuracy between 60% and 65% on short texts and between 70% and 75% on long texts. Semantria can process a queue of documents, each document being processed independently. Requests should contain XML or JSON objects and can have three parameters: document id, document text and an optional tag in the POST request [54]. Since it is a paid tool, request limits depend on the plan purchased by the user, with prices ranging from USD 749 to USD 2,999 per month [56].
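The neutral bands reported above can be turned into a small classification helper. This is a sketch only: the function names are hypothetical, the scores are assumed to have already been returned by Semantria, and whether the component bounds -0.05 and 0.22 are inclusive is an assumption, since the text only says "between".

```python
def semantria_document_label(score: float) -> str:
    """Map a Semantria document score in [-2, 2] to a sentiment class."""
    if not -2.0 <= score <= 2.0:
        raise ValueError(f"document score out of range: {score}")
    if -0.45 < score < 0.5:  # strictly inside the neutral band
        return "neutral"
    return "positive" if score >= 0.5 else "negative"


def semantria_component_label(score: float) -> str:
    """Map a Semantria component (theme/topic/entity) score in [-10, 10] to a class."""
    if not -10.0 <= score <= 10.0:
        raise ValueError(f"component score out of range: {score}")
    if -0.05 <= score <= 0.22:  # inclusive bounds are an assumption
        return "neutral"
    return "positive" if score > 0.22 else "negative"


print(semantria_document_label(0.3), semantria_document_label(0.5))  # neutral positive
```

The same numeric score can fall into different classes depending on whether it refers to a document or a component: 0.3 is neutral for a document but positive for a component.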
3.1.4. Textalytics
Besides identifying the polarity of the sentiment of a text, Textalytics offers other services, such as identifying the target of the sentiment, the text theme, the intention or desire to buy a product, the author profile, and the relationship of certain people with certain companies [50]. The supported languages depend on the service being used; for SA, English, Spanish and French are supported. Textalytics can be used with the Java, PHP, Python and Visual Basic programming languages. In addition, Textalytics has an Excel extension that enables semantic text analysis in spreadsheets. The analysis is performed at the sentence level: Textalytics first recognizes the polarity of individual sentences of the text and then uses these polarities to determine the global sentiment of the text. Sentiment polarities range from -1 to 1 and are classified by Textalytics as P+ (very positive), P (positive), NEU (neutral), N (negative) and N+ (very negative). Additionally, Textalytics uses Natural Language Processing (NLP) techniques to discover the relationships between sentiments and the entities found in the text. For example, lemmatization reduces an inflected word to its base form (lemma), part-of-speech (POS) tagging classifies terms according to their morphological classes, and syntactic analysis represents the terms in a full syntax tree whose leaves are the most basic elements [57]. Analyses are requested through HTTP or HTTPS using the GET or POST methods. Requests must specify the format of the expected response (XML or JSON) and an access key that is obtained by registering on the site [50]. In the free plan, this key is valid for approximately one year, allows 500,000 credits per month (2 words are equivalent to 1 credit) and supports up to 5 requests per second.
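Two practical details above, collapsing the five Textalytics polarity labels into the three classes used in this study and estimating free-plan credit consumption, can be sketched in Python. The mapping of P+/P to positive and N/N+ to negative, and rounding a partial credit up, are assumptions for illustration; the helper names are hypothetical and no Textalytics client call is shown.

```python
import math

# Assumed collapse of the five Textalytics polarity labels into three classes.
POLARITY_TO_CLASS = {
    "P+": "positive", "P": "positive",
    "NEU": "neutral",
    "N": "negative", "N+": "negative",
}

MONTHLY_CREDITS = 500_000    # free plan, as stated in the text
WORDS_PER_CREDIT = 2         # 2 words are equivalent to 1 credit
MAX_REQUESTS_PER_SECOND = 5


def credits_needed(text: str) -> int:
    """Credits consumed by one text, rounding a partial credit up (assumption)."""
    return math.ceil(len(text.split()) / WORDS_PER_CREDIT)


def monthly_word_budget() -> int:
    """Total number of words that fit in the monthly free-plan credits."""
    return MONTHLY_CREDITS * WORDS_PER_CREDIT


# Minimum spacing between requests to stay within the rate limit (0.2 s).
MIN_SECONDS_BETWEEN_REQUESTS = 1 / MAX_REQUESTS_PER_SECOND

print(POLARITY_TO_CLASS["N+"], credits_needed("hoje foi um dia bom"), monthly_word_budget())
# negative 3 1000000
```

Under these assumptions, the free plan covers about one million words of posts per month.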
3.2. Proposed method and tool
The SentiHealth (SH) method, proposed in this work, uses information about the application context and about Internet users' communication styles, e.g. hashtags and emoticons, to improve classification performance. As an instance of the SH method, we present the tool SentiHealth-Cancer (SHC-pt), which automates the method and makes it possible to evaluate it. The SH flowchart is shown in the figure in Appendix A, and its processes are defined and exemplified in the following subsections.
3.2.1. Dictionary Set
The SH method uses the following text files to analyze the sentiment of messages: “dictionary.txt”, “emoticon.txt”, “hashtags.txt” and “ngrams.txt”. Each line of these files contains a term (or set of terms) and information about it, separated by a blank space or a colon (“:”). The file “dictionary.txt” is the same dictionary of Portuguese terms used by SentiStrength and has 1964 terms. Each row contains a word and its emotional strength, indicated by a number ranging from -5 to 5: the higher this number, the more positive the meaning of the word, and the lower, the more negative. The file “emoticon.txt” contains the same 125 textual emoticons, with their sentiment strengths, used by the SentiStrength tool. An example of a positive emoticon is “(: 1”, where “(:” is the emoticon and “1” is its positive emotional strength. The only
difference between SentiStrength's emoticon list and ours is that we also consider the question mark as an emoticon with sentiment strength equal to zero (“? 0”). The file “hashtags.txt” has 6 hashtags used in cancer groups in social networks, such as “#obrigadodeus” (“#thankgod”) and “#obrigadodoador” (“#thankyoudonor”). Each hashtag is followed by its sentiment strength, separated by a blank space. The file “ngrams.txt” contains 86 n-grams and has four fields in each line: an n-gram, its sentiment strength, a number 1 or 0 indicating whether or not the n-gram is a priority, and another number 1 or 0 indicating whether or not variations of this n-gram are considered. For example, the line “happy:4:1:0” indicates that the n-gram “happy” has a sentiment strength of +4, that it is a priority, and that its variations (happiness, unhappiness, happily) are not considered in the SA. Whether or not an n-gram is a priority is used by the SentiHealth logic to decide which sentences in a message are considered for the SA, as explained in detail in Section 3.2.3. During SA, each term or set of terms in the lines of the files mentioned above is checked for presence in the text being analyzed. If it is present, its sentiment strength is added to the overall sentiment score of the whole text. For example, the file “dictionary.txt” has a line like “good 5”. If the text contains the word “good” and its current sentiment score is 0, its score becomes 5 due to the sentiment strength of the word “good”. The authors of this work judged what the sentiment strengths of the terms in the files “ngrams.txt” and “hashtags.txt” should be, considering the cancer context in which the texts were inserted. The sentiment strengths of the terms in the other files were kept the same as those used by SentiStrength.
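The line formats of the four files can be parsed as sketched below. This is an illustrative sketch, not SHC-pt's code: the class and function names are hypothetical, whitespace-separated lines are assumed to end with an integer strength (so multi-word terms keep their inner spaces), and the example lines are the ones quoted in the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class NGramEntry:
    ngram: str
    strength: int
    priority: bool    # third field: 1 = priority n-gram
    variations: bool  # fourth field: 1 = variations are considered


def parse_space_entry(line: str) -> Tuple[str, int]:
    """Parse 'term strength' lines (dictionary.txt, emoticon.txt, hashtags.txt)."""
    term, strength = line.strip().rsplit(None, 1)
    return term, int(strength)


def parse_ngram_entry(line: str) -> NGramEntry:
    """Parse 'ngram:strength:priority:variations' lines (ngrams.txt)."""
    ngram, strength, priority, variations = line.strip().rsplit(":", 3)
    return NGramEntry(ngram, int(strength), priority == "1", variations == "1")


def add_term_scores(words: Iterable[str], lexicon: Dict[str, int], score: int = 0) -> int:
    """Add the strength of every word found in the lexicon to the running score."""
    for word in words:
        score += lexicon.get(word, 0)
    return score


dictionary = dict(parse_space_entry(line) for line in ["good 5"])
emoticons = dict(parse_space_entry(line) for line in ["(: 1", "? 0"])
happy = parse_ngram_entry("happy:4:1:0")
print(add_term_scores(["good"], dictionary))  # 5
print(happy)  # NGramEntry(ngram='happy', strength=4, priority=True, variations=False)
```

The variations flag is parsed but not used here, since the text does not specify how variations of an n-gram are matched.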
3.2.2. URLs
Before analyzing a message, SHC-pt checks whether it contains any URLs. If it does, all URLs are removed from the text, because URLs are sequences of characters that carry no sentiment.
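URL removal of this kind can be sketched with a regular expression. The pattern below is our assumption: it treats links starting with http://, https:// or www. as URLs. The exact pattern used by SHC-pt is not specified.

```python
import re

# Assumed pattern: a URL is a run of non-space characters starting with
# http://, https:// or www.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_urls(text: str) -> str:
    """Remove URLs and collapse the whitespace they leave behind."""
    return " ".join(URL_PATTERN.sub(" ", text).split())


print(strip_urls("Força sempre! https://example.com/post?id=1 Deus é fiel"))
# Força sempre! Deus é fiel
```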
3.2.3. Priority Sentences
Priority sentences are sentences that contain at least one of the following features: an emoticon, a hashtag, an exclamation mark, a capitalized word, or an n-gram marked as priority in the file “ngrams.txt”. We found that these features, when present in a sentence, can represent the author’s mood by themselves. Therefore, when a message has priority sentences, only those sentences are used to analyze the sentiment of the whole message, and the non-priority sentences are discarded.

Each priority sentence is classified separately as positive, negative or neutral, as explained in the following sections. The message is then labeled with the majority class among its priority sentences. For example, a message with two positive priority sentences, one neutral and one negative is classified as positive. If the numbers of positive and negative priority sentences are tied, the message is classified as indefinite. If at least one priority sentence is classified as positive or negative, the number of neutral priority sentences in the message is taken to be zero. The rationale is that a person who does not want to express positive or negative sentiments in a message does not write any sentence with such sentiments; that is, he stays neutral from the start of the message to the end.

If there are no priority sentences, the entire message is used to calculate the sentiment, and only the dictionaries “ngrams.txt” and “dictionary.txt” are used for SA. In this case, each word is scored according to these dictionaries, and the message score is the sum of the word scores. The message is classified as positive if the sum is greater than zero, negative if it is less than zero, and neutral otherwise.
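The message-level decision rule above can be summarized in a short sketch. It assumes that the sentence labels have already been computed (Sections 3.2.4 to 3.2.7) and that the fallback score is the dictionary/n-gram sum over the whole message. The function names are hypothetical.

```python
from collections import Counter


def label_from_sum(score: float) -> str:
    # Fallback rule: the sign of the summed word scores decides the class
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_message(priority_labels: list[str], fallback_score: float) -> str:
    """Combine the labels of the priority sentences into a message label.

    priority_labels: "positive", "negative" or "neutral" for each priority
    sentence; empty if the message has no priority sentences.
    fallback_score: summed dictionary/n-gram score of the whole message,
    used only when there are no priority sentences.
    """
    if not priority_labels:
        return label_from_sum(fallback_score)
    counts = Counter(priority_labels)
    pos, neg = counts["positive"], counts["negative"]
    # Neutral sentences count as zero once any polar sentence exists, so the
    # message is neutral only when all its priority sentences are neutral.
    if pos == 0 and neg == 0:
        return "neutral"
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "indefinite"  # tie between positive and negative sentences


# Example from the text: two positive, one neutral and one negative sentence
print(classify_message(["positive", "positive", "neutral", "negative"], 0))  # positive
```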
3.2.4. Emoticons and hashtags
To calculate the sentiment of each priority sentence, SHC-pt first checks whether the sentence contains any emoticon listed in the file “emoticons.txt”. If it does, only these emoticons are used to define the sentiment of the whole sentence. SHC-pt follows a similar procedure with hashtags. If the sentence contains any hashtag that also occurs in “hashtags.txt”, the other terms in the sentence are disregarded, and only the hashtag is used to classify the sentence as positive, negative or neutral. If a sentence contains both emoticons and hashtags, only the emoticons are considered.
3.2.5. Question mark
Although interrogative sentences could be classified as positive, negative or neutral by summing the scores of their terms, they usually express no sentiment. A sentence containing an emoticon is classified using only its emoticons. We therefore added the question mark to the file “emoticons.txt” with sentiment strength equal to zero, so that interrogative sentences are treated as neutral. If the sentence contains no emoticons or hashtags listed in the files “emoticons.txt” and “hashtags.txt”, the files “dictionary.txt” and “ngrams.txt” are used to classify the whole sentence.
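The precedence among emoticons, hashtags and dictionary terms within a priority sentence, including the zero-strength question mark, can be sketched as follows. The sketch makes three assumptions: the sentence is already tokenized, with “?” as a separate token; the strengths of several matching emoticons (or hashtags) are summed; and the n-gram step of Section 3.2.7 is omitted. All strengths in the example are hypothetical.

```python
def sentence_score(tokens: list[str],
                   emoticons: dict[str, int],
                   hashtags: dict[str, int],
                   lexicon: dict[str, int]) -> int:
    """Score one priority sentence: emoticons first, then hashtags, then words."""
    emoticon_hits = [emoticons[t] for t in tokens if t in emoticons]
    if emoticon_hits:
        # Emoticons (including "?" with strength 0) override everything else
        return sum(emoticon_hits)
    hashtag_hits = [hashtags[t] for t in tokens if t in hashtags]
    if hashtag_hits:
        # Hashtags override ordinary words, but not emoticons
        return sum(hashtag_hits)
    # Fallback: ordinary dictionary terms
    return sum(lexicon.get(t, 0) for t in tokens)


# Hypothetical strengths, for illustration only
EMOTICONS = {":)": 2, ":(": -2, "?": 0}
HASHTAGS = {"#obrigadodoador": 3}
LEXICON = {"triste": -3, "feliz": 3}

print(sentence_score(["estou", "triste", ":)"], EMOTICONS, HASHTAGS, LEXICON))      # 2
print(sentence_score(["triste", "#obrigadodoador"], EMOTICONS, HASHTAGS, LEXICON))  # 3
print(sentence_score(["está", "feliz", "?"], EMOTICONS, HASHTAGS, LEXICON))         # 0
```

In the last call, the question mark behaves like an emoticon, so the positive word “feliz” is ignored and the sentence scores zero, which makes it neutral.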
3.2.6. Exclamation mark, capitalized word and repeated vowels
If the sentence contains an exclamation mark, its sentiment score is multiplied by 2. The score is multiplied by 2 again for each uppercase word or word with a repeated vowel, e.g., “LOVE” or “Loooooove”.
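The intensification rule can be sketched as below. The choices here are ours and are not specified above:
- a “repeated vowel” is the same vowel three or more times in a row;
- an uppercase word has at least two letters, so that “I” or “A” do not count;
- a word meeting both criteria doubles the score only once.

```python
import re

# Assumption: three or more identical vowels in a row, e.g. "Loooooove"
REPEATED_VOWEL = re.compile(r"([aeiouáéíóúâêôãõ])\1{2,}", re.IGNORECASE)


def is_emphasized(word: str) -> bool:
    letters = [c for c in word if c.isalpha()]
    shouting = len(letters) >= 2 and word.isupper()
    return shouting or REPEATED_VOWEL.search(word) is not None


def apply_intensifiers(score: float, sentence: str) -> float:
    """Double the score once for '!' and once per emphasized word."""
    factor = 2 if "!" in sentence else 1
    for word in sentence.split():
        if is_emphasized(word):
            factor *= 2
    return score * factor


print(apply_intensifiers(3, "I LOVE you!"))   # 12: 3 * 2 ('!') * 2 ('LOVE')
print(apply_intensifiers(2, "Loooooove it"))  # 4
```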
3.2.7. N-gram
An n-gram is a sequence of n terms. Thus, a unigram (an n-gram of size 1) has a single term, a bigram has two terms, and so on [58]. If a unigram appears in both “ngrams.txt” and “dictionary.txt”, the sentiment strength specified in “ngrams.txt” takes priority when classifying the sentence.

A term in “dictionary.txt” may also be part of a multi-term n-gram in “ngrams.txt”. In that case, the term would contribute twice to the score of the message: once through its own sentiment strength in “dictionary.txt”, and again through the sentiment strength of the n-gram containing it in “ngrams.txt”. To prevent this double scoring, once an n-gram is found, its sentiment strength is added to the sentiment score and the n-gram is removed from the message.

The order of the n-grams in “ngrams.txt” also matters, because a smaller n-gram may be part of a larger one. For example, “cancer” is part of “fight and win cancer”. In the sentence “I want to fight and win cancer”, the algorithm could first find the n-gram “cancer”, add its sentiment strength and remove it, leaving “I want to fight and win”. This would prevent the stronger n-gram “fight and win cancer” from being considered. For this reason, n-grams with more terms must be analyzed first, so larger n-grams are placed before smaller ones in “ngrams.txt”.
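The n-gram step can be sketched as longest-first matching with removal. Instead of relying on the line order of “ngrams.txt”, this sketch sorts the n-grams by number of terms, which has the same effect. The strengths in the example are hypothetical, and tokens are assumed to be lowercase words separated by spaces.

```python
def score_ngrams(tokens: list[str], ngrams: dict[str, int]) -> tuple[int, list[str]]:
    """Score n-grams longest first, removing each match to avoid double scoring."""
    remaining = list(tokens)
    score = 0
    for ngram in sorted(ngrams, key=lambda g: len(g.split()), reverse=True):
        parts = ngram.split()
        n = len(parts)
        i = 0
        while i + n <= len(remaining):
            if remaining[i:i + n] == parts:
                score += ngrams[ngram]
                del remaining[i:i + n]  # matched terms cannot be scored again
            else:
                i += 1
    return score, remaining


def score_terms(tokens: list[str], ngrams: dict[str, int], lexicon: dict[str, int]) -> int:
    # n-grams (including unigrams) take priority over dictionary.txt entries
    score, remaining = score_ngrams(tokens, ngrams)
    return score + sum(lexicon.get(t, 0) for t in remaining)


NGRAMS = {"cancer": 0, "fight and win cancer": 4}  # hypothetical strengths
LEXICON = {"win": 2}                                # hypothetical strength
tokens = "i want to fight and win cancer".split()
print(score_ngrams(tokens, NGRAMS))          # (4, ['i', 'want', 'to'])
print(score_terms(tokens, NGRAMS, LEXICON))  # 4: "win" is not counted twice
```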
3.3. Outcome measures and evaluation criteria
To find the most appropriate tool for identifying sentiments in texts written by cancer patients, and therefore for helping them, we used the following metrics: precision, recall, F1-measure and accuracy. These metrics compare the classifications made previously by the researchers with the classifications made by the tools. Precision, recall and F1-measure are measured separately for each of the three sentiment classes: positive, negative and neutral. Accuracy considers the three classes together and measures the proportion of correctly classified messages. To calculate the performance of each tool in each sentiment class C (positive, negative or neutral), we used the following definition of precision:
Definition 1.
$$\mathrm{Precision}(C) = 100 \times \frac{|\mathrm{Inst}_{class}(C^{+}) \cap \mathrm{Inst}_{real}(C^{+})|}{|\mathrm{Inst}_{class}(C^{+}) \cup \mathrm{Inst}_{indef}(C^{+})|},$$
where $\mathrm{Inst}_{class}(C^{+})$ is the set of instances (messages) classified by the tool as belonging to class C, and $\mathrm{Inst}_{real}(C^{+})$ is the set of instances that belong to class C according to the classifications made previously by the researchers. $\mathrm{Inst}_{indef}(C^{+})$ is the set of messages classified as indefinite, that is, messages whose sentiment could not be defined by the tool. For SHC-pt, a message is indefinite when it has as many positive as negative priority sentences (Section 3.2.3). Recall is calculated according to the following definition:
Definition 2.
$$\mathrm{Recall}(C) = 100 \times \frac{|\mathrm{Inst}_{class}(C^{+}) \cap \mathrm{Inst}_{real}(C^{+})|}{|\mathrm{Inst}_{real}(C^{+})|}.$$
Recall measures how many of the messages of a given class a tool classifies correctly. Precision measures how many of the messages that a tool assigns to a class actually belong to that class. A tool with very high recall often has low precision, and vice versa, so the harmonic mean of the two metrics is more appropriate for comparing tools. This combination of recall and precision is the F1-measure, defined as:
Definition 3.
$$F_1(C) = \frac{2 \times \mathrm{Precision}(C) \times \mathrm{Recall}(C)}{\mathrm{Precision}(C) + \mathrm{Recall}(C)}.$$
To calculate the accuracy of each tool T over the three sentiment classes, we used the following definition:
Definition 4.
$$\mathrm{Accuracy}(T) = 100 \times \frac{|\mathrm{Inst}_{hit}(T^{+})|}{|\mathrm{Inst}_{total}(T^{+})|},$$
where $\mathrm{Inst}_{hit}(T^{+})$ is the set of all messages that tool T classified correctly in the three sentiment classes, according to the researchers’ classifications. $\mathrm{Inst}_{total}(T^{+})$ is the set of all messages evaluated by the tool. We also computed the simple arithmetic average of precision, recall and F1-measure over the positive, negative and neutral classes. These averages make it possible to determine the most appropriate tool when the goal is to identify texts in all three classes.
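Definitions 1 to 4 and the class averages can be computed with a short Python sketch. The text does not state whether Inst_indef(C+) contains every message labeled indefinite or only those whose reference class is C. The sketch uses the first reading by default and exposes the second through a flag. The function names, the label strings and the convention of returning 0 when a denominator is empty are our assumptions.

```python
def class_metrics(predicted: list[str], gold: list[str], cls: str,
                  indef_same_class_only: bool = False) -> tuple[float, float, float]:
    """Precision, recall and F1 (in %) for one class (Definitions 1 to 3)."""
    classified = {i for i, p in enumerate(predicted) if p == cls}
    real = {i for i, g in enumerate(gold) if g == cls}
    indefinite = {i for i, p in enumerate(predicted) if p == "indefinite"}
    if indef_same_class_only:
        # Alternative reading: only indefinite messages whose reference class is cls
        indefinite &= real
    hits = len(classified & real)
    p_den = len(classified | indefinite)
    precision = 100 * hits / p_den if p_den else 0.0
    recall = 100 * hits / len(real) if real else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Definition 4: percentage of messages whose label matches the reference."""
    hits = sum(p == g for p, g in zip(predicted, gold))
    return 100 * hits / len(gold)


def macro_average(predicted, gold, classes=("positive", "negative", "neutral")):
    """Simple arithmetic mean of precision, recall and F1 over the classes."""
    rows = [class_metrics(predicted, gold, c) for c in classes]
    return tuple(sum(column) / len(rows) for column in zip(*rows))


pred = ["positive", "positive", "negative", "indefinite", "neutral"]
gold = ["positive", "negative", "negative", "positive", "neutral"]
print(class_metrics(pred, gold, "positive"))  # roughly (33.33, 50.0, 40.0)
print(accuracy(pred, gold))                   # 60.0
print(macro_average(pred, gold))
```

In this toy example, the indefinite message enlarges the precision denominator of the positive class. That is how Definition 1 penalizes a tool for leaving messages undecided.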
3.4. Methods for data acquisition and measurement
3.4.1. Data acquisition
We used the Facebook4j library [59] to collect the posts from Facebook. Facebook4j is a Java library that gives developers access to the Facebook API, named Graph API. The collected data were stored in a table on a remote database server running the DBMS (Database Management System) PostgreSQL 8.3 [60]. In this table, each stored message has the following information:
- its reference sentiment classification, based on the opinions of three researchers of this study;
- the classification given by the tool;
- its translation into English;
- the identification of the Facebook group it came from.

We selected two cancer-related Facebook groups with many users. The first group (group A) is a community founded in 2012 with 28,215 participants. The second group (group B) was created in early 2013 and has 803 participants. A Facebook post may have three text fields: the author’s message and, if the post is a share, the title and description of what is being shared. We collected only the author’s message, because the sentiment of the shared content (title and description) is not always consistent with the sentiment of the person sharing it. For example, a person can share a seemingly good piece of news but write an indignant message when sharing it. We collected 100 posts from each chosen group.
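The collection criterion can be expressed as a simple filter. This sketch assumes each post has already been fetched into a dictionary that stores the author's text under a hypothetical key "message" and stores shared titles and descriptions under other keys, which are ignored. It does not call Facebook4j or the Graph API.

```python
def author_messages(posts: list[dict], limit: int = 100) -> list[str]:
    """Keep only the author's own text, skipping shares that have none."""
    kept = []
    for post in posts:
        text = (post.get("message") or "").strip()
        if text:  # shared content without an author message is discarded
            kept.append(text)
        if len(kept) == limit:
            break
    return kept


posts = [
    {"message": "Hoje terminei a quimio!", "description": "Notícia compartilhada"},
    {"description": "Só o conteúdo compartilhado"},
    {"message": "   "},
]
print(author_messages(posts))  # ['Hoje terminei a quimio!']
```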
3.4.2. Data measurement
In what follows, we present the variations of the tools used in our experiments. First, we list the abbreviations of the tools when using their support for the Portuguese language:
1. SHC-pt: Portuguese text analysis using SentiHealth-Cancer;
2. SEM-pt: Portuguese text analysis using Semantria;
3. SST-pt-wc: Portuguese text analysis using SentiStrength, keeping the term “cancer” in the dictionary of this tool;
4. SST-pt-woc: Portuguese text analysis using SentiStrength, removing the term “cancer” from the dictionary of this tool.

Next, we list the tools using their support for the English language:
1. ACM-en: English text analysis using AlchemyAPI;
2. SEM-en: English text analysis using Semantria;
3. SST-en-wc: English text analysis using SentiStrength, keeping the term “cancer” in the dictionary of this tool;
4. SST-en-woc: English text analysis using SentiStrength, removing the term “cancer” from the dictionary of this tool;
5. TAY-en: English text analysis using Textalytics.
We also compared the results of the best tools that support Portuguese with the results of the tools that either do not support Portuguese or also accept English as input. To this end, we translated the messages into English and used the English option of each tool to perform SA on the translated messages. The objective of this comparison was to assess how well tools without Portuguese support perform on translated messages. We repeated each experiment with a different number and origin of analyzed messages. The datasets used are:
1. One hundred (100) messages from group A;
2. Ninety (90) messages from group B;
3. Fifty (50) random messages from group A;
4. Fifty (50) random messages from group B;
5. Fifty (50) random messages from groups A and B;
6. One hundred and ninety (190) messages from groups A and B.

Among the tools compared, SentiStrength is the only one that allows its classification dictionary to be changed. Because many sentences in our context contain the word “cancer”, we ran one experiment keeping this word and its derivatives in SentiStrength’s dictionary and another experiment removing them. The reason for these experiments is that SentiStrength and all the other tools compared are general-purpose SA tools, and most of their dictionaries give the word “cancer” a negative polarity. Consequently, in messages from cancer groups, the many occurrences of this word tend to push the whole message toward a negative sentiment.
However, we noticed that in most messages in this context, the word “cancer” and its derivatives are neutral. We therefore investigated how this aspect of a general-purpose dictionary affects the SA of messages from the specific context of cancer patient groups.

Three of the authors of this study classified each collected message as positive, negative or neutral, and the reference class of a message is the majority opinion. For example, if two researchers classify a message as positive and one as neutral, the message is classified as positive. Messages on which the researchers' opinions were tied were not considered in the experiments. There were no ties in group A, whereas group B had ten. These ten of the one hundred messages collected from group B were disregarded, leaving ninety messages from group B for the experiments. As shown in Figure 1, according to the researchers’ opinions, 54 of the 100 posts collected from group A are positive, 26 are neutral and 20 are negative. In group B, 45 posts are positive, 35 are neutral, 10 are negative and 10 are ties. The collected posts therefore contain a mix of sentiments.

Semantria and SentiStrength support both Portuguese and English, so they were tested with both languages. AlchemyAPI and Textalytics support English but not Portuguese, so they were tested only with English. The tool proposed in this work, SHC-pt, was tested only with Portuguese because it is specific to this language.
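The labeling rule is a majority vote in which ties are discarded. Below is a minimal sketch with hypothetical post identifiers. With three annotators and three classes, a tie occurs exactly when all three annotators disagree.

```python
from collections import Counter
from typing import Optional


def majority_label(votes: list[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, or None on a tie."""
    label, count = Counter(votes).most_common(1)[0]
    return label if 2 * count > len(votes) else None


annotations = {
    "post-1": ["positive", "positive", "neutral"],
    "post-2": ["positive", "neutral", "negative"],  # tie: discarded
}
gold = {pid: majority_label(votes) for pid, votes in annotations.items()}
gold = {pid: label for pid, label in gold.items() if label is not None}
print(gold)  # {'post-1': 'positive'}
```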
To translate the Portuguese texts into English, we used the Bing Translator API for Java [61]. Because the translations were automatic, many words were translated incorrectly, which introduced noise into the texts to be analyzed. This noise can negatively affect the tools that support English. For this reason, in this article we report the comparisons among tools that support Portuguese separately from those using English.

Each tool scores texts on a different scale to classify them as positive, negative or neutral:
- AlchemyAPI and Textalytics score a sentence in the interval from -1 to 1.
- Semantria scores from -2 to 2.
- SHC-pt scores from -6 to 4.
- SentiStrength gives two scores to a text, a negative score from -1 to -5 and a positive score from 1 to 5 [53]. The sum of these two scores defines the class of the text.

AlchemyAPI, SentiStrength, SHC-pt and Textalytics classify a text as neutral if its sentiment score is zero, positive if the score is above zero, and negative otherwise. In contrast, Semantria classifies a text as neutral if its score is between -0.45 and 0.5. Texts scoring above this range are classified as positive, and texts scoring below it as negative [55]. To analyze the results of the experiments with the tools covered in this study, we developed a Java application that integrates all these tools. The results of the proposed experiments are presented in the next section.
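The score-to-class rules above can be captured by a single function with a configurable neutral band. This is a sketch, not any tool's actual code. Treating the Semantria bounds as inclusive is our assumption.

```python
def label_from_score(score: float, neutral_low: float = 0.0,
                     neutral_high: float = 0.0) -> str:
    """Map a score to a class; scores in [neutral_low, neutral_high] are neutral."""
    if score > neutral_high:
        return "positive"
    if score < neutral_low:
        return "negative"
    return "neutral"


def sentistrength_label(positive: int, negative: int) -> str:
    # The sum of the positive (1..5) and negative (-1..-5) scores decides the class
    return label_from_score(positive + negative)


def semantria_label(score: float) -> str:
    # Neutral band reported for Semantria; inclusive bounds are our assumption
    return label_from_score(score, neutral_low=-0.45, neutral_high=0.5)


print(label_from_score(0))         # neutral (AlchemyAPI, SHC-pt, Textalytics rule)
print(sentistrength_label(2, -4))  # negative
print(semantria_label(0.3))        # neutral
```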
4. Results and output data of the study
In this section we present the results obtained by applying the variations of the tools on the different collections defined in Subsection 3.4.2. The experiments executed consider the language in which the messages are written, the amount of texts analyzed, the origin of messages and changes in the dictionaries used by the tools.
4.1. All messages from group A
Table 1 shows the results of the experiments using the 100 posts of group A with the tools that support texts written in Portuguese. Table 2 shows the results of a similar experiment using the posts translated into English. SHC-pt showed the best precision in all classes. However, SST-pt-wc showed the best recall for negative texts (85%), and SEM-pt showed the best recall for neutral texts (80.76%). In this case, the F1-measure is useful to find out which tool provides the best harmonic mean of recall and precision. In the experiment of Table 1, SHC-pt presented the best F1 values in all classes (79% in the positive, 42.1% in the negative and 58.33% in the neutral class). This indicates that it is the best at classifying not only negative texts but also positive and neutral ones. Table 2 shows that SHC-pt outperforms all the tools that support English in average F1 (59.82%), followed by ACM-en (40.43%) and SEM-en (37.46%). For these data, then, it is better to analyze texts in Portuguese with SHC-pt than to translate them into English and use the existing tools.
These two experiments show that, although Semantria and SentiStrength support Portuguese, their results on the Portuguese messages were not as good as their results on the texts translated into English. This may be because their Portuguese lexicons are not as robust as their English ones. Finally, removing the word “cancer” from SentiStrength’s dictionary had no significant effect.
Regarding indefinite classifications, the tools SEM-en, SEM-pt, ACM-en, SHC-pt and TAY-en classified 5, 4, 1, 5 and 14 posts as indefinite, respectively.
4.2. All messages from group B
We repeated the experiments reported in the previous section, this time using the posts from group B. Tables 3 and 4 show the results of the experiments considering all 90 posts collected from group B.
Table 3 shows that SHC-pt outperformed all the other tools that also support Portuguese, achieving the best average in both accuracy and F1. In F1, the average of SHC-pt (48.66%) is 29.10% better than that of the second-best tool (SST-pt-woc, with 37.69%). In accuracy, SHC-pt (60%) is 46% better than the second-best tool, SST-pt-woc (41.11%).
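The statements of the form “X% better than” in this section appear to be relative improvements over the second-best value. The following sketch reproduces them under that reading; the formula is our interpretation and is not stated explicitly in the text.

```python
def relative_gain(best: float, second: float) -> float:
    """Percentage by which `best` exceeds `second`, relative to `second`."""
    return 100 * (best - second) / second


print(f"{relative_gain(48.66, 37.69):.2f}")  # average F1, Table 3
print(f"{relative_gain(60.0, 41.11):.2f}")   # accuracy, Table 3
```

These calls give approximately 29.1% and 46%, in line with the values reported above up to rounding. The same formula also matches the Table 4 figures, for example 48.66 against 36.11 (about 34.75%) and 60 against 45.55 (about 31.72%).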
As can be seen in Table 4, SHC-pt outperformed all the tools analyzing translated texts in both F1 and accuracy. In F1, the average of SHC-pt (48.66%) is 34.75% better than that of the second-best tool (SEM-en, with 36.11%). The accuracy of SHC-pt (60%) is 31.72% better than that of the second-best tool in this measure (ACM-en, with 45.55%). So, also in group B, analyzing Portuguese texts with SHC-pt works better than using the other tools on texts translated into English. Furthermore, Textalytics (TAY-en) shows low precision and recall on negative texts translated into English, reaching only 5.4% in F1. Finally, ACM-en achieved a good result in the negative class, reaching 50% recall and 21.73% precision. Regarding indefinite classifications, the tools ACM-en, SHC-pt, SEM-en, SEM-pt and TAY-en classified 2, 8, 4, 9 and 16 posts as indefinite, respectively.
4.3. Random messages from group A
Tables 5 and 6 show the results of the experiments considering 50 random posts collected from group A. Among them, 26 posts are positive, 12 are negative and 12 are neutral.
Table 5 shows that SHC-pt also outperformed all the other tools in F1 in this collection. Its average F1 (63.79%) was 37.09% better than the second-best average, obtained by SST-pt-woc (46.53%). SHC-pt also showed the best F1 in each sentiment class (78.68% in the positive, 57.14% in the negative and 55.55% in the neutral class). SST-pt-wc surpassed it only in precision in the positive class (71.42%) and in recall in the negative class (91.66%). One factor that contributed to the gains of SHC-pt is its handling of hashtags used in cancer groups on Facebook. SST-pt-wc and SEM-pt do not consider hashtags such as “#obrigadodoador” (“#thankdonor” in English) and “#medulapegalogo” (“#marrowworknow” in English). As a result, they did not correctly classify messages containing these hashtags, whereas SHC-pt did.
As can be seen in Table 4, Textalytics (TAY-en) is not a good tool for analyzing negative texts in this context. Moreover, Table 6 shows that it is also not recommended for neutral texts, since it reached only 7.4% in F1 in this class. For positive texts, however, it showed the second-best F1 (61.53%), surpassed only by SHC-pt (78.68%). This indicates that TAY-en is biased toward labeling most messages as positive. Regarding indefinite classifications, the tools SEM-en, SEM-pt, ACM-en, TAY-en and SHC-pt classified 6, 4, 1 and 6 posts as indefinite, respectively.
4.4. Random messages from group B
Tables 7 and 8 show the results of experiments considering 50 random posts from group B. Among them, 20 posts are positive, 7 posts are negative and 23 posts are neutral.
Table 7 shows that SEM-pt obtained a better F1 value than SHC-pt on neutral texts. On the averages, however, SHC-pt outperformed all the tools in both accuracy and F1. In F1, the average of SHC-pt (48.66%) is 21.80% better than that of the second-best tool (SEM-pt, with 39.95%). In accuracy, SHC-pt (58%) is 11.53% better than the second-best tool (SEM-pt, with 52%).
In this collection too, SHC-pt obtained better F1 values than the other tools analyzing posts translated into English, as shown in Table 8. In F1, the average of SHC-pt (48.46%) is 38.10% better than that of SEM-en (35.09%), the second-best method in this measure. In accuracy, SHC-pt (58%) was 31.81% better than ACM-en (44%), which presented the second-best value. Regarding indefinite classifications in this collection, the tools SHC-pt, SEM-pt, SEM-en, ACM-en and TAY-en classified 5, 2, 7, 2 and 9 posts as indefinite, respectively.
4.5. Random messages from groups A and B
Tables 9 and 10 show the results of experiments with 50 posts randomly chosen from the union of groups A and B. Among them, 29 posts are positive, 5 are negative and 16 are neutral.
Table 9 shows that, when classifying positive posts, SHC-pt performs much better than Semantria (SEM-pt) in recall but worse in precision. However, its gain in recall outweighs its loss in precision, so SHC-pt ends up with a better F1 value in the positive class. Table 9 also shows that SEM-pt presents better F1 values than SHC-pt for negative (40%) and neutral (57.69%) texts in Portuguese. Nevertheless, SHC-pt has the best average over the three sentiment classes in both accuracy (64%) and F1 (52.63%).
As shown in Table 10, SEM-en failed to correctly classify any negative message, and ACM-en failed to correctly classify any neutral message. This may be due to the small number of texts used, only 50 posts. As in the previous experiments, SHC-pt reached the best F1 values (72.13% in the positive, 37.5% in the negative and 48.27% in the neutral class). SHC-pt had already achieved the best results in the experiments with each group separately. Together with these last results, this leads us to conclude that SHC-pt analyzes cancer-related posts well and is not specific to a single group. Regarding indefinite classifications, the tools SHC-pt, SEM-en, ACM-en and TAY-en classified 3, 2, 1 and 8 posts as indefinite, respectively.
4.6. All messages
Tables 11 and 12 show the results of experiments with all collected posts except those on which the researchers' classifications were tied. This gives a total of 190 posts from groups A and B.
With the larger number of texts in the experiment reported in Table 11, the accuracy values show that the most recommended tool for analyzing texts in Portuguese is SHC-pt (65.78%), followed by SentiStrength SST-pt-woc (41%) and Semantria (40%). SHC-pt also presented the best F1 in all sentiment classes (71.74% in the positive, 34.78% in the negative and 57.89% in the neutral class). Table 12 shows that, on this larger set of 190 messages, SHC-pt also achieves more satisfactory results than the tools that analyzed translated texts. Regarding indefinite classifications, the tools SHC-pt, ACM-en, SEM-en, SEM-pt and TAY-en classified 13, 3, 12, 1 and 30 posts as indefinite, respectively.
The experiments reported above lead us to conclude that SHC-pt obtained better accuracy and better average precision, recall and F1 when classifying texts as positive, negative or neutral. The experiments reported in Tables 11 and 12 show that SHC-pt also remains reliable for SA on a larger dataset drawn from different cancer-related Facebook groups. In addition, the tools that support English did not obtain satisfactory results compared to SHC-pt. It is therefore better to use SHC-pt than to translate the texts into English and use those tools. On the negative side, Textalytics classified the most messages as indefinite and presented the worst results in the experiments.

Recognizing negative feelings matters because it enables clinical interventions to help people. SHC-pt reached poor F1 values on negative messages only in collections with very few negative messages: group B, with 10 negative posts, and the random sample from groups A and B, with 5. When there are few negative messages, however, the hashtags and emoticons in the SHC-pt files that help to identify the negative class are also less likely to occur. In all the other collections, the number of negative messages was sufficient for SHC-pt to detect them. In future research, we intend to analyze data from more groups. A good SA tool for the clinical setting should not be good at identifying just one sentiment. It must perform well on average across the three sentiment classes, so that it can help to evaluate variations in the sentiment of the individuals being analyzed.
For example, detecting many negative messages from a patient who used to be optimistic indicates a negative change in his mood that may require attention from medical assistants.
4.7. Unexpected events
Some unexpected events occurred during the development of the method, and all of them were resolved. These events had not been anticipated in the planning and arose during the data acquisition and development phases.

The first unexpected event occurred while selecting the data for the experiments. Initially, we expected the cancer-related groups to contain only data relevant to the research. However, we found many shared posts that had no author’s message, only shared content. We solved this by searching Facebook for groups with many posts containing author’s messages, which slightly delayed data collection.

The second unexpected event was the low effectiveness of the existing tools in the experiments with cancer-related data. After several experiments, we concluded that these tools were developed for more general texts and not specifically for the cancer context. Furthermore, these tools analyze the author’s sentiment about a target that is distinct from the author himself. This event inspired us to further improve the effectiveness of the method developed for the SA of people involved with cancer.

A third unexpected event was the per-user limits imposed by the Semantria and Bing Translator APIs. To work around these limits, we registered another account on the site of each of these tools using a different e-mail address.
5. Discussion
5.1. Answers to study questions
In the Introduction of this article, we presented three main research questions. The first concerns how effectively the existing SA tools AlchemyAPI, Semantria, SentiStrength and Textalytics perform SA on posts from Brazilian cancer-related groups on Facebook. We tested the tools that recognize only English, and we also used the English option of the tools that work with both English and Portuguese. In these cases, the Portuguese texts were translated into English automatically. However, the tools performed poorly on the translated texts in most cases, as can be seen in Table 13, so translation is not an effective approach for cancer posts written in Portuguese. On the other hand, the Portuguese support in Semantria and SentiStrength allows these tools to obtain better results. Moreover, the tool proposed in this work proved to be the most appropriate for analyzing sentiments in Portuguese texts in the cancer domain. The worst accuracy achieved by our proposal (58%) is higher than the best accuracy obtained by the other tools (52%). Table 13 summarizes the results of all the experiments.
The second research question is whether the use of specific information present in posts of cancer communities could help improve the effectiveness of the lexical approach to SA in detecting messages with positive, negative or neutral sentiment. We found that dictionaries specifically tailored to emphasize the sentiment strength of some textual components considerably enhanced the performance of SA in the specific context of posts from cancer communities on Facebook. Specifically, we give special treatment to sentences containing emoticons and some editing resources used to emphasize words, such as repeated vowels and capitalized words. Additionally, we make use of hashtags, which are commonly used in posts of cancer communities. We also included the n-gram file, which allowed us to add words and groups of words that are highly important in determining the polarity of sentiment in this context. When these components are present in a sentence, they are so decisive for the polarity of the whole post that we discard the other sentences, which do not contain them. In this work, sentences containing these components are referred to as priority sentences. To better show the contribution of the above resources to the performance of SHC-pt, we performed additional experiments. We found that using only priority sentences increased accuracy by 8.68% on the posts of group A and by 3.75% on group B, compared with considering the whole message. Also, using only priority sentences in group A, SHC-pt reached an accuracy of 62.31%, while SEM-pt, SST-pt-woc and SST-pt-wc reached 10.14%, 30.43% and 36.23%, respectively. In group B, SHC-pt reached an accuracy of 71.25% using only priority sentences, while SEM-pt, SST-pt-woc and SST-pt-wc reached 7.5%, 35% and 43.75%, respectively. When no priority sentence is present, SHC-pt also achieved good accuracy compared with the other methods: 76.19% in group A and 81.01% in group B, while SEM-pt, SST-pt-woc and SST-pt-wc reached 28.57%, 57.14% and 57.14% in group A and 29.77%, 31.65% and 24.07% in group B, respectively.
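The priority-sentence rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SHC-pt implementation: the toy dictionaries, the sentence splitter, the emphasis test (a vowel repeated three or more times, or a word of at least three letters written in capitals) and the rule that emphasis doubles a sentence's score are all hypothetical. What the sketch does reproduce is the decision described in the text: when any sentence contains an emoticon, a known hashtag, a listed n-gram or emphasis, only those sentences are scored; otherwise the whole post is.

```python
import re

# Hypothetical toy dictionaries (entries and scores are illustrative, not SHC-pt's).
TERMS = {"feliz": 2, "vitória": 3, "triste": -2, "medo": -2, "dor": -2}
NGRAMS = {"graças a deus": 3, "não aguento mais": -3}
HASHTAGS = {"#fé": 2, "#vencendoocancer": 3}
EMOTICONS = {":)": 2, ":D": 3, ":(": -2}

REPEATED_VOWEL = re.compile(r"([aeiouáéíóú])\1{2,}")  # e.g. "muiiito"
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")  # naive splitter (assumption)


def has_emphasis(sentence: str) -> bool:
    """True if a vowel repeats 3+ times or a word of 3+ letters is all caps."""
    if REPEATED_VOWEL.search(sentence.lower()):
        return True
    return any(w.isupper() and len(w) >= 3 for w in re.findall(r"\w+", sentence))


def is_priority(sentence: str) -> bool:
    """A sentence is 'priority' if it holds any of the special components."""
    low = sentence.lower()
    return (
        any(e in sentence for e in EMOTICONS)
        or any(h in HASHTAGS for h in re.findall(r"#\w+", low))
        or any(n in low for n in NGRAMS)
        or has_emphasis(sentence)
    )


def score(sentence: str) -> int:
    """Sum dictionary scores; emphasis doubles the total (assumed rule)."""
    low = sentence.lower()
    total = sum(TERMS.get(w, 0) for w in re.findall(r"\w+", low))
    total += sum(v for n, v in NGRAMS.items() if n in low)
    total += sum(HASHTAGS.get(h, 0) for h in re.findall(r"#\w+", low))
    total += sum(v for e, v in EMOTICONS.items() if e in sentence)
    return 2 * total if has_emphasis(sentence) else total


def classify_post(post: str) -> str:
    """Score only priority sentences when any exist, else all sentences."""
    sentences = [s for s in SENTENCE_SPLIT.split(post.strip()) if s]
    chosen = [s for s in sentences if is_priority(s)] or sentences
    total = sum(score(s) for s in chosen)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# "I had tests today. I am afraid, in a lot of pain and sad. Thank God it all went fine :)"
post = "Fiz exames hoje. Tenho medo, muita dor e estou triste. Graças a Deus deu tudo certo :)"
print(classify_post(post))  # positive
```

With these toy scores, summing every sentence of the example gives -1 (negative), whereas scoring only the priority sentence, the one with the emoticon and the n-gram "graças a deus", gives a positive label; this is the kind of shift the priority-sentence heuristic is meant to capture.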
The accuracy of SHC-pt on posts without priority sentences was due to the use of the unigrams in the file “n-gram.txt”, with scores customized to the cancer context. Thus we can conclude that customizing the dictionaries of lexical components was decisive for the performance of SHC-pt.
The last research question concerns the influence of the origin of the posts on the effectiveness of SA, that is, whether posts from different cancer groups on Facebook could influence the effectiveness of SA. To answer this question we built two groups of posts (group A and group B), each coming from a different Facebook group, and ran the different tools on both. We also generated further collections from these groups: a random sample of each of the two groups, a random sample from the union of the two groups, and the complete union of the two groups. We did not find large differences in the performance of the methods across the collections tested, except when a sampled collection had few posts of a specific class. In this case our method was especially affected, because with few posts in a class the chance that these posts contain some components of our hashtag and n-gram files is reduced.
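The effect described above, where a small random sample may contain only a handful of posts of one class, can be illustrated with simple random sampling. The sketch below draws samples of 50 posts from a hypothetical labelled group of 90 posts; the class sizes (45 positive, 10 negative, 35 neutral) and the helper names are invented for illustration and do not describe the composition of group A or B.

```python
import random
from collections import Counter


def make_group(sizes):
    """Build a labelled collection of (post_id, label) pairs."""
    return [(f"{label}-{i}", label) for label, n in sizes.items() for i in range(n)]


def sample_collection(posts, k, seed=None):
    """Simple random sample of k posts, without replacement."""
    return random.Random(seed).sample(posts, k)


def class_count_range(posts, k, label, trials=1000, seed=0):
    """Min and max number of `label` posts over repeated random samples."""
    rng = random.Random(seed)
    counts = [
        Counter(lab for _, lab in rng.sample(posts, k))[label] for _ in range(trials)
    ]
    return min(counts), max(counts)


# Illustrative class sizes only.
group = make_group({"positive": 45, "negative": 10, "neutral": 35})
print(Counter(lab for _, lab in sample_collection(group, 50, seed=1)))
print(class_count_range(group, 50, "negative"))
```

With these sizes a sample of 50 contains on average 50 × 10/90 ≈ 5.6 negative posts, and individual samples can contain noticeably fewer, which leaves little room for hashtag or n-gram matches in that class.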
5.2. Strengths and weaknesses of the study
One of the main benefits of the proposed tool is its ability to analyze the sentiment of texts in the cancer domain. As shown in the previous section, the customization techniques used in the tool were fundamental to this ability. These results are useful in the care of people with cancer, because they can help improve their quality of life. Another strength of this study is the use of Facebook to collect texts written by people involved with cancer. These texts were collected without any disturbance to their authors. Furthermore, we expect the information to be more reliable, since posts are spontaneous expressions of the patients. Finally, the texts were already in digital form, and collecting them was faster than with other methods, such as ethnographic observation and interviews. A weakness of the proposed method is the lack of support for slang, ungrammatical text, irony, sarcasm and other forms of expression. We are studying a way to address this issue in order to broaden the scope of the method and thereby improve its performance. In addition, this work is subject to the following risks: the classification of texts made by the authors, against which the tools' classifications were compared, may be wrong; the groups chosen to collect the texts may not be representative; and some of the tools used in the experiments may have been applied incorrectly.
5.3. Meaning and generalizability of the study
For the field of informatics, this research contributes a new method for analyzing sentiment in Portuguese web texts on a specific theme, taking the author of the analyzed message as the target of the sentiment. The method is heuristic and relies on lexical resources to achieve its objectives. The features adopted in this method, as well as their order of execution, can be reused in future research. In particular, the customization techniques we used can easily be applied to other disease groups on Facebook, and even to other groups of posts whose subject exhibits lexical nuances that can be exploited. For health care, this article contributes to the assessment of people undergoing cancer treatment, since the proposed tool is specific to this group of diseases. Psychologists, physicians and social workers can support their decisions by monitoring the emotional state of patients and their families.
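One way to make the customization reusable for other diseases is to layer a small domain dictionary over a general one, so that domain entries override general scores and every other word falls back to the general dictionary. The sketch below uses Python's `collections.ChainMap` for this; the dictionaries, scores and the HIV example are hypothetical and only illustrate the idea (for instance, in a cancer context "o exame deu positivo", "the test came back positive", is bad news, so the domain layer flips the general score of "positivo").

```python
from collections import ChainMap

# Hypothetical toy dictionaries; entries and scores are illustrative only.
GENERAL = {"bom": 2, "ruim": -2, "positivo": 2, "negativo": -2}
CANCER = {"positivo": -3, "negativo": 3, "remissão": 3}
HIV = {"positivo": -3, "indetectável": 3}  # "undetectable" viral load


def build_lexicon(domain, general=GENERAL):
    """Domain entries override general ones; other words fall back to general."""
    return ChainMap(domain, general)


def polarity(words, lexicon):
    """Sum the scores of the given (already tokenized, lowercased) words."""
    return sum(lexicon.get(w, 0) for w in words)


cancer_lex = build_lexicon(CANCER)
hiv_lex = build_lexicon(HIV)
print(polarity(["exame", "positivo"], cancer_lex))  # -3
print(polarity(["bom", "remissão"], cancer_lex))  # 5
print(polarity(["carga", "viral", "indetectável"], hiv_lex))  # 3
```

Keeping the general dictionary untouched and adding one thin layer per disease means a hypothetical SentiHealth-HIV variant would only need its own override file.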
5.4. Unanswered and new questions
The information presented in this work for analyzing a person's sentiment is useful for tracking how that person is feeling during cancer treatment. Therefore, a future extension of this work could propose recommendations that seek to improve the person's emotional state. These recommendations may be, for example, new posts, friendships, or even a specific song. For instance, the work in [46] identifies which interactions among members of a cancer forum help move members' sentiment in a positive direction. This work considered only the cancer context and Facebook as the online social network. Thus, this research could also be extended to other chronic diseases (SentiHealth-HIV, SentiHealth-Stroke, SentiHealth-Autism and SentiHealth-Sclerosis) and to other social networks. When more than one social network is considered, a larger amount of data can be collected, since a person may prefer to express their sentiments in one social network rather than in others. In addition, our method is specific to the Portuguese language; future work can extend it to other languages as well. Another possibility for future work is to analyze how a user's behavior on an online social network can help define that user's sentiments. For example, likes on posts or profiles and a person's friends on the social network may help reveal more precisely what that person actually feels. In this work we classified sentiment only as positive, negative or neutral. It would be interesting to extend this work to more sentiment categories, which would allow SA to establish multilevel sentiments that describe personal emotions more clearly.
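Tracking how a person feels during treatment requires aggregating post-level labels over time. A minimal sketch, assuming each classified post is available as an (author, date, label) record, computes a weekly balance (positive posts minus negative posts, divided by all posts) per author using ISO calendar weeks; the record format, the balance measure and the author name are hypothetical choices, not part of the proposed tool.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_balance(records):
    """Map (author, (ISO year, ISO week)) to (positive - negative) / total,
    a value in [-1, 1], from (author, date, label) records."""
    weeks = defaultdict(Counter)
    for author, day, label in records:
        weeks[(author, tuple(day.isocalendar()[:2]))][label] += 1
    return {
        key: (c["positive"] - c["negative"]) / sum(c.values())
        for key, c in weeks.items()
    }


# Hypothetical classified posts for one author over two consecutive weeks.
records = [
    ("ana", date(2014, 11, 3), "positive"),
    ("ana", date(2014, 11, 4), "negative"),
    ("ana", date(2014, 11, 5), "neutral"),
    ("ana", date(2014, 11, 12), "negative"),
]
for (author, week), value in sorted(weekly_balance(records).items()):
    print(author, week, round(value, 2))
```

A falling balance over several weeks is the kind of signal a psychologist or social worker might want flagged; any threshold for doing so would have to be defined and validated separately.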
6. Conclusion
Existing SA tools have low accuracy when applied to web texts in the cancer context written in Portuguese. To address this problem, we developed a new sentence-level SA tool (SHC-pt) that uses a lexicon and heuristics to analyze texts written by people involved with cancer. These texts were collected from posts in Brazilian cancer groups on Facebook. Unlike other tools [6, 7, 8], the method proposed in this work considers the author of a message as the target of the analysis. Several experiments were performed with posts from two Facebook groups, and SHC-pt outperformed all the other tools evaluated in all of them. This was possible because, unlike the other tools, our method uses a lexicon specific to the cancer domain and designed for Portuguese texts from the social web. This lexicon includes terms, emoticons, hashtags and n-grams, which contributed to the good results of SHC-pt. Lastly, we conclude that the proposed method is a promising tool for sentiment analysis and can contribute to the SA field as a context-based approach. Future work, including the directions suggested in Section 5.4, can build on this approach and on its focus on the author as the target of the sentiment.
Author contributions
Celso Camilo-Junior coordinated the research of this study, advising the researchers on the most effective techniques for the applied context. Furthermore, Celso Camilo-Junior, along with Thierson Couto and Ramon Gouveia, analyzed the collected data to verify the accuracy of the method developed and helped to classify texts as positive, negative or neutral. Finally, he helped Ramon Gouveia create and define the method and the sentiment strengths of the dictionaries used by SHC-pt. Ramon Gouveia worked directly on data collection and on the development of the method, including its implementation, its workflows and the integration processes needed to create the method discussed in this article. He was assisted in this study by Celso Camilo-Junior, Thierson Couto and Rafael Marques with opinions and new ideas for the development of the proposed method. Rafael Marques contributed his prior knowledge of similar SA tools, produced the documentation of the research project, and contributed to the evaluation and writing of this article.
Competing interests
The authors declare no conflicts of interest.
Acknowledgements
We thank CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Higher Education Personnel Training Coordination) for providing a scholarship to this project.
Summary Points
What was already known on the topic:
• Tools already exist that can perform SA on texts from online social networks.
• Existing SA methods do not consider the author himself as the target of the analysis.
• Sentiment analysis can help people undergoing cancer treatment and their families.
What this study has added to our knowledge:
• SA that considers the author himself as the target of the analysis can help determine whether he has a positive or negative outlook.
• Online social networks can be a good source of abundant and trustworthy information from patients and their families. Moreover, data collection can be faster, since the information already exists and can be downloaded quickly through the social network's API.
• A new method, with good results, for classifying the sentiments of people involved in the health context, based on data collected from an online social network.
• Social workers and psychologists can monitor patients' families and help them too.
• This study allows health professionals and other patient assistants to better monitor the emotional state of patients with cancer.
Table 14. Summary Points.
Appendix A. Flowchart of the SentiHealth algorithm.
Figure A.2. Flowchart of the SentiHealth algorithm.
References
[1] T. Ramani, M. Begam, Survey: A techniques implemented on opinion mining, International Journal of Computer Science and Engineering Technology 5 (2014) 965–968. URL http://www.ijcset.com/docs/IJCSET14-05-10-007.pdf
[2] R. Tejwani, Sentiment Analysis: A Survey, Cornell University Library (2014) 1–3. arXiv:1405.2584v1.
[3] J. Serrano-Guerrero, J. A. Olivas, F. P. Romero, E. Herrera-Viedma, Sentiment analysis: A review and comparative analysis of web services, Information Sciences 311 (2015) 18–38. doi:10.1016/j.ins.2015.03.040. URL http://linkinghub.elsevier.com/retrieve/pii/S0020025515002054
[4] S. K. Yadav, Sentiment Analysis and Classification: A Survey, International Journal of Advance Research in Computer Science and Management Studies 03 (2015) 113–121.
[5] B. Liu, Sentiment Analysis and Opinion Mining, Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, 2012, Ch. 1, pp. 4–17. doi:10.2200/S00416ED1V01Y201204HLT016. URL http://dx.doi.org/10.2200/S00416ED1V01Y201204HLT016
[6] M. Cieliebak, O. Dürr, F. Uzdilli, Potential and limitations of commercial sentiment detection tools, in: Proceedings of the First International Workshop on Emotion and Sentiment in Social and Expressive Media: approaches and perspectives from AI (ESSEM 2013), a workshop of the XIII International Conference of the Italian Association for Artificial Intelligence (AI*IA 2013), Turin, Italy, December 3, 2013, 2013, pp. 47–58. URL http://ceur-ws.org/Vol-1096/paper4.pdf
[7] A. Abbasi, A. Hassan, M. Dhar, Benchmarking twitter sentiment analysis tools, in: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik, Iceland, May 26-31, 2014, 2014, pp. 823–829. URL http://www.lrec-conf.org/proceedings/lrec2014/summaries/483.html
[8] D. Y. Turdakov, N. A. Astrakhantsev, Y. R. Nedumov, A. A. Sysoev, I. A. Andrianov, V. D. Maiorov, D. Fedorenko, A. Korshunov, S. D. Kuznetsov, Texterra: A framework for text analysis, Programming and Computer Software 40 (5) (2014) 288–295. doi:10.1134/S0361768814050090.
[9] Oracle Corporation, Java, https://www.java.com/en/, accessed in November 2014 (2014).
[10] A. D. I. Kramer, J. E. Guillory, J. T. Hancock, Experimental evidence of massive-scale emotional contagion through social networks, Proceedings of the National Academy of Sciences of the United States of America 111 (24) (2014) 8788–8790. doi:10.1073/pnas.1320040111. URL http://www.pnas.org/cgi/content/long/111/24/8788
[11] K. Portier, G. E. Greer, L. Rokach, N. Ofek, Y. Wang, P. Biyani, M. Yu, S. Banerjee, K. Zhao, P. Mitra, J. Yen, Understanding topics and sentiment in an online cancer survivor community, Journal of the National Cancer Institute Monographs (47) (2013) 195–198.
[12] P.-C. Sian, S.-H. Tan, A survey on quality of life and situational motivation among parents of children with autism spectrum disorder in Malaysia, International Conference on Sociality and Humanities 18 (2012) 89–94.
[13] C. H. Kroenke, L. D. Kubzansky, E. S. Schernhammer, M. D. Holmes, I. Kawachi, Social Networks, Social Support, and Survival After Breast Cancer Diagnosis, Journal of Clinical Oncology 24 (7) (2006) 1105–1111. doi:10.1200/JCO.2005.04.2846.
[14] N. S. Coulson, H. Buchanan, A. Aubeeluck, Social support in cyberspace: a content analysis of communication within a Huntington’s disease online support group, Patient Education and Counseling 68 (2) (2007) 173–178. doi:10.1016/j.pec.2007.06.002. URL http://www.sciencedirect.com/science/article/pii/S0738399107002261
[15] K. O. Hwang, A. J. Ottenbacher, A. P. Green, M. R. Cannon-Diehl, O. Richardson, E. V. Bernstam, E. J. Thomas, Social support in an Internet weight loss community, International Journal of Medical Informatics 79 (1) (2010) 5–13. doi:10.1016/j.ijmedinf.2009.10.003.
[16] J. S. M. Rodrigues, N. M. L. A. Ferreira, Structure and functionality of the social support network for adults with cancer, Acta Paulista de Enfermagem 25 (5) (2012) 781–787. URL http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0103-21002012000500021&nrm=iso
[17] G. M. Turner-McGrievy, D. F. Tate, Weight loss social support in 140 characters or less: use of an online social network in a remotely delivered weight loss intervention, Translational Behavioral Medicine 3 (3) (2013) 287–294. doi:10.1007/s13142-012-0183-y. URL http://hdl.handle.net/2142/14927
[18] S. Ashida, A. E. L. Palmquist, K. Basen-Engquist, S. E. Singletary, L. M. Koehly, Changes in Female Support Network Systems and Adaptation After Breast Cancer Diagnosis: Differences Between Older and Younger Patients, The Gerontologist 49 (4) (2009) 549–559. doi:10.1093/geront/gnp048.
[19] S. H. M. Roffeei, N. Abdullah, S. K. R. Basar, Seeking social support on Facebook for children with autism spectrum disorders (ASDs), International Journal of Medical Informatics 84 (5) (2015) 375–385. doi:10.1016/j.ijmedinf.2015.01.015. URL http://www.sciencedirect.com/science/article/pii/S1386505615000313
[20] F. L. Cruz, Feature-based opinion extraction: A practical, domain-adaptable approach, AI Commun. 25 (4) (2012) 369–371. doi:10.3233/AIC-2012-0519. URL http://dx.doi.org/10.3233/AIC-2012-0519
[21] S. Liu, R. Law, J. Rong, G. Li, J. Hall, Analyzing changes in hotel customers’ expectations by trip mode, International Journal of Hospitality Management 34 (2013) 359–371. doi:10.1016/j.ijhm.2012.11.011. URL http://dx.doi.org/10.1016/j.ijhm.2012.11.011
[22] H. Cho, S. Kim, J. Lee, J. S. Lee, Data-driven integration of multiple sentiment dictionaries for lexicon-based sentiment classification of product reviews, Knowledge-Based Systems 71 (2014) 61–71. doi:10.1016/j.knosys.2014.06.001. URL http://dx.doi.org/10.1016/j.knosys.2014.06.001
[23] A. K. Samha, Y. Li, J. Zhang, Aspect-Based Opinion Extraction from Customer reviews, School of Electrical Engineering and Computer Science abs/1404.1982 (2014) 149–160. URL http://arxiv.org/abs/1404.1982
[24] P. B. Somabhai, C. Science, A Survey on Feature Based Opinion Mining For Tourism Industry, Journal of Engineering Computers and Applied Sciences 4 (3) (2015) 83–86.
[25] A. Akay, A. Dragomir, B.-E. Erlandsson, Network-based modeling and intelligent data mining of social media for improving care, IEEE Journal of Biomedical and Health Informatics 19 (1) (2015) 210–218.
[26] F. L. Cruz, J. A. Troyano, F. Enríquez, F. J. Ortega, C. G. Vallejo, ‘Long autonomy or long delay?’ The importance of domain in opinion mining, Expert Systems with Applications 40 (2013) 3174–3184. doi:10.1016/j.eswa.2012.12.031.
[27] A. Bagheri, M. Saraee, F. De Jong, Care more about customers: Unsupervised domain-independent aspect detection for sentiment analysis of customer reviews, Knowledge-Based Systems 52 (2013) 201–213. doi:10.1016/j.knosys.2013.08.011. URL http://dx.doi.org/10.1016/j.knosys.2013.08.011
[28] F. Fang, K. Dutta, A. Datta, Domain Adaptation for Sentiment Classification in Light of Multiple Sources, INFORMS Journal on Computing 26 (2014) 586–598.
[29] American Cancer Society, What is Cancer, http://www.cancer.org/cancer/cancerbasics/what-is-cancer/, accessed in November 2014 (2014).
[30] World Health Organization, International Agency for Research on Cancer (IARC), GLOBOCAN 2012: Estimated Cancer Incidence, Mortality and Prevalence Worldwide in 2012, http://globocan.iarc.fr/Default.aspx, accessed in November 2014 (2012).
[31] INCA, Brazilian National Cancer Institute, http://www.inca.gov.br/english/, accessed in November 2014 (2014).
[32] J. E. Bower, Behavioral symptoms in patients with breast cancer and survivors, Journal of Clinical Oncology 26 (2008) 768–777. doi:10.1200/JCO.2007.14.3248.
[33] M. Walji, S. Sagaram, F. Meric-Bernstam, C. W. Johnson, E. V. Bernstam, Searching for cancer-related information online: Unintended retrieval of complementary and alternative medicine information, International Journal of Medical Informatics 74 (7-8) (2005) 685–693. doi:10.1016/j.ijmedinf.2005.01.001.
[34] R. E. Rice, Influences, usage, and outcomes of Internet health information searching: Multivariate results from the Pew surveys, International Journal of Medical Informatics 75 (1) (2006) 8–28. doi:10.1016/j.ijmedinf.2005.07.032.
[35] M. Lemire, G. Paré, C. Sicotte, C. Harvey, Determinants of Internet use as a preferred source of information on personal health, International Journal of Medical Informatics 77 (11) (2008) 723–734. doi:10.1016/j.ijmedinf.2008.03.002.
[36] J. L. Bender, M. C. Jimenez-Marroquin, A. R. Jadad, Seeking support on facebook: A content analysis of breast cancer groups, Journal of Medical Internet Research 13 (1) (2011) 01–11. doi:10.2196/jmir.1560.
[37] K. M. AlGhamdi, N. A. Moussa, Internet use by the public to search for health-related information, International Journal of Medical Informatics 81 (6) (2012) 363–373. doi:10.1016/j.ijmedinf.2011.12.004.
[38] H. S. Wentzer, A. Bygholm, Narratives of empowerment and compliance: Studies of communication in online patient support groups, International Journal of Medical Informatics 82 (12) (2013) e386–e394.
[39] R. S. Valdez, P. F. Brennan, Exploring patients’ health information communication practices with social network members as a foundation for consumer health IT design, International Journal of Medical Informatics 84 (5) (2015) 363–374. doi:10.1016/j.ijmedinf.2015.01.014. URL http://linkinghub.elsevier.com/retrieve/pii/S1386505615000301
[40] S. E. Bedell, A. Agrawal, L. E. Petersen, A systematic critique of diabetes on the world wide web for patients and their physicians, International Journal of Medical Informatics 73 (9-10) (2004) 687–694. doi:10.1016/j.ijmedinf.2004.04.011.
[41] A. Dey, B. Reid, R. Godding, A. Campbell, Perceptions and behavior of access of the Internet: A study of women attending a breast screening service in Sydney, Australia, International Journal of Medical Informatics 77 (1) (2008) 24–32. doi:10.1016/j.ijmedinf.2006.12.002.
[42] J. F. Etter, Internet-based smoking cessation programs, International Journal of Medical Informatics 75 (1) (2006) 110–116. doi:10.1016/j.ijmedinf.2005.06.014.
[43] B. S. Shenker, The accuracy of Internet search engines to predict diagnoses from symptoms can be assessed with a validated scoring system, International Journal of Medical Informatics 83 (2) (2014) 131–139. doi:10.1016/j.ijmedinf.2013.11.002. URL http://dx.doi.org/10.1016/j.ijmedinf.2013.11.002
[44] Facebook, Stats, http://newsroom.fb.com/company-info/, accessed in April 2015 (2015).
[45] M. D. Wulf, Population Pyramids of the World from 1950 to 2100, http://populationpyramid.net/world/2015/, accessed in April 2015 (2015).
[46] B. Qiu, K. Zhao, P. Mitra, D. Wu, C. Caragea, J. Yen, G. Greer, K. Portier, Get online support, feel better – sentiment analysis and dynamics in an online cancer survivor community, in: Privacy, Security, Risk and Trust (PASSAT), 2011 IEEE Third International Conference on and 2011 IEEE Third International Conference on Social Computing (SocialCom), 2011, pp. 274–281. doi:10.1109/PASSAT/SocialCom.2011.127.
[47] SentiStrength, SentiStrength, http://sentistrength.wlv.ac.uk/, accessed in November 2014 (2014).
[48] Lexalytics, Analysis methods, https://semantria.com/support/developer/docs/methods/, accessed in November 2014 (2014).
[49] AlchemyAPI, Sentiment Analysis API, http://www.alchemyapi.com/api/sentiment-analysis/, accessed in November 2014 (2014).
[50] MeaningCloud, What is Sentiment Analysis?, https://www.meaningcloud.com/developer/sentiment-analysis/doc/1.2, accessed in May 2015 (2015).
[51] AlchemyAPI, Sentiment Analysis API, http://www.alchemyapi.com/products/alchemylanguage/sentiment-analysis/, accessed in November 2014 (2014).
[52] M. Thelwall, K. Buckley, G. Paltoglou, D. Cai, A. Kappas, Sentiment strength detection in short informal text, Journal of the Association for Information Science and Technology 61 (12) (2010) 2544–2558. URL http://EconPapers.repec.org/RePEc:bla:jinfst:v:61:y:2010:i:12:p:2544-2558
[53] SentiStrength, SentiStrength, http://sentistrength.wlv.ac.uk/documentation/, accessed in November 2014 (2014).
[54] Lexalytics, Analysis methods, https://semantria.com/support/developer/docs/methods/, accessed in May 2015 (2015).
[55] Lexalytics, Semantria Support, https://semantria.com/support/resources/technology/, accessed in November 2014 (2014).
[56] Lexalytics, Choose a plan, https://semantria.com/prices/, accessed in May 2015 (2015).
[57] MeaningCloud, Lemmatization, PoS and Parsing, https://www.meaningcloud.com/developer/lemmatization-pos-parsing, accessed in May 2015 (2015).
[58] D. de Kok, H. Brouwer, Natural Language Processing for the Working Programmer, nlpwp.org, 2011, Ch. 3. URL http://nlpwp.org/book/
[59] Facebook4j, Facebook4j, http://facebook4j.org/, accessed in November 2014 (2014).
[60] The PostgreSQL Global Development Group, Postgres, http://www.postgresql.org/, accessed in November 2014 (2014).
[61] Microsoft, Bing Translator, http://www.bing.com/translator/, accessed in November 2014 (2014).
Figure Captions
Figure 1. Percentage of the real sentiments in cancer groups A and B from Facebook.
Tables
Table 1. Comparison of tools that support the Portuguese language considering 100 posts from group A. The measures, presented in percentages, are: precision (Pr), recall (Re), F1-measure (F1), their averages and accuracy (Ac).
Tool | Pos. Pr | Pos. Re | Pos. F1 | Neg. Pr | Neg. Re | Neg. F1 | Neu. Pr | Neu. Re | Neu. F1 | Avg. Pr | Avg. Re | Avg. F1 | Ac
SHC-pt | 70 | 90.74 | 79 | 44.4 | 40 | 42.1 | 63.63 | 53.84 | 58.33 | 59.36 | 61.52 | 59.82 | 71
SEM-pt | 33.33 | 12.96 | 22.22 | 26.3 | 25 | 25.64 | 26.92 | 80.76 | 40.38 | 28.85 | 39.57 | 28.23 | 33
SST-pt-wc | 66.66 | 18.51 | 28.98 | 24.6 | 85 | 38.2 | 50 | 30.76 | 38 | 47.1 | 44.76 | 35 | 35
SST-pt-woc | 65.38 | 31.48 | 42.5 | 25.7 | 45 | 32.72 | 38.46 | 57.69 | 46.15 | 43.18 | 44.72 | 40.46 | 41
Table 2. Comparison between SHC-pt and tools that support the English language, considering the 100 posts from group A.
Tool | Pos. Pr | Pos. Re | Pos. F1 | Neg. Pr | Neg. Re | Neg. F1 | Neu. Pr | Neu. Re | Neu. F1 | Avg. Pr | Avg. Re | Avg. F1 | Ac
SHC-pt | 70 | 90.74 | 79 | 44.4 | 40 | 42.1 | 63.63 | 53.84 | 58.33 | 59.36 | 61.52 | 59.82 | 71
SEM-en | 55.55 | 55.55 | 55.55 | 10.5 | 10 | 10.25 | 36.17 | 65.38 | 46.57 | 34 | 43.64 | 37.46 | 49
ACM-en | 63.49 | 74 | 68.37 | 29 | 45 | 35.29 | 37.5 | 11.53 | 17.64 | 43.34 | 43.53 | 40.43 | 52
SST-en-wc | 50 | 12.96 | 20.58 | 24.2 | 85 | 37.77 | 43.75 | 26.92 | 33.33 | 39.34 | 41.62 | 30.56 | 31
SST-en-woc | 50 | 24 | 32.5 | 21.4 | 45 | 29 | 21.87 | 26.92 | 24.13 | 31.1 | 31.99 | 28.55 | 29
TAY-en | 48 | 66.66 | 55.81 | 13.6 | 15 | 14.28 | 12.9 | 15.34 | 14 | 24.84 | 32.35 | 28 | 43
Table 3. Comparison of tools that support the Portuguese language considering 90 posts from group B.
Tool | Pos. Pr | Pos. Re | Pos. F1 | Neg. Pr | Neg. Re | Neg. F1 | Neu. Pr | Neu. Re | Neu. F1 | Avg. Pr | Avg. Re | Avg. F1 | Ac
SHC-pt | 57.4 | 68.88 | 62.62 | 19 | 40 | 25.8 | 61.29 | 54.28 | 57.57 | 45.91 | 54.39 | 48.66 | 60
SEM-pt | 40.9 | 20 | 26.86 | 11.7 | 20 | 14.81 | 36.23 | 71.42 | 48 | 29.63 | 37.14 | 29.91 | 40
SST-pt-wc | 56.25 | 40 | 46.75 | 11.7 | 40 | 18.18 | 45.83 | 31.42 | 37.28 | 37.94 | 37.14 | 34 | 36.66
SST-pt-woc | 51.28 | 44.44 | 47.61 | 16 | 40 | 22.85 | 50 | 37.14 | 42.62 | 39 | 40.52 | 37.69 | 41.11
Table 4. Comparison between SHC-pt and tools that support the English language, considering 90 posts from group B.
Tool | Pos. Pr | Pos. Re | Pos. F1 | Neg. Pr | Neg. Re | Neg. F1 | Neu. Pr | Neu. Re | Neu. F1 | Avg. Pr | Avg. Re | Avg. F1 | Ac
SHC-pt | 57.4 | 68.88 | 62.62 | 19 | 40 | 25.8 | 61.29 | 54.28 | 57.57 | 45.91 | 54.39 | 48.66 | 60
SEM-en | 54.28 | 42.22 | 47.5 | 13.6 | 30 | 18.75 | 39 | 45.71 | 42.1 | 35.64 | 39.31 | 36.11 | 42.22
ACM-en | 53.22 | 73.33 | 61.68 | 21.7 | 50 | 30.3 | 33.33 | 8.57 | 13.63 | 36 | 43.96 | 35.2 | 45.55
SST-en-wc | 51.61 | 35.55 | 42.1 | 7.31 | 30 | 11.76 | 44.44 | 22.85 | 30.18 | 34.45 | 29.47 | 28 | 30
SST-en-woc | 48.71 | 42.22 | 45.23 | 10 | 30 | 15 | 52.38 | 31.42 | 39.28 | 37 | 34.55 | 33.17 | 36.66
TAY-en | 44.11 | 66.66 | 53 | 3.7 | 10 | 5.4 | 29.62 | 22.85 | 25.8 | 25.81 | 33.14 | 28.1 | 43.33
Table 5. Comparison of tools that support the Portuguese language considering 50 random posts from group A.
Tool | Pos. Pr | Pos. Re | Pos. F1 | Neg. Pr | Neg. Re | Neg. F1 | Neu. Pr | Neu. Re | Neu. F1 | Avg. Pr | Avg. Re | Avg. F1 | Ac
SHC-pt | 68.57 | 92.3 | 78.68 | 66.66 | 50 | 57.14 | 83.33 | 41.66 | 55.55 | 72.85 | 61.32 | 63.79 | 70
SEM-pt | 30 | 11.53 | 16.66 | 37.5 | 25 | 30 | 22.5 | 75 | 34.61 | 30 | 37.17 | 27.09 | 30
SST-pt-wc | 71.42 | 19.23 | 30.3 | 32.35 | 91.66 | 47.82 | 44.44 | 33.33 | 38.09 | 49.4 | 48.07 | 38.74 | 40
SST-pt-woc | 60 | 34.61 | 43.9 | 50 | 50 | 50 | 34.78 | 66.66 | 45.71 | 48.26 | 50.42 | 46.53 | 46
Table 6. Comparison between SHC-pt and tools that support the English language, considering 50 random posts from group A.
Class     Metric  SHC-pt      SEM-en      ACM-en      SST-en-wc   SST-en-woc  TAY-en
Positive  Pr      68.57       48.14       51.42       42.85       46.66       51.28
Positive  Re      92.3        50          69.23       11.53       26.92       76.92
Positive  F1      78.68       49.05       59.01       18.18       34.14       61.53
Negative  Pr      66.66       18.18       23.07       32.35       30          25
Negative  Re      50          16.66       25          91.66       50          16.66
Negative  F1      57.14       17.39       23.99       47.82       37.5        20
Neutral   Pr      83.33       33.33       25          33.33       20          6.66
Neutral   Re      41.66       66.66       8.33        25          25          8.33
Neutral   F1      55.55       44.44       12.5        28.57       22.22       7.4
Average   Pr      72.85       33.22       33.16       36.18       32.22       27.64
Average   Re      61.32       44.44       34.18       42.73       33.97       33.97
Average   F1      63.79       36.96       31.83       31.52       31.28       29.64
Overall   Ac      70          46          44          34          32          46
Table 7. Comparison among tools that support the Portuguese language, considering 50 random posts from group B.
Class     Metric  SHC-pt      SEM-pt      SST-pt-wc   SST-pt-woc
Positive  Pr      48.3        53.8        50          43.4
Positive  Re      75          35          50          50
Positive  F1      58.82       42.42       50          46.51
Negative  Pr      25          16.66       16.66       23.07
Negative  Re      42.85       14.28       42.85       42.85
Negative  F1      31.57       15.38       23.99       30
Neutral   Pr      64.7        51.42       58.33       57.14
Neutral   Re      47.82       78.26       30.43       34.78
Neutral   F1      55          62.06       40          43.24
Average   Pr      46.03       40.64       41.66       41.23
Average   Re      55.22       42.51       41.09       42.54
Average   F1      48.46       39.95       38          39.91
Overall   Ac      58          52          40          42
Table 8. Comparison between SHC-pt and tools that support the English language, considering 50 random posts from group B.
Class     Metric  SHC-pt      SEM-en      ACM-en      SST-en-wc   SST-en-woc  TAY-en
Positive  Pr      48.3        37.5        45.9        44.4        42.8        33.3
Positive  Re      75          45          85          40          45          60
Positive  F1      58.82       40.9        59.64       42.1        43.9        42.85
Negative  Pr      25          18.75       23.07       9.52        11.76       6.66
Negative  Re      42.85       42.85       42.85       28.57       28.57       14.28
Negative  F1      31.57       26.08       30          14.28       16.66       9.09
Neutral   Pr      64.7        37.5        50          45.45       58.33       29.41
Neutral   Re      47.82       39.13       8.69        21.73       30.43       21.73
Neutral   F1      55          38.29       14.81       29.41       40          25
Average   Pr      46.03       31.25       39.67       33.14       37.65       23.13
Average   Re      55.22       42.32       45.51       30.1        34.66       32
Average   F1      48.46       35.09       34.82       28.6        33.52       25.64
Overall   Ac      58          42          44          30          36          36
Table 9. Comparison among tools that support the Portuguese language, considering 50 random posts from groups A and B.
Class     Metric  SHC-pt      SEM-pt      SST-pt-wc   SST-pt-woc
Positive  Pr      68.75       88.88       73.33       75
Positive  Re      75.86       27.58       37.93       41.37
Positive  F1      72.13       42.1        50          53.33
Negative  Pr      27.2        40          14.2        18.1
Negative  Re      60          40          80          80
Negative  F1      37.5        40          24.24       29.62
Neutral   Pr      53.84       41.66       71.42       58.33
Neutral   Re      43.75       93.75       31.25       43.75
Neutral   F1      48.27       57.69       43.47       50
Average   Pr      49.95       56.85       53.01       50.5
Average   Re      59.87       53.77       49.72       55.04
Average   F1      52.63       46.59       39.24       44.32
Overall   Ac      64          50          40          46
Table 10. Comparison between SHC-pt and tools that support the English language, considering 50 random posts from groups A and B.
Class     Metric  SHC-pt      SEM-en      ACM-en      SST-en-wc   SST-en-woc  TAY-en
Positive  Pr      68.75       70          66.66       61.53       57.14       51.35
Positive  Re      75.86       48.27       75.86       27.58       27.58       65.51
Positive  F1      72.13       57.14       70.96       38.09       37.2        57.57
Negative  Pr      27.2        0           20          13.3        16          7.14
Negative  Re      60          0           60          80          80          20
Negative  F1      37.5        0           30          22.85       26.66       10.52
Neutral   Pr      53.84       35.71       0           71.42       45.45       20
Neutral   Re      43.75       62.5        0           31.25       31.25       18.75
Neutral   F1      48.27       45.45       0           43.47       37.03       19.35
Average   Pr      49.95       35.23       28.88       48.76       39.53       26.16
Average   Re      59.87       36.92       45.28       46.27       46.27       34.75
Average   F1      52.63       34.19       33.65       34.81       33.63       29.15
Overall   Ac      64          48          50          34          34          46
Table 11. Comparison among tools that support the Portuguese language, considering the 190 posts from groups A and B.
Class     Metric  SHC-pt      SEM-pt      SST-pt-wc   SST-pt-woc
Positive  Pr      64.51       62          59.57       56.92
Positive  Re      80.8        18.18       28.28       37.37
Positive  F1      71.74       28.12       38.35       45.12
Negative  Pr      30.76       31.57       20.38       21.66
Negative  Re      40          20          70          43.33
Negative  F1      34.78       24.48       31.57       28.88
Neutral   Pr      62.26       36.11       47.5        43
Neutral   Re      54          85.24       31.14       45.9
Neutral   F1      57.89       50.73       37.62       44.44
Average   Pr      52.51       43.25       42.48       40.55
Average   Re      58.3        41.14       43.14       42.2
Average   F1      54.8        34.44       35.85       39.48
Overall   Ac      65.78       40          35.78       41
Table 12. Comparison between SHC-pt and tools that support the English language, considering the 190 posts from groups A and B.
Class     Metric  SHC-pt      SEM-en      ACM-en      SST-en-wc   SST-en-woc  TAY-en
Positive  Pr      64.51       54          58.4        51.11       56.92       46.15
Positive  Re      80.8        47.47       73.73       23.23       37.37       66.66
Positive  F1      71.74       50.53       65.17       31.94       45.12       54.54
Negative  Pr      30.76       12.5        25.92       18          21.66       8.16
Negative  Re      40          16.66       46.66       66.66       43.33       13.33
Negative  F1      34.78       14.28       33.33       28.36       28.88       10.12
Neutral   Pr      62.26       36.78       35.29       44.11       43          20.69
Neutral   Re      54          52.45       9.83        24.59       45.9        19.67
Neutral   F1      57.89       43.24       15.38       31.57       44.44       20.16
Average   Pr      52.51       34.43       39.87       37.74       40.55       25
Average   Re      58.3        38.86       43.41       38.16       42.2        33.22
Average   F1      54.8        36          37.96       30.63       48.36       28.28
Overall   Ac      65.78       44.21       48.94       30.52       32.63       43.33
Table 13. Summary of results for all tools in all experiments.
Metric      SHC-pt      SEM-pt      SST-pt-wc   SST-pt-woc  SEM-en      ACM-en      SST-en-wc   SST-en-woc  TAY-en
All messages from group A
F1 average  59.82       28.23       35          40.46       37.46       40.43       30.56       28.55       28
Ac          71          33          35          41          49          52          31          29          43
All messages from group B
F1 average  48.66       29.91       34          37.69       36.11       35.2        28          33.17       28.1
Ac          60          40          36.66       41.11       42.22       45.55       30          36.66       43.33
Random messages from group A
F1 average  63.79       27.09       38.74       46.53       36.96       31.83       31.52       31.28       29.64
Ac          70          30          40          46          46          44          34          32          46
Random messages from group B
F1 average  48.46       39.95       38          39.91       35.09       34.82       28.6        33.52       25.64
Ac          58          52          40          42          42          44          30          36          36
Random messages from groups A and B
F1 average  52.63       46.59       39.24       44.32       34.19       33.65       34.81       33.63       29.15
Ac          64          50          40          46          48          50          34          34          46
All messages from groups A and B
F1 average  54.8        34.44       35.85       39.48       36          37.96       30.63       48.36       28.28
Ac          65.78       40          35.78       41          44.21       48.94       30.52       32.63       43.33
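To query Table 13 programmatically, the sketch below copies its F1 average and Ac rows into a dictionary and, for each experiment and metric, reports the top-ranked tool and its margin over the runner-up. The data structure and the helper names (`SUMMARY`, `ranking`, `leaders`) are illustrative assumptions; the numbers are exactly those in Table 13. Run on these values, it ranks SHC-pt first in all twelve experiment/metric rows.

```python
# Tool order used in the columns of Table 13.
TOOLS = [
    "SHC-pt", "SEM-pt", "SST-pt-wc", "SST-pt-woc",
    "SEM-en", "ACM-en", "SST-en-wc", "SST-en-woc", "TAY-en",
]

# Experiment -> metric -> one score per tool (percent), copied from Table 13.
SUMMARY = {
    "All messages from group A": {
        "F1 average": [59.82, 28.23, 35, 40.46, 37.46, 40.43, 30.56, 28.55, 28],
        "Ac": [71, 33, 35, 41, 49, 52, 31, 29, 43],
    },
    "All messages from group B": {
        "F1 average": [48.66, 29.91, 34, 37.69, 36.11, 35.2, 28, 33.17, 28.1],
        "Ac": [60, 40, 36.66, 41.11, 42.22, 45.55, 30, 36.66, 43.33],
    },
    "Random messages from group A": {
        "F1 average": [63.79, 27.09, 38.74, 46.53, 36.96, 31.83, 31.52, 31.28, 29.64],
        "Ac": [70, 30, 40, 46, 46, 44, 34, 32, 46],
    },
    "Random messages from group B": {
        "F1 average": [48.46, 39.95, 38, 39.91, 35.09, 34.82, 28.6, 33.52, 25.64],
        "Ac": [58, 52, 40, 42, 42, 44, 30, 36, 36],
    },
    "Random messages from groups A and B": {
        "F1 average": [52.63, 46.59, 39.24, 44.32, 34.19, 33.65, 34.81, 33.63, 29.15],
        "Ac": [64, 50, 40, 46, 48, 50, 34, 34, 46],
    },
    "All messages from groups A and B": {
        "F1 average": [54.8, 34.44, 35.85, 39.48, 36, 37.96, 30.63, 48.36, 28.28],
        "Ac": [65.78, 40, 35.78, 41, 44.21, 48.94, 30.52, 32.63, 43.33],
    },
}


def ranking(scores: list[float]) -> list[tuple[str, float]]:
    """Pair each score with its tool and sort from best to worst.

    Python's sort is stable, so tied tools keep their column order.
    """
    if len(scores) != len(TOOLS):
        raise ValueError("expected one score per tool")
    return sorted(zip(TOOLS, scores), key=lambda pair: pair[1], reverse=True)


def leaders() -> dict[tuple[str, str], tuple[str, float]]:
    """Map (experiment, metric) to (best tool, margin over the runner-up)."""
    result = {}
    for experiment, metrics in SUMMARY.items():
        for metric, scores in metrics.items():
            (best, top), (_, runner_up) = ranking(scores)[:2]
            result[(experiment, metric)] = (best, round(top - runner_up, 2))
    return result


for (experiment, metric), (best, margin) in leaders().items():
    print(f"{experiment} | {metric}: {best} leads by {margin} points")
```

Because the sort is stable, ties among runner-ups (for example the three tools at 46 in the Ac row for random messages from group A) are resolved in column order; the reported margin is the same whichever tied tool is named.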