Sentiment Classification of Tweets with Non-Language Features


Available online at www.sciencedirect.com

www.elsevier.com/locate/procedia

Procedia Computer Science 143 (2018) 426–433

8th International Conference on Advances in Computing and Communications (ICACC-2018)

Sentiment Classification of Tweets with Non-Language Features


Akilandeswari J (a,*), Jothi G (b)

(a,b) Department of IT, Sona College of Technology, Salem 636005, Tamil Nadu, India.

Abstract

In recent years, mining social media sites such as Twitter and Facebook has become a hot research focus. Twitter is one of the most popular microblogging services; it lets users express their views on e-commerce websites, sports, politics, modern technologies, movies, spirituality and so on. Sentiment can be considered a natural expression of a viewer's perception. Identifying the sentiment or opinion about a specific product or event by collecting and compiling microblog posts manually is extremely difficult. It is therefore worthwhile to develop a system that collects, compiles and analyses microblog posts to arrive at insights that help in taking action on an event. Such a system can monitor and evaluate online views in real time to show how the whole world is reacting to a concept, ideology or event. Developing a system that assigns a polarity to a tweet is a hard task. In this paper we propose a scoring methodology to find the sentiment polarity of Twitter messages. Emoticons, shortened words and other non-language features are integrated to increase the significance of the score computed for assigning polarity to tweets. The experimental results show that the proposed method improves the accuracy of polarity assignment for tweets.

© 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and peer-review under responsibility of the scientific committee of the 8th International Conference on Advances in Computing and Communication (ICACC-2018).

Keywords: Sentiment Analysis; Opinion Mining; Microblogging Services; Scoring Model; Classification; Social Media Analysis

1. Introduction

Nowadays, social networking sites such as Twitter, Facebook, LinkedIn and MySpace are used to share users' everyday encounters in an informal and casual manner. Twitter is one of the most popular social media websites and microblogging services; it permits people to post their opinions in up to one hundred forty characters.


* Corresponding author. Tel.: +91 427 4099755. E-mail address: [email protected]

1877-0509 © 2018 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2018.10.414


This service has rapidly gained worldwide popularity, with 696 million registered users in 2017 posting 58 million tweets per day [1, 2]. As more and more users share opinions on a variety of topics, such as the products and services they use or their political, sports and religious views, microblogging websites have become valuable sources of people's opinions and sentiments. Such data can be used efficiently for marketing, social studies, disease surveillance or trending-topic detection [3]. Twitter's audience ranges from regular users to leaders, company executives, heads of state and celebrities. It has therefore become important to collect posts from users in different social and interest groups and to find the sentiments they express in their tweets. Sentiment analysis of Twitter data is very difficult because tweets are short and their content is irregularly organized [4]. In this paper, we present a system in which tweets related to a specific topic are collected using the Twitter streaming API and preprocessed. An overall sentiment score is computed for each tweet, and the tweet is assigned a positive, negative or neutral polarity.

Contributions of this article: The work elaborated in this paper makes the following contributions:
• A scoring model that incorporates language and non-language features. The language features comprise the text that describes the subject in a positive, negative or neutral way. The non-language features are the symbols used by Twitter users, such as emoticons and shortened words.
• An intuitive assignment of scores to language and non-language features.

The remainder of the paper is structured as follows: Section 2 reports related research on sentiment analysis of microblogging data. Section 3 explains the methodology adopted in the proposed scoring model. Section 4 discusses the results of experiments with real-time tweets. The final remarks and future directions of the research are presented in Section 5.

2. Related Work

Many studies on Twitter sentiment analysis aim to find the polarity of tweets. In [2], the authors proposed a hybrid technique using both corpus-based and dictionary-based methods to determine the semantic orientation of the opinion words in tweets. After the opinion words (a combination of adjectives with verbs and adverbs) were extracted from the tweets, the corpus-based method was used to find the semantic orientation of adjectives and the dictionary-based method was used to find the semantic orientation of verbs and adverbs. The overall sentiment score was computed using a linear equation that combined opinion words and emotions. Khan et al. [5] proposed an algorithm for Twitter sentiment analysis based on three-way classification; their results showed that the method overcomes drawbacks of existing algorithms and achieves better accuracy. Kontopoulos et al. [6] proposed an ontology-based method to identify the sentiment of Twitter posts, employing machine-learning classification methods to improve the sentiment classification accuracy of tweets. Saif et al. [7] presented SentiCircles, a lexicon-based approach to sentiment analysis of Twitter. SentiCircles takes into account the co-occurrence patterns of words in different contexts in tweets to capture their semantics and update their pre-assigned strength and polarity in sentiment lexicons. This approach allows sentiment to be detected at both the entity level and the tweet level. Medhat et al. [8] presented a brief survey of sentiment analysis techniques, aiming to give a nearly complete picture of sentiment analysis algorithms and related fields with brief details.
Fersini et al. [9] used three expressive signals, namely (i) adjectives, (ii) emoticons, emphatic and onomatopoeic expressions and (iii) expressive lengthening, to classify sentiment polarity. Their results show that adjectives are more discriminative and influential than the other expressive signals considered. The authors of [10] combined two text mining approaches, a lexicon-based approach and a machine-learning approach, to perform sentiment analysis; the lexicon-based approach is used to identify the polarity of the tweets. In [11], the authors developed a hybrid classification framework that classifies tweets as positive, negative or neutral. The framework includes four classifiers, namely a slang classifier, an emoticon classifier, a SentiWordNet classifier and a domain-specific classifier. Shortened words are now widely used in text messaging as a means of fast communication, but existing methods do not include these modern-trend words when identifying sentiment. To overcome this limitation, a new scoring model is proposed that integrates shortened words and emoticons.


3. Methodology

Sentiment analysis on Twitter is an emerging research topic because of its scientific challenges and potential applications. The challenges unique to this problem area are largely attributed to the predominantly informal tone of microblogging [2]. Our method computes the sentiment score of tweets incorporating non-language features. In this work, opinion words are the combination of adjectives with verbs, adverbs and emoticons. A new trend among Internet users is the use of shortened words in Twitter messages; these words are also added to the file of opinion words to increase the accuracy of the computed sentiment polarity. The file is also populated with related words, along with their orientation, from WordNet [12].

3.1 Data Collection and Preprocessing

The Twitter streaming API provides real-time access to publicly available data on Twitter [13]. Tweets related to a specific topic are gathered using a query string with the Twitter4J library, which is configured to extract only English-language tweets. These tweets are passed to the preprocessing module. In the preprocessing step, all URLs (e.g., www.statistical.com), hashtag symbols (e.g., # in #movie) and other special characters are removed. To save both space and time, stop words are removed from the tweets. Inflected forms of a word usually have similar meanings, so stemming is used to reduce each word to its stem or root form; in this article, Porter's stemming algorithm is applied. The adjectives, verbs and adverbs are then tagged using a Part-of-Speech (POS) tagger, the NL Processor linguistic parser [14].

3.2 Dictionary-Based (DB) Model

In the dictionary-based approach [15], an opinion lexicon is used to find the sentiment polarity. For each tweet, the numbers of positive and negative opinion words that appear in it are counted.
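The cleaning steps of Section 3.1 and the counting rule of this DB baseline (the higher count wins and equal counts give neutral, as stated next) can be sketched together as follows. The sketch is in Java because the experiments were partly implemented in Java, but it is not the authors' code: the class name DictionaryBaseline, the regular expressions and the very small stop-word list and opinion lexicon are hypothetical placeholders, and Porter stemming and POS tagging are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: Section 3.1 cleaning followed by the DB counting rule.
// The regexes, stop words and lexicon below are illustrative placeholders,
// not the authors' resources; Porter stemming and POS tagging are omitted.
final class DictionaryBaseline {
    static final Set<String> STOP_WORDS = Set.of("the", "is", "a", "an", "this", "for", "of", "to", "and");
    static final Set<String> POSITIVE = Set.of("good", "great", "love", "enjoyable", "awesome", "win");
    static final Set<String> NEGATIVE = Set.of("bad", "worse", "hate", "nasty", "miss");

    /**
     * Removes URLs, '#' symbols and other special characters, lower-cases the text
     * and drops stop words. This also strips emoticons, so non-language features
     * would have to be read from the raw tweet before this step.
     */
    static List<String> preprocess(String tweet) {
        String cleaned = tweet
                .replaceAll("(https?://\\S+|www\\.\\S+)", " ") // URLs
                .replace("#", " ")                            // hashtag symbol; the word itself is kept
                .replaceAll("[^A-Za-z0-9' ]", " ")            // any other special character
                .toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        for (String token : cleaned.trim().split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** DB rule: the polarity with the higher count wins; equal counts give "neutral". */
    static String classify(String tweet) {
        int positive = 0;
        int negative = 0;
        for (String token : preprocess(tweet)) {
            if (POSITIVE.contains(token)) positive++;
            if (NEGATIVE.contains(token)) negative++;
        }
        if (positive > negative) return "positive";
        if (negative > positive) return "negative";
        return "neutral";
    }
}
```

For example, DictionaryBaseline.classify("Election this could get nasty") returns "negative", since "nasty" is the only lexicon word left after cleaning. Because the special-character filter also removes emoticons, any emoticon lookup would have to run on the raw tweet first.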
The tweet is assigned the polarity with the higher count; if both counts are equal, it is assigned neutral polarity. The benefit of this approach is that a large number of sentiment words and their orientations can be found easily and quickly. Its shortcoming is that it fails to identify domain- or context-dependent orientations of sentiment words.

3.3 Corpus and Dictionary-Based (C & DB) Model

In this model [2], opinion words are extracted from the tweets and their orientation is identified. The opinion words are combinations of adjectives, verbs and adverbs. An adjective is a describing word used to qualify an object, and the semantic orientation of adjectives tends to be domain specific; therefore, a corpus-based method is used to find the semantic orientation of adjectives. As adverbs and verbs do not depend on the domain, a dictionary-based method is used to calculate their semantic orientation. The overall tweet sentiment is then calculated using a linear equation that also incorporates emotion intensifiers.

3.4 Proposed Method: Sentiment Scoring (SS) Model

Shortened words are now widely used in text messaging because they make long messages easier to type. Existing methods do not include these modern-trend words. To overcome this limitation, we propose a scoring model that incorporates shortened words and emoticons. For the purpose of this work, we collected a list of shortened words and emoticons. Sample shortened words and emoticons with their sentiment strengths are presented in Table 1.


Table 1. Sample shortened word list and emoticons

Shortened word | Meaning | Strength
CRZ | Crazy | -0.5
GUD | Good | 0.625
GT | Great | 0.875
SORY | Sorry | -0.625

Emoticons | Meaning | Strength
:D, :-D, XD | Big grin, laugh | 1
:-), :), '=), )) | Happy, smile | 0.5
:'(, :'-( | Sad, crying | -1
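A minimal sketch of how the Table 1 entries might be looked up in a tweet is shown below. It assumes whitespace tokenization, exact matching for emoticons and case-insensitive matching for shortened words, and it runs on the raw tweet, since the special-character removal of Section 3.1 would otherwise delete the emoticons. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: looks up the Table 1 entries in a raw (not yet cleaned) tweet.
// Assumptions: whitespace tokenization, exact matching for emoticons and
// case-insensitive matching for shortened words.
final class NonLanguageFeatures {
    static final Map<String, Double> SHORTENED = Map.of(
            "CRZ", -0.5, "GUD", 0.625, "GT", 0.875, "SORY", -0.625);

    static final Map<String, Double> EMOTICONS = Map.ofEntries(
            Map.entry(":D", 1.0), Map.entry(":-D", 1.0), Map.entry("XD", 1.0),
            Map.entry(":-)", 0.5), Map.entry(":)", 0.5), Map.entry("'=)", 0.5), Map.entry("))", 0.5),
            Map.entry(":'(", -1.0), Map.entry(":'-(", -1.0));

    /** Strengths of the emoticons found, in order of appearance. */
    static List<Double> emoticonScores(String rawTweet) {
        List<Double> scores = new ArrayList<>();
        for (String token : rawTweet.trim().split("\\s+")) {
            Double s = EMOTICONS.get(token);
            if (s != null) scores.add(s);
        }
        return scores;
    }

    /** Strengths of the shortened words found, in order of appearance. */
    static List<Double> shortenedScores(String rawTweet) {
        List<Double> scores = new ArrayList<>();
        for (String token : rawTweet.trim().split("\\s+")) {
            Double s = SHORTENED.get(token.toUpperCase(Locale.ROOT));
            if (s != null) scores.add(s);
        }
        return scores;
    }
}
```

For the tweet "The book is gud :)", shortenedScores returns [0.625] and emoticonScores returns [0.5]. The list sizes give the numbers of shortened words and emoticons used by the scoring model; the text does not say how the emoticon score is formed when one tweet contains emoticons of different strengths.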

Words that carry or convey an opinion on the subject of a phrase are called opinion carriers. SentiWordNet 3.0 is a lexical resource for sentiment analysis and opinion mining applications. SentiWordNet assigns polarity scores to each synset of WordNet according to the notions of “positivity”, “negativity” and “neutrality”. Each synset s is associated with three numerical scores, Pos(s), Neg(s) and Obj(s), which indicate how positive, negative and neutral (objective) the terms in the synset are. Each of the three scores ranges from 0.0 to 1.0, and their sum is 1.0 for each synset [16]. Sample polarity scores for some words are presented in Table 2.

Table 2. Sample polarity scores using SentiWordNet

Adjective | Score | Adverb | Score | Verb | Score
bad | -0.5706 | enough | 0.1250 | excite | -0.0797
big | 0.1033 | not | -0.6250 | go | 0.0037
hard | -0.3334 | exactly | 0.1364 | miss | -0.1749
enjoyable | 0.2500 | totally | 0.5000 | hate | -0.7500
awesome | 0.7500 | even | -0.0050 | win | 0.1450
worse | -0.4627 | unfortunately | -0.8750 | love | 0.6100

For opinion carriers that are not available in SentiWordNet, an intuitive score signifying their polarity (positive, negative or neutral) is assigned in the same way, after reviewing the literature on the topic [4, 6, 9]. Strengths are likewise assigned to emoticons and shortened words [2]. A survey was conducted among students and users on the scores assigned to the features, and the average of the scores they gave is used as the strength of each feature. This is how the scores given to the language and non-language features are validated. The opinion carriers are extracted from the POS-tagged file and stored in separate files as an adjective kernel list, a verb kernel list and an adverb kernel list, with strengths assigned as described above. The kernel lists are then populated with synonyms and antonyms extracted from WordNet: synonyms are assigned the same score as the word already in the kernel list, and antonyms are assigned the opposite score (positive or negative, depending on the word in the kernel list). Populating the kernel lists in this way strengthens the algorithm when more tweets need to be assigned a polarity. In the proposed methodology, the adjective cluster score S(AjC) is calculated by multiplying the respective adjective and adverb scores, and the verb cluster score S(VeC) is calculated by multiplying the respective verb and adverb scores. The overall sentiment score (SS) of the tweet is calculated using the linear equation (1):

$$SS(\text{Tweet}) = \frac{\max\big(S(AjC),\, S(VeC)\big) + W \times S(E) + \sum_{i=1}^{m} S(SH_i)}{N} \qquad (1)$$

where:
N denotes the total number of words in the word kernel list (verbs, adverbs, adjectives, emoticons and shortened words);
S(AjC) denotes the adjective cluster score;
S(VeC) denotes the verb cluster score;
S(E) denotes the emoticon score;
W denotes the number of emoticons;
m denotes the number of shortened words in the tweet;
S(SH_i) denotes the score of the i-th shortened word.

If the overall score of a tweet is close to zero, the tweet is classified as neutral; otherwise, it is classified as positive if the score is greater than zero and negative if it is less than zero. Some sample tweets with the sentiment orientation computed by the proposed methodology are presented in Table 3.
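The kernel-list population rule described above (synonyms inherit the score of the word already in the list, antonyms receive the opposite score) can be sketched as follows. The synonym and antonym maps are hypothetical stand-ins for WordNet lookups, the class name KernelList is illustrative, and keeping the existing score when a word is already in the list is an assumption the text does not state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of kernel-list population: synonyms inherit the seed word's
// score and antonyms receive the opposite score. The synonym and antonym maps stand
// in for WordNet lookups (assumption); words already present keep their score, and
// when two seeds propose the same new word the first one processed wins.
final class KernelList {
    static Map<String, Double> populate(Map<String, Double> seed,
                                        Map<String, List<String>> synonyms,
                                        Map<String, List<String>> antonyms) {
        Map<String, Double> kernel = new HashMap<>(seed);
        for (Map.Entry<String, Double> entry : seed.entrySet()) {
            double score = entry.getValue();
            for (String synonym : synonyms.getOrDefault(entry.getKey(), List.of())) {
                kernel.putIfAbsent(synonym, score);
            }
            for (String antonym : antonyms.getOrDefault(entry.getKey(), List.of())) {
                kernel.putIfAbsent(antonym, -score);
            }
        }
        return kernel;
    }
}
```

Seeding the list with bad (-0.5706) from Table 2 and a hypothetical synonym "poor" gives "poor" the score -0.5706, while a hypothetical antonym "good" receives 0.5706.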


Table 3. Example tweets and their sentiment polarity

Sl. No. | Tweet | Score | Sentiment polarity
1 | Election this could get nasty | -0.2000 | Negative
2 | I never reject this work | 0.2200 | Positive
3 | The book is gud | 0.1667 | Positive
4 | RT annual Spring Game is set for Saturday April at pm ET in Commonwealth Stadium | 0.0100 | Neutral
5 | With no respect for Indian cricket should be barred from Villiers should go from IPL or be thrown out | -0.0111 | Negative
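Equation (1) and the labeling rule can be sketched as follows, assuming the cluster scores, the emoticon count W and score S(E), the shortened-word scores and N have already been computed. The neutral band EPSILON = 0.01 is an assumption: the text only says "close to zero", and this value happens to reproduce the Table 3 labels for the scores 0.0100 (neutral) and -0.0111 (negative). The class and method names are hypothetical.

```java
// Hypothetical sketch of Equation (1) and the labeling rule. Inputs are assumed to be
// already computed (cluster scores, emoticon count W and score S(E), shortened-word
// scores, and N). EPSILON is an assumed neutral band: the text only says "close to
// zero"; 0.01 happens to reproduce the Table 3 labels for 0.0100 and -0.0111.
final class SentimentScore {
    static final double EPSILON = 0.01;

    /** S(AjC) or S(VeC): adjective (or verb) score times the score of its adverb. */
    static double cluster(double wordScore, double adverbScore) {
        return wordScore * adverbScore;
    }

    /** Equation (1): (max(S(AjC), S(VeC)) + W * S(E) + sum of S(SH_i)) / N. */
    static double score(double sAjC, double sVeC, int w, double sE, double[] sSH, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("N must be positive");
        }
        double shortenedSum = 0.0;
        for (double s : sSH) {
            shortenedSum += s;
        }
        return (Math.max(sAjC, sVeC) + w * sE + shortenedSum) / n;
    }

    /** Neutral inside the assumed band, otherwise the sign of the score decides. */
    static String label(double ss) {
        if (Math.abs(ss) <= EPSILON) {
            return "neutral";
        }
        return ss > 0 ? "positive" : "negative";
    }
}
```

For instance, cluster(0.25, -0.625) combines enjoyable (0.2500) with not (-0.6250) from Table 2 and gives -0.15625, so "not enjoyable" yields a negative adjective cluster.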

4. Results and Discussion

4.1. Data sets utilized

Six distinct datasets are used for this experiment. The datasets were chosen to be representative of topics that were current at the time the tweets were extracted; for example, tweets related to e-commerce give a rich collection of opinions on different products. The data were gathered from Twitter using the Twitter streaming API. Table 4 summarizes the datasets used in the experiments. A total of 750 tweets were collected from January 2016 to March 2016 and manually classified as positive, negative or neutral. This manual classification is used to train our methodology and to verify the correctness of the sentiment scores it assigns to new tweets.

Table 4. Data sets

Data set | Query string | Positive tweets | Negative tweets | Neutral tweets | Total
D1 | IPL Cricket | 54 | 17 | 29 | 100
D2 | Commonwealth Games | 72 | 20 | 58 | 150
D3 | Parliament Election | 101 | 29 | 58 | 200
D4 | Amazon | 50 | 20 | 30 | 100
D5 | Flipkart | 62 | 9 | 29 | 100
D6 | Snapdeal | 76 | 7 | 17 | 100

4.2. Evaluation performance of the proposed scoring model

All experiments were conducted on an Intel Core i3 processor with 2 GB of main memory running Windows, and the code was implemented in MATLAB and Java. This section evaluates the polarity-identification performance of the various scoring techniques on the six datasets. We compare our scoring model with the dictionary-based (DB) model and the corpus and dictionary-based (C & DB) model; these baselines were chosen because our methodology combines the techniques of both dictionary- and corpus-based algorithms. Each dataset is divided into two parts: 80% of the samples are used as training data and the remaining 20% as testing data. Classification accuracy is one of the most commonly used evaluation metrics, but it is not suitable for assessing imbalanced datasets [17]. For this reason, additional metrics are used: precision, F-index and overall accuracy are used to analyze the efficiency of the proposed algorithm. The classification measures for all datasets are presented in Table 5.

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

$$F_1 \text{ (Czekanowski-Dice index)} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3)$$

$$\text{Overall Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (4)$$

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, and Recall = TP / (TP + FN).
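Equations (2) to (4), together with Recall, can be computed from confusion counts as sketched below. With three sentiment classes each class is treated one-vs-rest; the paper does not state how per-class values are combined, so the macro-average shown here is an assumption, as is returning 0.0 for a zero denominator. The class name Metrics is hypothetical.

```java
// Hypothetical sketch of Equations (2)-(4) plus Recall. For the three sentiment
// classes each class is treated one-vs-rest; the paper does not state how per-class
// values are combined, so macro-averaging is shown here as one possible choice.
// Zero denominators are mapped to 0.0 by assumption.
final class Metrics {
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    static double f1(double precision, double recall) {
        return precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }

    static double accuracy(int tp, int tn, int fp, int fn) {
        int total = tp + tn + fp + fn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    /** Macro-averaged F1 from a confusion matrix m[actual][predicted]. */
    static double macroF1(int[][] m) {
        double sum = 0.0;
        for (int c = 0; c < m.length; c++) {
            int tp = m[c][c];
            int fp = 0;
            int fn = 0;
            for (int j = 0; j < m.length; j++) {
                if (j == c) continue;
                fp += m[j][c]; // predicted c, actually j
                fn += m[c][j]; // actually c, predicted j
            }
            sum += f1(precision(tp, fp), recall(tp, fn));
        }
        return sum / m.length;
    }

    /** Multi-class overall accuracy: correctly labeled tweets divided by all tweets. */
    static double overallAccuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m.length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }
}
```

For example, with TP = 8, FP = 2 and FN = 8, precision is 0.8, recall is 0.5 and F1 is about 0.615.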


Table 5. Classification measures for the various datasets

Dataset | Scoring method | Precision | F-index | Overall accuracy
D1 | Proposed model (with NLF) | 71.93% | 72.81% | 69.00%
D1 | Proposed model (without NLF) | 70.59% | 69.84% | 68.43%
D1 | C & DB model | 56.15% | 57.84% | 55.00%
D1 | DB model | 42.15% | 42.24% | 47.00%
D2 | Proposed model (with NLF) | 74.14% | 76.60% | 75.67%
D2 | Proposed model (without NLF) | 72.18% | 71.98% | 72.61%
D2 | C & DB model | 74.51% | 76.36% | 72.00%
D2 | DB model | 52.77% | 52.17% | 61.33%
D3 | Proposed model (with NLF) | 65.32% | 65.30% | 68.50%
D3 | Proposed model (without NLF) | 63.52% | 62.08% | 63.75%
D3 | C & DB model | 67.45% | 67.15% | 70.50%
D3 | DB model | 48.17% | 46.28% | 43.50%
D4 | Proposed model (with NLF) | 72.00% | 70.96% | 70.00%
D4 | Proposed model (without NLF) | 69.74% | 70.58% | 68.67%
D4 | C & DB model | 67.11% | 69.09% | 67.00%
D4 | DB model | 49.79% | 49.71% | 51.00%
D5 | Proposed model (with NLF) | 72.90% | 75.40% | 77.00%
D5 | Proposed model (without NLF) | 71.58% | 71.39% | 72.48%
D5 | C & DB model | 64.64% | 68.70% | 72.00%
D5 | DB model | 52.70% | 51.92% | 58.00%
D6 | Proposed model (with NLF) | 77.61% | 80.61% | 84.00%
D6 | Proposed model (without NLF) | 75.01% | 76.89% | 75.46%
D6 | C & DB model | 77.97% | 79.01% | 80.00%
D6 | DB model | 51.72% | 49.83% | 62.00%

Note: NLF = non-language features.

We compared the accuracy of the proposed technique with existing sentiment analysis algorithms, and we also applied our methodology to compute the sentiment scores of the tweets without non-language features. The comparison is shown in Table 5. The proposed algorithm with non-language features achieves the highest overall accuracy in five of the six datasets. In dataset D3, however, its accuracy (68.50%) is lower than that of the C & DB model (70.50%). The proposed model labeled many D3 tweets as positive that, in context, should have been identified as neutral. During POS tagging in the preprocessing step, the word ‘no’ is tagged as a determiner, so it is not stored in the verb, adverb or adjective kernel lists. For example, the tweet ‘Good news no for today’s commonwealth’ is incorrectly classified as positive because the POS tagger treats ‘no’ as a determiner.

4.3 Performance of scoring model using ROC curve analysis

Classification performance is also measured by the area under the Receiver Operating Characteristic (ROC) curve. ROC analysis is a well-established technique for organizing classifiers and visualizing their performance. The sentiment scoring model is used to classify the polarity of tweets as positive, negative or neutral for the six datasets, and the resulting ROC curves are presented in Fig. 1.
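A one-vs-rest ROC curve for one class can be traced by sweeping a threshold down the sorted tweet scores, and its area computed with the trapezoidal rule, as in the sketch below. It assumes that a higher score means "more likely in this class" (for example the SS score for the positive class and its negation for the negative class); the paper does not describe how the curves in Fig. 1 were produced, so this is an illustration rather than a reproduction, and the class name RocSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one-vs-rest ROC points and trapezoidal area under the curve
// for a single class, obtained by sweeping a threshold down the sorted scores.
// Assumptions: a higher score means "more likely in this class", and tied scores
// form one step. The paper does not describe how the curves in Fig. 1 were produced.
final class RocSketch {
    /** Returns {fpr, tpr} points from (0, 0) to (1, 1). */
    static List<double[]> rocPoints(double[] scores, boolean[] inClass) {
        if (scores.length != inClass.length) {
            throw new IllegalArgumentException("scores and labels differ in length");
        }
        int n = scores.length;
        int positives = 0;
        for (boolean b : inClass) {
            if (b) positives++;
        }
        int negatives = n - positives;
        if (positives == 0 || negatives == 0) {
            throw new IllegalArgumentException("both classes must be present");
        }
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a])); // descending score

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0;
        int fp = 0;
        for (int k = 0; k < n; k++) {
            if (inClass[order[k]]) tp++; else fp++;
            boolean lastOfTie = k == n - 1 || scores[order[k + 1]] != scores[order[k]];
            if (lastOfTie) {
                points.add(new double[] {(double) fp / negatives, (double) tp / positives});
            }
        }
        return points;
    }

    /** Trapezoidal area under the ROC points. */
    static double auc(List<double[]> points) {
        double area = 0.0;
        for (int i = 1; i < points.size(); i++) {
            double[] a = points.get(i - 1);
            double[] b = points.get(i);
            area += (b[0] - a[0]) * (a[1] + b[1]) / 2.0;
        }
        return area;
    }
}
```

For scores {0.9, 0.8, 0.3, 0.1} with the first and third tweets in the class, the points are (0, 0), (0, 0.5), (0.5, 0.5), (0.5, 1) and (1, 1), and the AUC is 0.75.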

Fig. 1. The ROC curve analysis for different datasets (panels D1, D2, D3, D4, D5 and D6).

For dataset D6, most tweets are correctly classified as positive, negative or neutral; the ROC point at about (0.2, 0.8) corresponds to the highest accuracy (84%). For dataset D3, the positive and neutral curves lie close to the diagonal line, reflecting the lowest classification accuracy of the scoring model among the datasets. The figure also shows that most negative tweets are correctly classified as negative in all datasets, as indicated by the negative-class curves lying in the most conservative region of the graph.

5. Conclusion and Future Work

In this paper, a sentiment scoring model is proposed to identify the polarity of tweets. Opinion carriers are extracted from the tagged file and assigned an opinion strength. A shortened-word list and an emoticon list are added to the proposed scoring model to increase the accuracy of sentiment polarity determination. Based on the score, tweets are classified as positive, negative or neutral. The experimental results show that the scoring model produces promising results in sentiment classification, and the methodology produces promising results even when non-language features are not included. The system can be extended with suitable statistical techniques to analyze classification performance.

References

[1] www.statisticbrain.com/twitter-statistics
[2] Kumar, A., and Sebastian, T. M. (2012) “Sentiment Analysis on Twitter.” International Journal of Computer Science 9: 372-378.
[3] Pak, A., and Paroubek, P. (2010) “Twitter as a Corpus for Sentiment Analysis and Opinion Mining.” Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), European Language Resources Association (ELRA) 10: 1320-1326.
[4] Saif, H., He, Y., and Alani, H. (2012) “Semantic Sentiment Analysis of Twitter.” International Conference on the Semantic Web, Springer-Verlag, Berlin: 508-524.
[5] Khan, F. H., Bashir, S., and Qamar, U. (2014) “TOM: Twitter Opinion Mining Framework Using Hybrid Classification Scheme.” Decision Support Systems 57: 245-257.
[6] Kontopoulos, E., Berberidis, C., Dergiades, T., and Bassiliades, N. (2013) “Ontology-Based Sentiment Analysis of Twitter Posts.” Expert Systems with Applications: 4065-4074. doi:10.1016/j.eswa.2013.01.001.
[7] Saif, H., He, Y., Fernandez, M., and Alani, H. (2016) “Contextual Semantics for Sentiment Analysis of Twitter.” Information Processing & Management 52: 5-19.
[8] Medhat, W., Hassan, A., and Korashy, H. (2014) “Sentiment Analysis Algorithms and Applications: A Survey.” Ain Shams Engineering Journal 5(4): 1093-1113.
[9] Fersini, E., Messina, E., and Pozzi, F. A. (2016) “Expressive Signals in Social Media Languages to Improve Polarity Detection.” Information Processing & Management 52: 20-35. doi:10.1016/j.ipm.2015.04.004.
[10] Lalji, T. K., and Deshmukh, S. N. (2016) “Twitter Sentiment Analysis Using Hybrid Approach.” International Research Journal of Engineering and Technology 3: 2887-2890.
[11] Asghar, M. Z., Kundi, F. M., Ahmad, S., Khan, A., and Khan, F. (2018) “T-SAF: Twitter Sentiment Analysis Framework Using a Hybrid Classification Scheme.” Expert Systems 35(1): e12233.
[12] http://wordnet.princeton.edu
[13] Kumar, S., Morstatter, F., and Liu, H. (2014) “Twitter Data Analytics.” Springer-Verlag, New York.
[14] www.infogistics.com/textanalysis.html
[15] Liu, B. (2012) “Sentiment Analysis and Opinion Mining.” Synthesis Lectures on Human Language Technologies 5(1): 1-167.
[16] Baccianella, S., Esuli, A., and Sebastiani, F. (2010) “SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining.” LREC 10: 2200-2204.
[17] Guo, X., Yin, Y., Dong, C., Yang, G., and Zhou, G. (2008) “On the Class Imbalance Problem.” Fourth International Conference on Natural Computation, IEEE 4: 192-201.
