Available online at www.sciencedirect.com
Available online at www.sciencedirect.com Available online at www.sciencedirect.com
ScienceDirect Procedia Computer Science (2017) 000–000 Procedia Computer Science 11200 (2017) 906–916 Procedia Computer Science 00 (2017) 000–000
www.elsevier.com/locate/procedia www.elsevier.com/locate/procedia
International International Conference Conference on on Knowledge Knowledge Based Based and and Intelligent Intelligent Information Information and and Engineering Engineering Systems, KES2017, 6-8 September 2017, Marseille, France Systems, KES2017, 6-8 September 2017, Marseille, France
Big Big data data analysis analysis to to Features Features Opinions Opinions Extraction Extraction of of customer customer a* a,b Ammar Ammar Mars Marsa*,, Mohamed Mohamed Salah Salah Gouider Gouidera,b
a Universite ´ a Universite ´
de Tunis, ISG , SMART Lab , 2000, Cite´ Bouchoucha, Le Bardo, Tunisia de Tunis, ISG , SMART Lab , 2000, Cite´ Bouchoucha, Le Bardo, Tunisia b Universite ´ de Tunis, ESSECT, 1089 , Montfleury ,Tunisia b Universite ´ de Tunis, ESSECT, 1089 , Montfleury ,Tunisia
Abstract Abstract Opinion mining refers to extract subjective information from text data using tools such as natural language processing (NLP), text Opinion mining refers to extract subjective information from text data using tools such as natural language processing (NLP), text analysis and computational linguistics. Micro-blogging and social network are the most popular Web 2.0 applications , like Twitter analysis and computational linguistics. Micro-blogging and social network are the most popular Web 2.0 applications , like Twitter and Facebook which are developed for sharing opinions about different topics. This kind of application becomes a rich data source and Facebook which are developed for sharing opinions about different topics. This kind of application becomes a rich data source for opinion mining and sentiment analysis. This information is crucial for managers, who should improve the quality of a product for opinion mining and sentiment analysis. This information is crucial for managers, who should improve the quality of a product based on customers’ opinions. Concerning the characteristic of a product as mobile phone, it is particularly difficult to identify the based on customers’ opinions. Concerning the characteristic of a product as mobile phone, it is particularly difficult to identify the features being commented on (e.g., camera quality, battery life, price, etc). features being commented on (e.g., camera quality, battery life, price, etc). In our work, we present a new method that able to extract product features opinions of customer from social networks using text In our work, we present a new method that able to extract product features opinions of customer from social networks using text analysis techniques. This task identifies customers opinions regarding product features. We develop a system for retrieving tweets analysis techniques. This task identifies customers opinions regarding product features. We develop a system for retrieving tweets about a product from Twitter and detect product features opinions and their polarity. To validate the effectiveness of this approach about a product from Twitter and detect product features opinions and their polarity. To validate the effectiveness of this approach , we used a dataset published by Bing Lius group in our approach experimentation. This dataset contains many notated customer , we used a dataset published by Bing Lius group in our approach experimentation. This dataset contains many notated customer reviews of five products such as Canon G3 and Nokia 6610. Next, we test this method with tweets retrieved from Twitter about reviews of five products such as Canon G3 and Nokia 6610. Next, we test this method with tweets retrieved from Twitter about Nokia,Samsung and Iphone features products. Nokia,Samsung and Iphone features products. c 2017 The Authors. Published by Elsevier B.V. c 2017 2017 The TheAuthors. Authors.Published Publishedby byElsevier ElsevierB.V. B.V. © Peer-review under responsibility of KES International. Peer-review under responsibility of KES International. International Keywords: Big data; Sentiment Analysis; Opinion mining ; Feature mining ; Keywords: Big data; Sentiment Analysis; Opinion mining ; Feature mining ;
1. Introduction 1. Introduction Big data mining is considered as the core component of the modern decision support systems related to social Big data mining is considered as the core component of the modern decision support systems related to social networks and other data sources. ”What other people think” has been an important piece of information decisionnetworks and other data sources. ”What other people think” has been an important piece of information decisionmaking processes. People are accustomed to frequently give their opinions about a subject available in different social making processes. People are accustomed to frequently give their opinions about a subject available in different social media like Facebook and Twitter. As well as other Web sources such as: product reviews, forums, discussion groups, media like Facebook and Twitter. As well as other Web sources such as: product reviews, forums, discussion groups, and blogs. and blogs. It is important for the decision maker of a company to study the opinion of customers about a product. However, It is important for the decision maker of a company to study the opinion of customers about a product. However, due to the large amount of information collected from these data sources, it is essentially impossible for a supplier or due to the large amount of information collected from these data sources, it is essentially impossible for a supplier or ∗ ∗
Corresponding author. Corresponding author. E-mail address:
[email protected] E-mail address:
[email protected]
c 2017 The Authors. Published by Elsevier B.V. 1877-0509 c 2017 The Authors. Published by Elsevier B.V. 1877-0509 Peer-review under responsibility of KES International. 1877-0509 2017responsibility The Authors. Published by Elsevier B.V. Peer-review©under of KES International. Peer-review under responsibility of KES International 10.1016/j.procs.2017.08.114
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
907
decision maker to read all this data collected and make a decision. For this reason, the ”big data opinion mining” has become an important issue for research in Web information. It aims to extract user opinions from a huge volume of data texts, automatically or semi-automatically. It is a challenge for us to provide effective information extraction systems able to process a large mass of data and extract customer opinions from data available on the Web. It is important to study users’ opinions about a merchandise, a product or a brand. e.g. studying users’ opinions about a Samsung brand. It is necessary to develop a system that helps to exploit users opinions from different data sources like open data. In this work, we aim to propose a big data architecture for analyzing data and extracting customers opinions about product features using a lot of search domain such as Machine Learning(ML), Natural Processing Language(NLP) and big data. This architecture is used to make decision. We consider Twitter as a data source: It is a microblogging service addressed in our work, a communication means and a collaboration system that allows users to share short text messages, which do not exceed 140 characters with a defined group of users called followers. In this work, we aim to design a new approach able to identify features costumers opinions and their polarity about a specific product from Twitter. We present the related work in section 1. In section 2, we discuss our proposed approach for features opinions extraction from social networks. Next, We present the found experimentation and results founded in section 3. Finally, we give some conclusions in section 4. 2. Related Works Feature Extraction (FE) is a delicate task in sentiment analysis .In this work, we will propose big data framework able to extract features opinions of customer about a product from opened data.In this work we use three main concepts :”Opinion Mining” , ”Big Data” and ”Opinion feature mining”. Several researchers have worked and many work has been published about these concepts . We present in this section the related work. 2.1. Big Data To face the explosion data volume , a new field of technology was born that called ”Big Data” 7 . It was iInvented by the Wweb giants like Google, Facebook and Twitter with solutions designed to provide a real-time access to large volumes of data., We can mention that Google processes hundreds of pPetabytes (PB), Facebook generates data of more than 10 petabytes per month, and Twitter generates 10 terabytes daily. It is used to describe a large mass of data. The difference between traditional data and this enormous mass of data is that the latter contains structured and unstructured data. Big data give us an opportunity and a challenge ,at the same time, to process a very large mass of data which are unstructured. Also, these data could not be acquired, managed and treated by traditional information technology. The concept of big data has been characterized by ”3V”: volume, velocity and variety 8 . The value chain of Big data can be generally divided into four phases: the generation, acquisition, storage and data analysis 9 .Data analysis is the latest and most important phase in the chain of Big Data 8 . It is used to extract useful to provide fatherly suggestions or decisions. The challenge of this phase is to quickly extract information from massive data. There are several treatment methods, and the most used method is the ”Parallel Computing”. We cite MapReduce 25 and Dryad 26 as parallel programming model.MapReduce 25 is a simple but powerful programming model for large-scale computing using a large number of clusters of commercial PCs to achieve automatic parallel processing and distribution. In MapReduce, the computational workload are caused by inputting key-value pair sets and generating key-value pair sets. The computing model only has two functions, i.e., Map and Reduce, both of which are programmed by users. 2.2. Opinion Mining An important part of our information-gathering behavior is always to find out ”what other people think ?”. With the emergence of Web 2.0, a lot of opinion resources are available such as online review sites and personal blogs. They are a new opportunities and challenges to use information technologies to seek out and understand the opinions
908
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
of costumers. Sentiment analysis , also known as opinion mining, is the process aiming to determine whether the polarity of a textual corpus (document, sentence, paragraph etc.) tends to be positive, negative or neutral. Binali et al 11 defined a sentiment classification that determine the subjectivity, polarity (positive/negative) and polarity strength (weakly positive, mildly positive or strong positive) of an opinion text.It was a sub discipline of computational linguistics . Some literature work uses the ontology concept 5 10 .Zhou et al 10 proposed an ontology-supported polarity mining (OSPM) approach that aimed to enhance the opinion mining process with ontology by providing detailed topic-specific information. Kontopoulos et al put forward a method to analyse sentiment based on an ontology. The paper used the reviews of users from Amazon web site and E-bay 16 . Kim 17 presented a system that, given a topic, automatically would finds the people who hold opinions about that topic and the sentiment of each opinion. This system contains a module for determining word sentiments from a sentence utilizing various classification models. In 13 12 the authors have suggested a classifier based approach for discovering opinions so as improve the marketing strategies of firms.In addition,other work proposed by Cardie et al 18 : It was an approach with multi-perspective question answering, which viewed the task as one of opinion-oriented information extraction. Tong et al 14 presented an approach able to extract sentiments from online discussions about movies by tracking and displaying the number of positive and negative messages using a timeline. In the same field, Pang et al examined movie reviews utilizing supervised learning methods, and they concluded that the unsupervised learning techniques outperformed the supervised learning techniques. Other work using Twitter as data source for the task of sentiment analysis 3 ,it perform linguistic analysis of the collected corpus. 2.3. Mining Opinion feature Feature Extraction (FE) is a delicate task in sentiment analysis. Agrawal 15 presented a method to examine sentiment analysis on Twitter data.It proposed a POS-tagging specific to analyze the role of linguistic features. To extract product features opinions of product , Hu et al have presented a new method 19 . A system named OPINE proposed by Popescu et al, mined customer reviews in order to extract an opinions list of customer about features of a particular product 20 . In the same field , a work published by Siqueira and al.It present a feature extraction process from opinions on services using WhatMatter system 21 . Eirinaki et al put forward an algorithm that analyze the sentiment of a document/review and also identifies the semantic orientation of specific components. The algorithm is integrated in an opinion search engine which presents a summary of the sentiments of the most important features 22 . Yaakub et al have also proposed an architecture that uses a multidimensional model to integrate customers characteristics and their comments about products. This approach identifies the entities and then sentiments present in the customers reviews are transformed into an attribute table by using a seven point polarity system (-3 to 3). After that a model is then produced on the basis of customers’ comments for an entity and its features 23 . Other work to mention 6 ,It is named Red Opal system ,which aims to identify the best evaluated products with respect to a set of automatically extracted features.This work consider a word as a feature when the probability of finding it in texts containing opinions is very different from the probability of finding it in any other kind of text.Many work based on ontology , we can cite 5 , It present a new method employing a hierarchical domain ontology . The proposed use a text classification algorithms and feature selection algorithms to extract features. Finally , Htay et al 2 proposed a new method to find opinion words or phrases for each feature from customer reviews in an efficient way. In financial field , Salas et al propose a new sentiment analysis method for feature and financial news polarity classification. 3. Contribution The sentiment Analysis is composed by four tasks: opinion identification, feature extraction (identifying the aspects being commented on - e.g., the price of notebook), sentiment classification (the opinion polarity is positive, negative or neutral), and result visualization and summarization 21 . We notice that researchers tend to study the features opinions extraction. Knowing that , we treat the Web data. It is huger, more variable in space and time, heterogeneous as they come from different sources. In this work, we aim to extracting features opinions of customer about a product in social networks. We present a new
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
909
and effective approach for extracting product features opinions from enormous data using the combine of big data and Text mining technologies. Our approach composed essentially of four phases: 1. Product presentation and lexical ontology : This step is also formed by two small steps. The first is to design a tree of positive and negative words, and the other is to design an ontology representing the product. 2. Data collection and storage : It aims to collect and integrate data published by customers in the web and more precisely from Twitter . It is a microblogging platforms . 3. Feature opinion mining: In this step , we analyze data collected using Big data and text mining techniques for customers features opinions extraction. 4. Result visualization and summarization : we present a panel representing the various characteristics of a product as well as customer opinions. 3.1. Product presentation and lexical ontology 3.1.1. Product presentation The customers usually write comments by using product features, for example: the battery life is short, screen colors are bright, phone has a FM radio, and the accessories are very bad. The People buy mobiles according to their ease of use and according to the mobile features. We have thought to create manually an ontology that presents a product . We cite the ontology proposed by Yaakub et al 23 which covered features and characteristics of mobile phones in general and in other technical terms. A product is presented by ontology of concepts and sub-concepts as following and an example of an ontology that present the mobile features:
It is in essence a simple taxonomy of concepts and attributes. The customer can use in her expression the word ”display” or ”monitor” instead of ”screen”. The ontology is enriched with synonyms and hyponyms of the attributes. 3.1.2. Lexical ontology Minqing Hu and Bing Liu 19 presented a list of negative and positive word used by the customer to express their opinion 19 . In this work, we use a lexical ontologie which is built using this list of words. We can cite some examples of words in the table below: Words Polarity Afraid, aggressive, agony, angry, batty, bad, bait, boil, buggy ... Negative Accommodative, accomplish, achievement, admire, affable, alluring, amazed, amusing ... Positive We propose this ontology in Fig.1. to present the negative and positive word to use in next step. We used this ontology in a previous work 4 3.2. Data collection and storage Big Data is handled at different stages namely, collection, storing, organizing, analyzing etc. with a motive of deciphering useful insights useful to make business decisions.This stage involves the collection of data from several
910
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
Fig. 1. Words Opinions Ontology
types of data sources. In this step, we aim to conceive a model able to store the collected data. We have a program that collects and stocks Tweets from Twitter in our proposed structure. In the world of Web applications, like Faceboo or Twitter, the data are not described in the same way. In fact it would lead to store data with zero values. Thus the use of relational databases is not always appropriate. This step is the first step of a big data mining process. Specifically, it is large-scale, diverse and complex integrated datasets that coming from different data sources. Such data sources include flat Files, databases, streams or other available data sources. This operation also called Data Acquisition. During this acquisition, once the raw data is collected, an efficient transformation mechanism should be used to store it in storage system to be used in the analytic process.
3.3. Feature opinion mining Generally, Data analysis research can be classified into six fields:a structured data analysis, a text data analysis, a website data analysis, a multimedia data analysis,a network data analysis, and a mobile data analysis. This phase is the most important phase of big data process. As we mentioned that this approach aims to extract feature from tweets collected. We need to use the text analysis in this step because the format of tweets stored is text. Text analysis, also named text mining, is a process to extract useful information and knowledge from unstructured data. Most text mining systems are based on text expressions and natural language processing (NLP). The NLP can enable computers to analyze, interpret, and even generate text. Some common NLP methods are: lexical acquisition, word sense disambiguation, part of-speech tagging, and probabilistic context free grammar 24 . Some NLP-based technologies are applied to text mining, including information extraction, topic models, text summarization, classification, clustering, question answering, and opinion mining. Information mining should automatically extract specific structured information from texts. We illustrate this phase in this figure below in Fig. 2 we will detail following this task : 3.3.1. feature opinions extraction based on MapReduce MapReduce 25 26 is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. MapReduce programs are written in a particular style influenced by functional programming constructs, specifically idioms for processing lists of data. MapReduce can take advantage of locality of data, processing it on or near the storage assets in order to reduce the distance over which it must be transmitted. In fact MapReduce is divided to two tasks: The first is colled ”MAP” that take a series of key/value pairs, processes each, and generates zero or more output key/value pairs.The second colled ”REDUCE” that Reduce function once for each unique key in the sorted order. The Reduce can iterate through the values that are associated with that key and produce zero or more outputs. In our work, we adapt the MapReduce programming model to extract the feature opinion polarity from a huge amount of data. We illustrate algorithm tasks in fig.3.
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
911
Fig. 2. Data Analysis and features extraction
Fig. 3. MapReduce for feature opinion polarity extraction
Our system aims to find the features and detect their polarity (negative or positive ) of tweets collected from Twitter about a product. In the first task (MAP) our system iterate the set of tweets and determines the feature and their its polarity of each tweet. We illustrate this task in the algorithm bellow: Algorithm 1 MAP Task 1: Function Map (String name, String file): 2: // name: file name 3: // File : HDFS file 4: for each Tweet in file do 5: P=Polarity (tweet) 6: F=Feature (tweet) 7: emit (tweet, F , P) 8: end for Conventionally the polarity of a Tweet is defined as follows: 1 if it is a positive opinion −1 if it is a negative opinion Polarity(T ) = 0 if it is a neutral opinion
For the next step (REDUCE), our system groups the tweets according to their polarity and their feature .
912
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
Algorithm 2 Reduce Task Function Reduce ( int Polarity , Iterator TweetList): 2: // Polarity: Opinion Polarity +,- or Neutral // TweetList : List of tweets 4: for each feature F in TweetList do Sum+=1 6: emit (F , Polarity, Sum ) end for
3.3.2. POS-tagging of Tweet Before discussing the application of part-of-speech tagging from natural language processing, we first give some example sentences. Example 1 Input Output Example 2
The battery life is short The/DT battery/NN life/NN is/VBP short/JJ
Input screen colors are bright In corpus linguistics, partOutput the/DT screen/NN colors/NNS are/VBP bright/JJ of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition, as well as its context i.e. the relationship with adjacent and related words in a phrase, sentence, or paragraph.
3.3.3. Feature Detection Using the ontology of product presentation defined previously, the Feature Detection task able to determine the feature identified in a tweet. For each noun found in a tweet, our systems try to verify whether it is a product features from the previously ontology constructed . We present bellow the method of polarity determination. Algorithm 3 Feature Detection 1: Function Polarity ( String Tweet): 2: for each W AS [noun] in Tweet Do TweetList do 3: if W Exist in ProductOntologie then 4: Return Feature 5: else 6: Return null 7: end if 8: end for
3.3.4. Polarity identification Using the ontology defined previously, the POLARITY task able to determine the polarity of a tweet. We illustrate this method in algorithm 4. It is a method used by Map task to identify if a tweet is a negative or positive opinion.For each word (verb, adverb or adjective) found in a tweet, our systems try to verify if its a negative/positive words from the ontology constructed previously. We present bellow the method of polarity determination.
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
913
Algorithm 4 Polarity identification 1: Function Polarity ( String Tweet): 2: for each W AS [verb/adjective/adverb] in Tweet Do TweetList do 3: if W Exist in NegativeOntology then 4: Return -1 5: else if W Exist in PositiveOntology then 6: Return 1 7: else 8: Return 0 9: end if 10: end for 3.4. Result visualization and summarization Result presentation is the final and most important phase of the value chain of big data. Big data opinion mining can provide useful values and statics for decision making. It is a summarisation and result of data processing. In our work ,It is summary representation of the features product opinions collected from Twitter. 4. Experimentation We extract data from platform of microblogging Twitter. To validate our approach described above. 4.1. Experimental data and result We conceived a tree of negative and positive lexicon from a list of words published by Minqing Hu and Bing Liu 19 27 . We uses three datasets ,from the ones available on the web ,to validate our approach in the first step. Then we are collected a set of tweets from twitter and tested their with our approach proposed. 4.1.1. Minqing Hu and Bing Liu, 2004 This dataset is a subset of the opinin mining datasets released by Minqing Hu and Bing Liu from University of Illinois at Chicago. Their dataset is available in this web site 1 . This subset contains annotated customer reviews of fives products: • • • • •
Canon G3 Nikon coolpix 4300 Nokia 6610 Creative Labs Nomad Jukebox Zen Xtra 40GB Apex AD2600 Progressive-scan DVD player
The symbols used in the annotated reviews:[+n] is Positive opinion, n is the opinion strength: 3 strongest, and 1 weakest. Note that the strength is quite subjective. [-n]: Negative opinion . This is an example of a review in Fig.4. We measured the effectiveness of our system using this Dataset. The followings table illustrate the experimentation results (Fig.5): 4.1.2. Tweets collected from Twitter To retrieve tweets from twitter, you must have a subject determined by a hash-tag. For this case, we used some hash-tag to find tweets about some products from twitter; we chose the name of products to find and retrieve tweets 1
http://www.cs.uic.edu/ liub/FBS/sentiment-analysis.html
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
914
Product Nokia 6610
Canon G3 Nikon coolpix 4300
Feature Size battery Weight Screen Picture Photo Battery weight Size
Positive 70% 60% 88% 55% 49% 64% 41% 95% 77%
Negative 21% 21% 0% 39% 23% 34% 59% 0% 13%
Not classified 9% 19% 22% 6% 28% 2% 0% 5% 10%
Fig. 4. Review example of dataset Fig. 5. Experimentation results of Minqing Hu and Bing Liu dataset
about these products : Samsung and Nokia. We have collected 100000 tweets around this . This next is table give the data analysis results: Product Nokia
Samsung
Iphone
HTC
Blackberry
Feature Size battery Weight Screen Size battery Weight Screen Size battery Weight Screen Size Weight Screen battery Weight Screen
Positive 53% 90% 63% 66% 79% 73% 80% 33% 92% 89% 53% 93% 44% 50% 59% 73% 61% 33%
Negative 12% 4% 22% 24% 5% 10% 9% 40% 3% 5% 33% 5% 28% 22% 13% 10% 9% 40%
Cost(s) 27
29
26
26
24
Fig. 6. Experimentation results of tweets collected
4.2. Tools Hadoop : Hadoop is an open source Apache project that aims to provide ”reliable and scalable distributed computing” . The Hadoop package aims to provide all the tools one could need for large scale computing: Fault Tolerant Distributed Storage in the form of the HDFS. Twitter4j: Twitter4J is a Twitter API binding library for the Java language licensed under Apache License 2.0 .It is an open source library .It is able to integrate the Twitter service on a program.
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
915
5. Conclusion In this work , we have proposed a framework for extracting features of customer opinions using Big Data technologies combined with machine learning and text mining tools. First,we have proposed an algorithm of opinion polarity determination based on MapReduce. Next, we test and validate the effectiveness of this framework with datasets available in the web in first time and next we use data cames from platform of microblogging Twitter. Our experimental results show that it is necessary to test this system with other benchmarks. Another line of research will be to treat tweet not classified. It is also very complex operation. References 1. M. d. P. Salas-Z´arate, R. Valencia-Garc´ıa, A. Ruiz-Mart´ınez, and R. Colomo-Palacios, “Feature-based opinion mining in financial news: an ontology-driven approach,” Journal of Information Science, p. 0165551516645528, 2016. 2. S. S. Htay and K. T. Lynn, “Extracting product features and opinion words using pattern knowledge in customer reviews,” The Scientific World Journal, vol. 2013, 2013. 3. X. Ding, B. Liu, and P. S. Yu, “A holistic lexicon-based approach to opinion mining,” in Proceedings of the 2008 international conference on web search and data mining. ACM, 2008, pp. 231–240. 4. A. Mars, M. S. Gouider, and L. B. Sa¨ıd, “A new big data framework for customer opinions polarity extraction,” in International Conference: Beyond Databases, Architectures and Structures. Springer, 2015, pp. 518–531. 5. B. B. Wang, R. I. Mckay, H. A. Abbass, and M. Barlow, “A comparative study for domain ontology guided feature extraction,” in Proceedings of the 26th Australasian computer science conference-Volume 16. Australian Computer Society, Inc., 2003, pp. 69–78. 6. C. Scaffidi, K. Bierhoff, E. Chang, M. Felker, H. Ng, and C. Jin, “Red opal: product-feature scoring from reviews,” in Proceedings of the 8th ACM conference on Electronic commerce. ACM, 2007, pp. 182–191. 7. C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, and T. Rabl, “Big data benchmarking,” in Proceedings of the 2012 Workshop on Management of Big Data Systems, ser. MBDS ’12. New York, NY, USA: ACM, 2012, pp. 39–40. [Online]. 8. H. F. M. Gali Halevi, “Special issue on big data,” Research Trends, 2012. 9. Y. L. Min Chen, Shiwen Mao, “Big data: A survey,” Springer Science+Business Media, 2014. 10. L. Zhou and P. Chaovalit, “Ontology-supported polarity mining,” Journal of the American Society for Information Science and technology, vol. 59, no. 1, pp. 98–110, 2008. 11. H. Binali, V. Potdar, and C. Wu, “A state of the art opinion mining and its application domains,” in Industrial Technology, 2009. ICIT 2009. IEEE International Conference on. IEEE, 2009, pp. 1–6. 12. K. Dave, S. Lawrence, and D. M. Pennock, “Mining the peanut gallery: Opinion extraction and semantic classification of product reviews,” in Proceedings of the 12th international conference on World Wide Web. ACM, 2003, pp. 519–528. 13. L.-W. Ku, L.-Y. Lee, T.-H. Wu, and H.-H. Chen, “Major topic detection and its application to opinion summarization,” in Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 2005, pp. 627–628. 14. B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?: sentiment classification using machine learning techniques,” in Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10. Association for Computational Linguistics, 2002, pp. 79–86. 15. A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau, “Sentiment analysis of twitter data,” in Proceedings of the Workshop on Languages in Social Media. Association for Computational Linguistics, 2011, pp. 30–38. 16. E. Kontopoulos, C. Berberidis, T. Dergiades, and N. Bassiliades, “Ontology-based sentiment analysis of twitter posts,” Expert systems with applications, vol. 40, no. 10, pp. 4065–4074, 2013. 17. S.-M. Kim and E. Hovy, “Determining the sentiment of opinions,” in Proceedings of the 20th international conference on Computational Linguistics. Association for Computational Linguistics, 2004, p. 1367. 18. C. Cardie, J. Wiebe, T. Wilson, and D. J. Litman, “Combining low-level and summary representations of opinions for multi-perspective question answering.” in New directions in question answering, 2003, pp. 20–27. 19. M. Hu and B. Liu, “Mining opinion features in customer reviews,” in AAAI, vol. 4, no. 4, 2004, pp. 755–760. 20. A.-M. Popescu and O. Etzioni, “Extracting product features and opinions from reviews,” in Natural language processing and text mining. Springer, 2007, pp. 9–28. 21. H. Siqueira and F. Barros, “A feature extraction process for sentiment analysis of opinions on services,” in Proceedings of International Workshop on Web and Text Intelligence, 2010, pp. 404–413. 22. M. Eirinaki, S. Pisal, and J. Singh, “Feature-based opinion mining and ranking,” Journal of Computer and System Sciences, vol. 78, no. 4, pp. 1175–1184, 2012. 23. M. R. Yaakub, Y. Li, A. Algarni, and B. Peng, “Integration of opinion into customer analysis model,” in Proceedings of the The 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology-Volume 03. IEEE Computer Society, 2012, pp. 164–168. 24. W. Paik, S. Yilmazel, E. Brown, M. Poulin, S. Dubon, and C. Amice, “Applying natural language processing (nlp) based metadata extraction to automatically acquire user preferences,” in Proceedings of the 1st international conference on Knowledge capture. ACM, 2001, pp. 116–122. 25. J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008. 26. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: distributed data-parallel programs from sequential building blocks,” in ACM SIGOPS Operating Systems Review, vol. 41, no. 3. ACM, 2007, pp. 59–72.
916
Ammar Mars et al. / Procedia Computer Science 112 (2017) 906–916
27. B. Liu, M. Hu, and J. Cheng, “Opinion observer: analyzing and comparing opinions on the web,” in Proceedings of the 14th international conference on World Wide Web. ACM, 2005, pp. 342–351.