Engineering Applications of Artificial Intelligence 81 (2019) 68–78
Exploring eWOM in online customer reviews: Sentiment analysis at a fine-grained level✩
Qing Sun a, Jianwei Niu a, Zhong Yao b,∗, Hao Yan a
a School of Computer Science and Engineering, Beihang University, Beijing 100191, China
b School of Economics and Management, Beihang University, Beijing 100191, China
ARTICLE INFO
Keywords: Sentiment analysis; eWOM; Fuzzy product ontology
ABSTRACT Customer reviews on social media and e-commerce websites contain valuable electronic word-of-mouth (eWOM) information about products, which supports firms' business strategy and individual consumers' comparison shopping. Exploring the eWOM of products embedded in customer reviews has attracted interest from researchers in various fields. Existing research has mostly used coarse-grained, context-free sentiment analysis approaches, which often fail to meet firms' demand for fine-grained extraction of market intelligence from social media. In this study, we propose an original method to explore the eWOM of products based on fine-grained sentiment analysis of a large volume of online customer reviews. We describe a feature-based, context-sensitive sentiment analysis mechanism that can leverage the sheer volume of customer reviews on social media sites. A novel semi-supervised fuzzy product ontology mining algorithm is proposed to extract semantic knowledge from online customer reviews labeled as positive or negative. On a real-world online customer review data set, the proposed method shows a remarkable performance improvement over baseline methods at exploring the eWOM of products at a fine-grained level. With this eWOM exploration method, firms can improve their product design and marketing strategies, and potential consumers can make better online purchase decisions.
1. Introduction
Customer reviews on social media and e-commerce websites contain valuable electronic word-of-mouth (eWOM) information about products. According to research by the China Internet Network Information Center, 82.1 percent of online shoppers read online reviews, and 41.1 percent of them say they consult online reviews every time they make an online purchase decision (CNNIC, 2015). The analysis of customer reviews supports firms' business strategy and individual consumers' comparison shopping. Sentiment analysis of customer review text is an effective auxiliary method for analyzing large volumes of online product reviews. Traditional sentiment analysis research on online product reviews has been conducted primarily at the sentence level. Consumers' satisfaction with a targeted product is obtained with data mining methods such as lexicon-based and machine-learning-based techniques (Mao et al., 2015), both of which have proven effective for online review sentiment analysis in prior work. However, customer opinions, as typical User Generated Content (UGC), usually express sentiment polarity towards specific product features through emotional vocabulary surrounding the target words. The problem is that the
same emotional expression may carry a different polarity when it modifies different evaluation targets, as the following two comments show:
• The speed of this mobile is fast.
• The power consumption of this mobile is too fast.
From these two reviews, the appraisal relationships (speed, fast) and (power consumption, fast) can be extracted separately. The same opinion word ‘‘fast’’ expresses opposite sentiments when appraising different targets: the former is positive and the latter is negative. Traditional sentiment analysis methods generally fail to identify such context-sensitive sentiment polarity. In our prior study (Sun et al., 2016), we noticed a related problem and attributed it to domain diversity, building a specific lexicon for each domain (books, movies, hotels, electronics) for sentiment analysis of online customer review texts. In this research, we extend that work and address fine-grained sentiment analysis using fuzzy product ontologies. The goal of this technique is to detect the polarity of opinions on a certain
✩ No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.engappai.2019.02.004. ∗ Corresponding author. E-mail address:
[email protected] (Z. Yao).
https://doi.org/10.1016/j.engappai.2019.02.004 Received 5 January 2018; Received in revised form 17 December 2018; Accepted 4 February 2019 Available online xxxx 0952-1976/© 2019 Elsevier Ltd. All rights reserved.
product feature, which can be used to explore the eWOM of products from online customer reviews. The main contributions of our research are summarized as follows:
(1) We design a novel automatic fuzzy product ontology mining methodology based on social analytics. Product features and their relationships are captured to support sentiment analysis at different levels.
(2) We develop a fine-grained sentiment analysis method supervised by semantic knowledge, with which context-sensitive sentiments can be extracted from online customer reviews. The method is tested using a benchmark evaluation procedure and data set.
(3) We develop an eWOM exploration system based on the fine-grained sentiment analysis method. The system can operate at different product feature levels to provide deep analytics for a target product from customer reviews automatically collected from different social media.
The remainder of this paper is structured as follows. Related work on sentiment analysis of online customer reviews is discussed in Section 2. The computational method of automatic fuzzy product ontology construction is presented in Section 3, followed by the results in Section 4. The application of our method to exploring the eWOM of products and the evaluation of the experimental results are described in Section 5. Finally, we conclude our work in Section 6.
2. Related work
Substantial research has been conducted in the related academic areas. In this section, we review the literature on polarity detection and opinion mining of online customer reviews, as well as related work on ontology-based sentiment analysis.
2.1. Polarity detection of online customer reviews
Polarity detection of online customer reviews has been used to obtain customers' satisfaction level and the eWOM of a targeted product. Most existing approaches to polarity detection have focused on classifying individual tweets as positive or negative. These approaches can be categorized as supervised methods (which need training data) and lexicon-based methods (which rely on dictionaries of terms with associated sentiment orientations) (Kanayama and Nasukawa, 2006; Kaji and Kitsuregawa, 2007; Ghiassi et al., 2013; Tan et al., 2008). However, these methods often detect sentiment polarity out of context (e.g., without regard to the product domain), which reduces accuracy significantly. For example, it is almost impossible for traditional sentiment analysis methods to identify the word ‘‘heavy’’ as negative when it describes a mobile phone but positive when it describes a wooden table. Some researchers have addressed this problem with contextual semantic approaches. Turney and Littman (2003) propose an inference-based opinion mining method called Semantic Orientation (SO) analysis to estimate the polarity of sentiments. The SO of a token is estimated from the strength of its association with fourteen seed sentiment indicators, such as good, nice, bad, poor and other evaluative words. Turney and Littman (2003) employ Pointwise Mutual Information (PMI) to compute the strength of association between any pair of tokens. Other researchers use external semantic knowledge bases (e.g., ontologies and semantic networks) to capture conceptual representations of words that implicitly convey sentiment. Saif et al. (2012) confirm that incorporating general conceptual semantics (e.g., ‘‘president’’, ‘‘company’’) into supervised classifiers can improve sentiment accuracy.
2.2. Opinion mining of online customer reviews
Opinion mining is often used to detect the polarity of opinions on a certain topic when analyzing large volumes of customer reviews. In online customer reviews, product features (of goods or services) are categorized into explicit features (explicit characteristics) and implicit features (implicit characteristics). Explicit features, which are expressed directly in the comments as nouns or noun phrases (Kang and Zhou, 2016), have received more attention and are mostly studied with a mix of manual definition (Liu and Zhang, 2012) and automatic extraction (Kobayashi et al., 2004; Wang and Lee, 2012; Jin et al., 2016). Kobayashi et al. (2004) extract the features of automobiles and video games with a semi-automatic method to create related product characteristics. Wang and Lee (2012) use a sentiment dictionary (HowNet) to obtain opinion words from blog comments on digital cameras and then improve feature extraction accuracy with a sliding window. In Jin et al. (2016), Conditional Random Fields (CRFs) are used to recognize both feature aspects and the detailed reasons given by customers in each product-feature-related sentence. The results of product feature mining can be further applied to semantic analysis of the sentiment orientation towards product features. Hu and Liu (2004) propose the concept of aspect-based opinion mining to find the sentiments on different features in reviews. Titov et al. (2008) use a statistical model to discover topics in documents and extract textual evidence supporting the rating of each topic. Wang et al. (2010) present a probabilistic regression model to discover the latent opinions on each aspect for each reviewer. In recent years, research based on topic models has achieved better results on the multi-aspect sentiment analysis task; among these models, LDA is the most widely used basic model. Improved LDA-based algorithms such as ASUM (Jo and Oh, 2011), ME-LDA (Zhao et al., 2010) and ME-SAS (Mukherjee and Liu, 2012) perform better than the original LDA algorithm for sentiment analysis.
2.3. Ontology based sentiment analysis
Ontology-based sentiment analysis has been proposed to draw semantic information from large volumes of online customer reviews. An ontology can take the simple form of a concept taxonomy or the more comprehensive form of a taxonomy together with axioms and constraints that characterize prominent features of the real world (Chi, 2007). Most related research focuses on manual ontology construction, including both capturing the attributes included in an ontology and structuring these attributes. Zhou and Chaovalit (2008) manually construct a movie domain ontology and propose an Ontology Supported Polarity Mining method to extract features and their relationships from online movie reviews; the results were applied to document-level sentiment classification. Wei and Gulla (2010) manually define a Sentiment Ontology Tree (SOT) and propose a novel HL-SOT approach that labels a product's attributes and their associated sentiments in product reviews through a Hierarchical Learning (HL) process. A similar method is proposed by Tan et al. (2012), in which ontologies are used to enhance automated sentiment analysis of user review text. Other research focuses on automating ontology construction to enhance opinion mining. Miao et al. (2010) exploit a seed set of product features and sentiment words to learn a knowledge base that improves polarity detection in unstructured consumer comments. Lau et al. (2014) propose a product ontology construction method in which the taxonomic and non-taxonomic relations of a product ontology are mined automatically with an LDA-based learning method. Our research approach is first driven by Lau's (Lau et al., 2014) and Kim's (Amplayo and Song, 2017) methods. Both use the Latent Dirichlet Allocation (LDA) model for fine-grained sentiment analysis of online product reviews.
In their research, several topics are discovered by the topic model, and the words with high probability in each topic are collected as sub-topics. General hierarchical relationships between topics are then obtained from correlation coefficients, and promising results on English online review data sets have been reported. In our work, however, we notice that the LDA topic model has some inherent defects when applied to a Chinese online review corpus. The most prominent problem is that the underlying topics of a document cannot fully represent review aspects (product features, etc.). Although different product reviews contain different topics, we argue that using topic classification directly is inadequate. Our research approach is also driven by Pratik's research (Thakor and Sasi, 2015), in which an ontology-based process is designed to retrieve and analyze customers' tweets with negative sentiment. Pratik combines Twitter extraction, data cleaning, subjectivity analysis, ontology model building and sentiment analysis. He first builds an ontology model of classes, objects and object properties, and then uses the rule-based platform GATE to label the classes and objects. Pratik's way of building an ontology is enlightening for text analysis. However, because it relies mostly on existing lexicons, the method adapts poorly to the constantly changing social network context. Besides, tweet texts are quite different from online product reviews. Wei (Wei and Gulla, 2010) proposes the HL-SOT approach, which labels the attributes of a product and their associated sentiments in product reviews through a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). However, in Wei's work, the SOT is still constructed manually. In our work, we improve on prior work with an automatic and extensible method for capturing a product's attributes and the corresponding sentiment words. These attributes and sentiment words are used as semantic knowledge to construct a fuzzy product ontology model (Lau et al., 2014), a typical ontological application in the e-commerce domain. The computational method and the construction results are described in Sections 3 and 4, respectively. We then use the ontology to perform feature-based sentiment analysis and explore the eWOM of mobile products at a fine-grained level as a case study.
3. The method for fuzzy product ontology construction
In this section, we explain the concept of a fuzzy product ontology and propose an automatic and effective method for constructing it based on social analytics, as described in Fig. 1.
Fig. 1. The computational method of fuzzy product ontology construction. 𝑃𝑖 (𝑖 = 1, 2, … , 6) represents an extracted product feature.
3.1. Fuzzy product ontology
The fuzzy product ontology we construct is a lightweight ontology underpinned by fuzzy sets and fuzzy relations (Zadeh, 1965).
Definition 1 (Fuzzy Set). $F_c$ consists of synonym sets of concepts drawn from products, features and sentiment words (positive or negative).
Definition 2 (Fuzzy Product Ontology). A fuzzy product ontology is a triple $Ont := \langle F_c, R_{TAX}, R_{NTAX} \rangle$. The membership function $f_{R_{TAX}} : F_c \times F_c \to [0, 1]$ defines the subclass/superclass_of relationships among the set of concepts $F_c$, and the membership function $f_{R_{NTAX}} : F_c \times F_c \to [0, 1]$ defines the sentiment_feature relationships among the set of concepts $F_c$.
The fuzzy product ontology captures taxonomic relations among product features. For instance, ‘‘resolution’’ is a subclass feature of ‘‘screen’’ for mobile phones. It also describes non-taxonomic relations between product features and the corresponding appraisal expressions; for example, screen size is associated with the appraisal word ‘‘big’’. In addition, the context-sensitive sentiment orientation (e.g., ‘‘positive’’) of the appraisal word ‘‘big’’ is captured in the fuzzy product ontology, as shown in Fig. 2. The taxonomic relations among product features and the non-taxonomic relations between product features and the corresponding sentiments cannot be obtained directly from product description web pages. Therefore, we detect the relationships shown in Fig. 2 from a labeled corpus contributed through social analytics.
Fig. 2. Example of a constructed fuzzy product ontology; ‘⟶’ represents the hierarchy relationship between product features; ‘− − −’ connects the context-sensitive sentiment polarity associated with the product features.
3.2. Automatic fuzzy product ontology construction based on social analytics
The fuzzy product ontology mining process is invoked before feature-based sentiment analysis. The online product reviews we collect from popular social media and e-commerce websites, which constitute our corpus, have been pre-annotated as positive or negative by social media participants. This section describes the computational details of the fuzzy product ontology mining algorithm. Product aspects are mapped to ontology concepts, and the candidate concepts are constructed automatically through product feature extraction and clustering. Our proposed algorithm can be divided into three modules:
(1) product feature (concept) extraction
(2) concept hierarchy learning
(3) context-sensitive sentiment learning
The concept extraction module extracts concepts from the product feature words found in online product reviews; the concept hierarchy learning module learns a hierarchy of the extracted concepts with a subsumption-based hierarchy learning algorithm; and the context-sensitive sentiment learning module learns context-sensitive sentiments from the socially tagged online review corpus.
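To make Definitions 1 and 2 concrete, the following minimal Python sketch shows one possible in-memory representation of the triple $\langle F_c, R_{TAX}, R_{NTAX} \rangle$. It assumes that each fuzzy relation is a dictionary from concept pairs to membership degrees in [0, 1], and that pairs that were never asserted have degree 0. The class and method names are hypothetical. The `polarity` field is an illustrative addition for the context-sensitive orientation shown in Fig. 2 and is not part of Definition 2.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FuzzyProductOntology:
    """Illustrative container for Ont := <F_c, R_TAX, R_NTAX> (Definition 2).

    concepts maps a concept name to its synonym set (Definition 1);
    r_tax and r_ntax hold fuzzy membership degrees in [0, 1] for concept pairs.
    polarity is a hypothetical extra field for the orientation shown in Fig. 2.
    """

    concepts: dict[str, frozenset[str]] = field(default_factory=dict)
    r_tax: dict[tuple[str, str], float] = field(default_factory=dict)
    r_ntax: dict[tuple[str, str], float] = field(default_factory=dict)
    polarity: dict[tuple[str, str], str] = field(default_factory=dict)

    def add_concept(self, name: str, synonyms: tuple[str, ...] | list[str] = ()) -> None:
        # A concept is stored as a synonym set that always contains its own name.
        self.concepts[name] = frozenset(synonyms) | {name}

    @staticmethod
    def _degree(value: float) -> float:
        # Membership functions map into [0, 1]; reject anything outside that range.
        if not 0.0 <= value <= 1.0:
            raise ValueError("membership degree must lie in [0, 1]")
        return value

    def add_subclass(self, child: str, parent: str, degree: float = 1.0) -> None:
        # Taxonomic relation: child is a subclass feature of parent.
        self.r_tax[(child, parent)] = self._degree(degree)

    def add_sentiment(self, word: str, feature: str, degree: float, label: str) -> None:
        # Non-taxonomic relation: a sentiment word appraises a product feature.
        self.r_ntax[(word, feature)] = self._degree(degree)
        self.polarity[(word, feature)] = label

    def f_tax(self, child: str, parent: str) -> float:
        # Membership function f_R_TAX; unrelated pairs have degree 0.
        return self.r_tax.get((child, parent), 0.0)

    def f_ntax(self, word: str, feature: str) -> float:
        # Membership function f_R_NTAX; unrelated pairs have degree 0.
        return self.r_ntax.get((word, feature), 0.0)

    def subfeatures(self, parent: str) -> list[str]:
        # Children of a feature with a non-zero subclass membership, sorted by name.
        return sorted(c for (c, p), d in self.r_tax.items() if p == parent and d > 0)
```

For example, the ‘‘resolution’’ and ‘‘screen’’ relation from Section 3.1 could be recorded as follows, using an illustrative degree for ‘‘big’’:

```python
ont = FuzzyProductOntology()
ont.add_concept("screen", ["display"])
ont.add_concept("resolution")
ont.add_subclass("resolution", "screen")
ont.add_sentiment("big", "screen size", 0.8, "positive")
```

Storing the relations as sparse dictionaries keeps the ontology lightweight, which matches the ‘‘lightweight ontology’’ framing in Section 3.1.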
3.2.1. Extracting candidate product features
Candidate product features are generated mainly by association rules and a clustering algorithm. The procedures are explained in detail below.
Data preprocessing. The first step of our framework preprocesses the data and removes unwanted or irrelevant data and noise. Traditional document preprocessing procedures, including stop word removal, Part-of-Speech (POS) tagging and stemming, are applied to consumer comments and product description content. We use the NLPIR¹ package in Python to perform POS tagging on consumer comments. As in previous studies, elementary product features are represented by noun phrases, and sentiment indicators are represented by adjectives or adverbs (Hatzivassiloglou and Mckeown, 1997; Zhang et al., 2012). We therefore extract nouns to form a sentence library that serves as the basic transaction documents.
High-frequency feature word ranking by association rules. In this step, we use association rules to detect frequent itemsets for extracting product feature words. From the basic transaction documents, the frequent itemsets that meet the minimum support in the Apriori algorithm are taken as candidate words. Apriori is a basic association rule algorithm for mining the relationships between items in a data set. It relies on the property that every subset of a frequent itemset is also frequent. Within the Apriori algorithm, we only consider frequent itemsets of fewer than three characters.
Candidate aspect words filtering by rules. Adjacency rules and independence rules are applied to filter non-feature words out of the extracted candidate words.
Definition 3 (Adjacency Rule). Suppose 𝑓 is a frequent feature phrase of 𝑘 words that appears in sentence 𝑆 as the sequence (𝑤1, 𝑤2, … , 𝑤𝑘). If the distance between any two adjacent words is no more than three words, 𝑓 is said to be adjacent in sentence 𝑆. Suppose that 𝑓 appears in 𝑚 sentences and is adjacent in 𝑛 of them. If 𝑛/𝑚 > 𝛼 and 𝑛 > 2, where 𝛼 is a threshold determined experimentally, 𝑓 is an adjacent feature phrase and is recorded as a candidate product feature word; otherwise, 𝑓 is filtered out.
Definition 4 (Independence Rule). The independent support of a feature 𝑓 is the number of sentences that contain 𝑓 itself but not a superset of 𝑓. Suppose that 𝑓 appears in 𝑚 sentences and appears independently in 𝑛 of them. If 𝑛/𝑚 > 𝛽 and 𝑛 > 3, where 𝛽 is a threshold determined experimentally, 𝑓 is recorded as a candidate aspect word that satisfies the independence rule; otherwise, 𝑓 is filtered out.
Candidate aspect words filtering by mutual information with seed words. We manually collect frequent characteristic words as seed words and compute the pointwise mutual information (PMI) between the seed words and each candidate product feature word. This co-occurrence measure is used for further filtering:
$$\mathrm{PMI}(w_1) = \sum_{w} \log_2 \frac{\mathrm{hits}(w_1, w)}{\mathrm{hits}(w_1) \cdot \mathrm{hits}(w)} \qquad (1)$$
where the sum runs over the seed words 𝑤, ℎ𝑖𝑡𝑠(𝑤1, 𝑤) is the number of co-occurrences of the candidate word 𝑤1 and the seed word 𝑤, ℎ𝑖𝑡𝑠(𝑤) is the number of occurrences of the seed word 𝑤 in the transaction documents, and ℎ𝑖𝑡𝑠(𝑤1) is the number of occurrences of the candidate aspect word 𝑤1. With this measure, feature words that are likely to be irrelevant to the product can be identified by comparing their PMI value with a threshold.
Candidate aspect words filtering by co-occurrence with opinion words. In this step, we extract appraisal relationships from the POS tagging results and filter out the words that rarely appear in appraisal relationships. Finally, HowNet² and Word2vec³ are used jointly to compute the semantic similarity of aspect words. HowNet is used first; for words not recorded in HowNet, typical candidate feature words are chosen as seed words for the Word2vec model, and the semantic similarity between two words is the cosine of their word vectors. We use the Word2vec implementation in the Gensim package⁴ for this calculation. Synonyms are clustered through these procedures.
Table 1 Candidate features by the three different filtering rules.
¹ http://ictclas.nlpir.org/.
² http://www.keenage.com/.
³ http://code.google.com/p/word2vec/.
⁴ https://pypi.org/project/gensim/.
3.2.2. Learning concept hierarchy
Concepts can be defined as sequences of words that represent real or imaginary ideas or entities expressed in plain text. Extraction of relevant concepts has received wide recognition in the recent past because of its many applications to text data such as e-commerce websites, and many proposed algorithms have improved concept extraction with varying degrees of success. In this work, we adopt a type of co-occurrence analysis called the subsumption-based approach (Anoop et al., 2016) to build the hierarchical structure of the leveraged concepts (aspects). Subsumption has been found to be a simple but very effective way of inferring relationships between words and phrases without any training data or clustering method. According to formal concept analysis, the intensions of concepts can be used to evaluate the subsumption relations among them. The basic idea is simple: given the intensions of two concepts 𝑐𝑖 and 𝑐𝑗, if all the attributes of 𝑐𝑖 also belong to 𝑐𝑗, that is, {𝑡𝑖1, 𝑡𝑖2, … , 𝑡𝑖𝑛} ⊂ {𝑡𝑗1, 𝑡𝑗2, … , 𝑡𝑗𝑚}, then concept 𝑐𝑖 is identified as subsuming concept 𝑐𝑗. Because of the ambiguity of natural language and the imperfect representation of documents in computers, there is always fuzziness (uncertainty) in the subsumption relations among concepts. As a result, subsumption relations among concepts are often expressed as a degree of inclusion rather than as a strict binary inclusion relation.
Therefore, for any two product feature concepts 𝑐𝑖 and 𝑐𝑗, 𝑐𝑖 is said to subsume 𝑐𝑗 if 𝑃(𝑐𝑖 | 𝑐𝑗) = 1 and 𝑃(𝑐𝑗 | 𝑐𝑖) < 1. More specifically, 𝑐𝑖 subsumes 𝑐𝑗 if the documents in which 𝑐𝑗 occurs are a subset of the documents in which 𝑐𝑖 occurs. In other words, 𝑐𝑖 is the parent of 𝑐𝑗 in the hierarchy because it is referred to more frequently in the documents.
Algorithm 1. LearnProductFeatureHierarchy
Input: CC, the collection of feature words
Output: the hierarchy relationship between each pair of features in CC
1: Consider any pair of feature words, say 𝑐𝑎 and 𝑐𝑏
2: Compute 𝑃(𝑐𝑎 | 𝑐𝑏) and 𝑃(𝑐𝑏 | 𝑐𝑎)
3: if 𝑃(𝑐𝑎 | 𝑐𝑏) = 1 and 𝑃(𝑐𝑏 | 𝑐𝑎) < 1 then
4:   Assign 𝑐𝑎 as parent of 𝑐𝑏
5: else
6:   Fetch next feature word pair
7: end if
8: Go to step 2 and repeat for all feature word pairs
9: return
3.2.3. Learning context-sensitive sentiments for a fuzzy product ontology
In this section, we propose a contextual sentiment detection method that uses characteristics of the Chinese language and the socially tagged corpus; its results are used in fuzzy product ontology mining. As related research shows, the words consumers use to express sentiment towards product features are adjectives or adverbs, and these sentiment words usually appear close to the feature words. We therefore develop a sliding-window-based method to detect context-sensitive sentiments for the fuzzy product ontology. Sliding windows are word sequences that move from the beginning to the end of the customer review text.
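The window-based steps of this module can be illustrated with a minimal Python sketch. It makes the following assumptions:

- Input is POS-tagged with ICTCLAS/NLPIR-style tags, where a tag starting with "a" marks an adjective and one starting with "d" marks an adverb.
- The window size is 𝜔𝑤𝑖𝑛 = 4, as reported for the experiments.
- Eq. (2) is estimated over the non-overlapping windows of Definition 5, with 𝜔𝑎𝑠𝑠 = 0.6, a value inside the [0.5, 0.7] range cited from prior work.

All function names are hypothetical. The implicit-feature fallback described under Definition 5 is not covered.

```python
from __future__ import annotations

import math

# Assumption: ICTCLAS/NLPIR-style tags; "a..." = adjective, "d..." = adverb.
SENTIMENT_TAG_PREFIXES = ("a", "d")


def candidate_pairs(
    tagged: list[tuple[str, str]], features: set[str], w_win: int = 4
) -> list[tuple[str, str]]:
    """Collect (sentiment_word, feature) pairs within w_win tokens of a feature."""
    pairs = []
    for i, (word, _tag) in enumerate(tagged):
        if word not in features:
            continue
        lo, hi = max(0, i - w_win), min(len(tagged), i + w_win + 1)
        for j in range(lo, hi):
            cand, tag = tagged[j]
            if j != i and tag.startswith(SENTIMENT_TAG_PREFIXES):
                pairs.append((cand, word))
    return pairs


def disjoint_windows(tokens: list[str], size: int) -> list[set[str]]:
    """Split a token sequence into non-overlapping windows (Definition 5)."""
    return [set(tokens[k:k + size]) for k in range(0, len(tokens), size)]


def association(
    windows: list[set[str]], t_i: str, t_j: str, w_ass: float = 0.6
) -> float:
    """Sentiment-aspect association of Eq. (2), estimated from window counts."""
    n = len(windows)
    if n == 0:
        raise ValueError("at least one window is required")
    c_i = sum(t_i in w for w in windows)            # windows containing t_i
    c_j = sum(t_j in w for w in windows)            # windows containing t_j
    c_ij = sum(t_i in w and t_j in w for w in windows)

    def term(joint: int, count_a: int, count_b: int) -> float:
        # Pr(x, y) * log2((Pr(x, y) + 1) / (Pr(x) * Pr(y))); an empty joint adds 0.
        if joint == 0:
            return 0.0
        p_joint = joint / n
        return p_joint * math.log2((p_joint + 1) / ((count_a / n) * (count_b / n)))

    c_neither = n - c_i - c_j + c_ij
    positive = term(c_ij, c_i, c_j) + term(c_neither, n - c_i, n - c_j)
    negative = term(c_i - c_ij, c_i, n - c_j) + term(c_j - c_ij, n - c_i, c_j)
    return w_ass * positive - (1 - w_ass) * negative
```

For instance, `candidate_pairs([("screen", "n"), ("is", "v"), ("very", "d"), ("big", "a")], {"screen"})` returns `[("very", "screen"), ("big", "screen")]`. With `w_win=1`, it returns an empty list because only the verb is adjacent. Counting windows rather than raw positions keeps each probability in Eq. (2) a simple fraction of the window total.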
First order frequent itemsets Second order frequent itemsets Third order frequent itemsets
Definition 5 (Sliding Window). We record the word sequence for an online review text as 𝑑 = 𝑤𝑑1 , 𝑤𝑑2 , … , 𝑤𝑑𝑁 . Sliding window is a subset of d, represented as 𝐶 = 𝑤1 , 𝑤2 , … , 𝑤|𝑊 | . Therefore 𝐶 ⊂ 𝑑, and for the same online review, any two sliding windows 𝑤𝑖 , 𝑤𝑗 have the relationship of 𝑤𝑖 ∩ 𝑤𝑗 = 𝜙. The sentiment words towards a target product feature is usually expressed in the form of adjectives or adverbs adjacent to the feature words. Their position distance is measured by a virtual text window of size 𝜔𝑤𝑖𝑛 . If 𝜔𝑤𝑖𝑛 = 1, the adjective (or adverb) immediately from the left or right of an identified product aspect is extracted. For the experiments reported in this paper, we normally set 𝜔𝑤𝑖𝑛 = 4. Those sentiment words captured in sliding windows are extracted as the candidate sentiments towards the targeted product features. In addition, this article particularly considered implicit product features. If a sentence contains only emotional adjectives or adverbs and fail to detect product attributes through sliding window, we got the appraisal relationship by looking for the nearest adjacent to discover auxiliary product features.
𝑓 𝑜𝑟 𝑃 (𝑠𝑖 , 𝑎𝑖 ) >= 𝑁(𝑠𝑖 , 𝑎𝑖 )
(3)
𝑓 𝑜𝑟 𝑃 (𝑠𝑖 , 𝑎𝑖 ) < 𝑁(𝑠𝑖 , 𝑎𝑖 )
(4)
Co-occurrence
53 9 0
320 101 3
4. The results of fuzzy product ontology construction 4.1. Social tagging online product reviews corpus We use the mobile online reviews as the corpus source of social tagging. As a typical kind of high-involvement product, mobile online reviews always contain more subjective sentences relatively, and the product attributes and their levels are more distinct. However, the method of fuzzy product ontology mining we proposed could be extended to other kind of products. We have retrieved more than 500,000 mobile online reviews from the famous third party review site ‘‘www.zol.com.cn’’, including the ratings, review text, review date and the reviewer. The reviews retrieved have already been annotated by the reviewer as positive or negative (see the following examples in Fig. 3), these annotated data provided sufficient corpus support for fuzzy ontology construction. Meanwhile, we crawled the product description information for these 103 mobiles, most of which are structural data. We preliminary selected the original data according to the following principle:
where 𝐴𝑠𝑠(𝑠𝑖 , 𝑎𝑖 ) represents the association measurement between a sentiment 𝑠𝑖 and an appraisal object 𝑎𝑖 . The empirical value of weight factor 𝜔𝑎𝑠𝑠 in reported related prior researches (Lau et al., 2009, 2014) is suggested to be [0.5, 0.7], where the noise data can be reduced to the minimum. Therefore, we used this interval to control the relative importance of two kinds of evidences (i.e., positive or negative evidence) to establish an association relationship. 𝑃 𝑟(𝑡𝑖 , 𝑡𝑗 ) is the joint probability |𝑤 | that both tokens appear in a virtual text window, and 𝑃 𝑟(𝑡𝑖 ) = |𝑤|𝑡 is the probability that a token 𝑡𝑖 appears in a text window, where ||𝑤𝑡 || is the number of windows containing the token 𝑡 and |𝑤| is the total number of windows constructed from a corpus. Similarly, 𝑃 𝑟(𝑡𝑖 , 𝑡𝑗 ) is the fraction of sum of windows who contains both tokens out of the total window number. After extracting the sentiment–aspect pairs (𝑠𝑖 , 𝑎𝑖 )from a training corpus of consumer reviews, the next step is to estimate the context sensitive polarity of 𝑠𝑖 with respect to 𝑎𝑖 . For each feature 𝑎𝑖 , the number of positive sentiment words and negative sentiment words are counted respectively. For each word 𝑠𝑖 , the frequency of positive sentiment and negative sentiment when it associates with 𝑎𝑖 are calculated. The context sensitive polarity of 𝑠𝑖 with respect to 𝑎𝑖 is calculated as the following method. 𝑃 (𝑠𝑖 , 𝑎𝑖 ) − 𝑁(𝑠𝑖 , 𝑎𝑖 ) , 𝑃 (𝑠𝑖 , 𝑎𝑖 ) 𝑁(𝑠𝑖 , 𝑎𝑖 ) − 𝑃 (𝑠𝑖 , 𝑎𝑖 ) 𝑂𝑟𝑖𝑛 = , 𝑁(𝑠𝑖 , 𝑎𝑖 )
Independence
0 4969 0
product feature 𝑎𝑖 . Eq. (3) works when 𝑃 (𝑠𝑖 , 𝑎𝑖 ) >= 𝑁(𝑠𝑖 , 𝑎𝑖 ) while equation (4) works when 𝑃 (𝑠𝑖 , 𝑎𝑖 ) < 𝑁(𝑠𝑖 , 𝑎𝑖 ). Both describe the probability of whether a sentiment word to be positive or negative. The final sentiment polarity of 𝑠𝑖 with respect to 𝑎𝑖 is defined by 𝑂𝑟𝑖𝑝 or 𝑂𝑟𝑖𝑛 . If the value of 𝑂𝑟𝑖𝑝 is greater than the threshold parameter 𝛼, the sentiment polarity of 𝑠𝑖 with respect to 𝑎𝑖 is evaluated as positive. Similarly, if 𝑂𝑟𝑖𝑛 exceeds threshold parameter 𝛽, the polarity of sentiment word 𝑠𝑖 is regarded as negative when it appraises product feature 𝑎𝑖 . To eliminate the noise, we require the sum of 𝑃 (𝑠𝑖 , 𝑎𝑖 ) and 𝑁(𝑠𝑖 , 𝑎𝑖 ) reach a threshold 𝜃. Only when 𝑂𝑟𝑖𝑝 and 𝑂𝑟𝑖𝑛 jointly exceed a threshold, we regard the results to be meaningful and consider the context sensitive sentiment. In this paper, 𝛼 is set to 0.7, 𝛽 is set to 0.7, and 𝜃 is set as the ratio of 1 to the total number of context sensitive sentiment words.
Then we use the sentiment–aspect association 𝐴𝑠𝑠(𝑠𝑖 , 𝑎𝑖 ) to measure the feature-sentiment relationship, which is reported to be successful on fuzzy product ontology mining in prior research. The sentiment–aspect association is measured as follows: 𝑃 𝑟(𝑡𝑖 , 𝑡𝑗 ) + 1 𝐴𝑠𝑠(𝑠𝑖 , 𝑎𝑖 ) = 𝜔𝑎𝑠𝑠 × [𝑃 𝑟(𝑡𝑖 , 𝑡𝑗 ) log2 𝑃 𝑟(𝑡𝑖 )𝑃 𝑟(𝑡𝑗 ) 𝑃 𝑟(¬𝑡𝑖 , ¬𝑡𝑗 ) + 1 ] +𝑃 𝑟(¬𝑡𝑖 , ¬𝑡𝑗 ) log2 𝑃 𝑟(¬𝑡𝑖 )𝑃 𝑟(¬𝑡𝑗 ) 𝑃 𝑟(𝑡𝑖 , ¬𝑡𝑗 ) + 1 −(1 − 𝜔𝑎𝑠𝑠 ) × [𝑃 𝑟(𝑡𝑖 , ¬𝑡𝑗 ) log2 𝑃 𝑟(𝑡𝑖 )𝑃 𝑟(¬𝑡𝑗 ) 𝑃 𝑟(¬𝑡𝑖 , 𝑡𝑗 ) + 1 +𝑃 𝑟(¬𝑡𝑖 , 𝑡𝑗 ) log2 ] (2) 𝑃 𝑟(¬𝑡𝑖 )𝑃 𝑟(𝑡𝑗 )
𝑂𝑟𝑖𝑝 =
Adjacency
(1) Duplicated reviews are removed (2) All the reviews cover the span from the time-to-market to Dec, 2016 (3) Each product has more than 100 online reviews (4) Each review has more than 100 words Finally, 208 622 online reviews of 103 mobiles were collected as the fuzzy product ontology corpus. Among them, 156 326 mentioned product features with positive sentiment and 159 710 mentioned product features with negative sentiment. 4.2. The results of constructing We extract the candidate product features according to the algorithm proposed in Section 3.2.1. The results of itemset filtering by rules are described in Table 1. We obtain 287 feature words. Among them, 230 are related to the mobile domain, the accuracy has reached to 80.1%. We follow the method proposed in Section 3.2 to extract product features from the corpus and learn taxonomic relationship. For the assessment of the subsumption relationship between each pair of 𝑐 concepts 𝑐𝑖 and 𝑐𝑗 , we need to compute the generation probability 𝑃 ( 𝑐 𝑖 )
where $P(s_i,a_i)$ is the frequency with which sentiment word $s_i$ carries positive polarity when it appraises product feature $a_i$, and $N(s_i,a_i)$ is the frequency with which $s_i$ carries negative polarity when it appraises
between $c_i$ and $c_j$ and the generation probability $P(c_j \mid c_i)$ between $c_j$ and $c_i$. However, in our experiments we found that, unlike
Q. Sun, J. Niu, Z. Yao et al.
Engineering Applications of Artificial Intelligence 81 (2019) 68–78
Fig. 3. Data with sentiment annotation by social tagging.
Table 2 Examples of mined product feature expressions.
Table 3 Domain lexicon.
what was suggested in prior research (Anoop et al., 2016), the thresholds $P(c_i \mid c_j) \ge a$ and $P(c_j \mid c_i) \le b$ should be established empirically to exclude meaningless, noisy results from online product review texts. Based on a manually checked subset of our evaluation data set, we set $a = 0.9$ and $b = 0.7$ and obtain 56 ‘‘subclass-of’’ relationships, which cover 116 features, i.e., 50.43% of all extracted product features. Examples of product features and their sub-level features are shown in Table 2. Following the procedure described in Section 3.2.3, the sentiment polarity of each context-sensitive opinion word is confirmed backward, and the relationships between concepts and sentiments in the fuzzy product ontology are obtained. In total, we obtain 4181 context-sensitive sentiment words, of which 2389 are positive and 1792 are negative.
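The subsumption test with the thresholds $a = 0.9$ and $b = 0.7$ can be illustrated as below. This sketch assumes, for illustration only, that $P(c_i \mid c_j)$ is estimated as the fraction of reviews mentioning $c_j$ that also mention $c_i$; the function name `subclass_of_pairs` and the review-level counting unit are hypothetical choices, not details given in the text.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def subclass_of_pairs(reviews: Iterable[Iterable[str]],
                      a: float = 0.9, b: float = 0.7) -> List[Tuple[str, str]]:
    """Return (child, parent) pairs where parent is taken to subsume child.

    ASSUMPTION: P(parent | child) = reviews with both / reviews with child,
    estimated at review level.
    """
    df: Counter = Counter()  # number of reviews mentioning each concept
    co: Counter = Counter()  # number of reviews mentioning both concepts
    for concepts in reviews:
        uniq = set(concepts)
        df.update(uniq)
        for x, y in combinations(sorted(uniq), 2):
            co[(x, y)] += 1
    pairs = []
    for (x, y), n_xy in co.items():
        # Try both directions: x as parent of y, then y as parent of x.
        for parent, child in ((x, y), (y, x)):
            p_parent_given_child = n_xy / df[child]
            p_child_given_parent = n_xy / df[parent]
            if p_parent_given_child >= a and p_child_given_parent <= b:
                pairs.append((child, parent))
    return pairs
```

For example, if ‘‘resolution’’ appears in 4 reviews, all of which also mention ‘‘screen’’, and ‘‘screen’’ appears in 10 reviews, then $P(\text{screen} \mid \text{resolution}) = 1 \ge 0.9$ and $P(\text{resolution} \mid \text{screen}) = 0.4 \le 0.7$, so resolution is taken as a subclass of screen. Two concepts that always co-occur fail the second condition and yield no relationship.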
Lexicon       Positive   Negative   Total
OBL           3730       7082       10812
Hotels        3815       7154       10969
Books         3913       7276       11189
Electronics   3807       7227       11034
5. eWOM exploration in online consumer review at a fine-grained level
5.1. Domain lexicon building
We construct domain lexicons for basic sentiment analysis, starting from the two best-known Chinese sentiment lexicons, HowNet and NTUSD. We collect 8934 sentiment words from HowNet and 11086 sentiment words from NTUSD. After removing duplicates across the two lexicons, we obtain the Chinese Original Basic Lexicon (OBL), which consists of 5834 positive words and 10108 negative words. We then follow the algorithm of Mao et al. (2015) to extend the OBL for domain adaptation; this algorithm proved effective for domain sentiment classification in our prior research. Table 3 shows the final domain lexicons used in this study. Additionally, we collect adverbs of different intensity from the original corpus, and a Chinese thesaurus⁵ is collected for merging synonyms. The collected lexicons are first used in preprocessing, including word segmentation and POS tagging. Because both product features and sentiment words must be extracted from continuous text by word segmentation, the built lexicons are used to improve the segmentation results.
5 https://www.ltp-cloud.com/.
Table 4 Weights of adverbs in different intensity level.
5.2. Feature-based sentiment analysis supervised by fuzzy product ontology
5.2.1. Extract appraisal relationships and determine the sentiment polarity
We extract product features from the preprocessed reviews based on the constructed domain fuzzy sentiment ontology. Each extracted appraisal relationship (opinion word, product feature) is recorded as $sa := (s_i, a_i)$ for candidate fine-grained sentiment analysis. The sentiment polarity of each extracted appraisal relationship, denoted $polarity(sa)$, is calculated as:

$$
polarity(sa) =
\begin{cases}
polarity_{FuzzyOntology}(s_i), & \text{if } sa \text{ is found in the fuzzy product ontology} \\
polarity_{DomainLexicon}(s_i), & \text{otherwise}
\end{cases}
\qquad (5)
$$

where $polarity_{FuzzyOntology}(s_i)$ is the polarity of the sentiment word in the fuzzy product ontology and $polarity_{DomainLexicon}(s_i)$ is its polarity in the domain lexicon. We first use the fuzzy domain sentiment ontology to determine the polarity of each identified sentiment–aspect pair $sa$. If the ontology cannot resolve the polarity, the domain sentiment lexicons are invoked to estimate the context-free sentiment polarity. If the polarity of $sa$ still cannot be resolved after trying all the sentiment lexicons, a neutral polarity is assigned.

5.2.2. Sentiment strength calculation
In this paper, we calculate sentiment strength by mining adverbs of different intensity in each sliding window. We first construct an adverb thesaurus based on the ‘‘adverbs of different intensity (in Chinese)’’ list provided by HowNet.⁶ Drawing on several earlier studies of Chinese text processing, we classify adverbs into six intensity levels and assign each level a weight, as shown in Table 4. The sentiment score of each sliding window is then calculated as:

$$ sentiscore(sa) = polarity(sa) \times k \qquad (6) $$

where $k$ is the sentiment-strength weight given by the intensity of the adverb found in the adverb thesaurus, and $polarity(sa)$ is the polarity of the appraisal relationship calculated by Eq. (5). For the review text ‘‘The mobile phone is for my mother, all kinds of complete function, system is pure, also, the price is very appropriate, hope Mom enjoy it!’’, the resulting sentiment scores are {‘‘system’’: 1, ‘‘function’’: 1, ‘‘price’’: 1.4}.

5.3. An eWOM exploring system on mobile for case study
We apply the fuzzy product ontology to fine-grained sentiment analysis and develop an eWOM exploring system for mobile products. The system design is shown in Fig. 4.

5.3.1. Data collection module
The data collection module is driven by the user's query. Online customer review data are collected with a crawler or with the developer APIs of social networks (see Figs. 5 and 6). Reviews from different social media are integrated, yielding a rich data source for eWOM; to some extent, this alleviates the problem of insufficient reviews for new products.

5.3.2. eWOM exploration at a fine-grained level
Sentiment analysis for a single online customer review. In this study, we calculate sentiment scores from the appraisal relationships supervised by the fuzzy product ontology. The final sentiment score of each product feature in a single review is the average sentiment score of all related appraisal relationships, as shown in Fig. 7, where $d$ is the collection of appraisal relationships found in a single online customer review with respect to product feature $a$, and $|d_{sa}|$ is their number:

$$ polarity(d) = \frac{\sum_{sa \in d} sentiscore(sa)}{|d_{sa}|} \qquad (7) $$

eWOM exploration. Suppose the review collection for product $p_i$ is denoted $D$, and the appraisal relationship set $SA$ contains all relationships extracted from $D$. Then

$$ asp(p_i, a_i) = \frac{\sum_{sa \in SA} sentiscore(sa)}{|SA|} \qquad (8) $$

where $asp(p_i, a_i)$ is the sentiment score of feature $a_i$ across the whole collection. The fuzzy product ontology for mobile phones guides the product sentiment analysis, and our system uses it to provide eWOM for products at different feature levels. Consumers can query sentiment scores according to their own requirements. Table 5 shows the results of fine-grained sentiment analysis; for this analysis we select all high-quality online reviews on the jd.com platform posted within one month after the phone came onto the market. After clustering candidate product features into feature sets, we present the results as radar charts, treating each feature set as one dimension of the map. To build a perceptual map, we select the ten most popular feature sets and plot the selected product's sentiment scores on them. The strengths and weaknesses of each product can then be read directly from the perceptual map, which is derived from large volumes of mobile phone reviews. The subclass features of each feature set and their sentiment strengths are also shown in Fig. 8. Such maps and charts can help businesses compare the performance of their products with that of their competitors.
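Eqs. (5) to (8) can be sketched together as a small scoring pipeline. All names are hypothetical, polarities are assumed to be +1, -1 or 0 (neutral), and plain dictionaries stand in for the fuzzy product ontology and the domain lexicon. The adverb weight 1.4 in the demo is chosen only so that the example reproduces the ‘‘price’’: 1.4 score quoted in Section 5.2.2; the actual weights are those in Table 4.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]       # sa := (s_i, a_i): (sentiment word, product feature)
Scored = Tuple[Pair, float]  # (sa, k), where k is the adverb-intensity weight


def polarity(sa: Pair, ontology: Dict[Pair, int], lexicon: Dict[str, int]) -> int:
    """Eq. (5): fuzzy ontology first, then domain lexicon, else neutral (0)."""
    if sa in ontology:               # context-sensitive polarity of s_i w.r.t. a_i
        return ontology[sa]
    return lexicon.get(sa[0], 0)     # context-free polarity, or 0 if unresolved


def sentiscore(sa: Pair, k: float, ontology: Dict[Pair, int],
               lexicon: Dict[str, int]) -> float:
    """Eq. (6): polarity scaled by the intensity weight k of the adverb."""
    return polarity(sa, ontology, lexicon) * k


def feature_scores(pairs: Iterable[Scored], ontology: Dict[Pair, int],
                   lexicon: Dict[str, int]) -> Dict[str, float]:
    """Average sentiscore per product feature.

    On the pairs of one review this is Eq. (7); on all pairs extracted
    from a product's review collection D it is Eq. (8).
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for sa, k in pairs:
        feature = sa[1]
        totals[feature] += sentiscore(sa, k, ontology, lexicon)
        counts[feature] += 1
    return {f: totals[f] / counts[f] for f in totals}


def product_scores(reviews: Iterable[List[Scored]], ontology: Dict[Pair, int],
                   lexicon: Dict[str, int]) -> Dict[str, float]:
    """Eq. (8): pool the appraisal relationships SA of all reviews, then average."""
    return feature_scores(chain.from_iterable(reviews), ontology, lexicon)


if __name__ == "__main__":
    # Toy resources (hypothetical): one context-sensitive entry, two lexicon entries.
    ontology = {("pure", "system"): 1}
    lexicon = {"complete": 1, "appropriate": 1}
    review = [(("complete", "function"), 1.0), (("pure", "system"), 1.0),
              (("appropriate", "price"), 1.4)]  # 1.4: assumed weight of "very"
    print(feature_scores(review, ontology, lexicon))
```

Using a single averaging function for both Eq. (7) and Eq. (8) reflects that the two formulas differ only in the set of appraisal relationships they average over.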
5.4. System evaluation
5.4.1. Dataset description
For our last experiment, we evaluate the proposed product feature-based sentiment analysis algorithm and then assess
6 http://www.keenage.com/.
Fig. 4. Architecture of eWOM exploring system.
Fig. 5. Data collection module by brand.
Fig. 6. Data collection module by product type.
Fig. 7. Sentiment analysis for a single online review.
Fig. 8. eWOM exploration for a specified product.
Table 5 Dataset statistics.
                     Positive   Negative   Total
Feature occurrence   2886       1585       4471
Feature number       142        123        176
the performance of the developed eWOM exploring system. Since no benchmark data set is available, we construct our own evaluation data set of 5000 randomly selected mobile phone customer reviews, of which 2582 are related to product features. We invited two human experts, each with over five years' experience writing consumer reviews, to annotate these reviews. A product feature and its sentiment are included in the evaluation data set only when both annotators agree on the product feature and its sentiment polarity. Across all the extracted reviews, product features occur 4471 times, covering 176 mobile phone features; 2886 occurrences carry positive sentiment and 1585 negative. The statistics are shown in Table 5.
Fig. 9. Evaluation on product features mining results by different methods.
5.4.2. The evaluation of feature mining supervised by fuzzy product ontology
In the 2782 randomly selected feature-related mobile phone reviews, the most frequently discussed top-level features are appearance, battery, screen, system and network (see Table 6). In prior studies, product features and sentiment polarity have been effectively extracted from online reviews using association rules or CRFs, so we select these two methods as baselines and evaluate the feature mining results with two indicators. Type M counts the feature occurrences that a method fails to identify, and Type B counts the feature occurrences that a method evaluates wrongly. For Type M, association rules, CRFs and the fuzzy product ontology yield 682, 615 and 475 occurrences, respectively; for Type B, they yield 215, 202 and 187. As Fig. 9 shows, the proposed fuzzy-ontology-based approach outperforms both baselines on the mobile customer review dataset.
5.4.3. The evaluation of feature-based sentiment analysis supervised by fuzzy product ontology
The results of fine-grained sentiment analysis are evaluated by Precision (P), Accuracy (ACC), Recall (R) and F-score (F1). We first develop an association-rule-mining baseline system for product feature-based sentiment analysis. We then use the OBPRM method proposed by Lau et al. (2014) to perform the same fine-grained sentiment analysis task. We also use a CRF-based model from Jin et al. (2016) as a baseline.
The performance of the proposed method and the baseline methods on the dataset is shown in Table 7. The results indicate that fine-grained sentiment analysis supervised by the constructed fuzzy product ontology performs as expected: our method outperforms all baselines in precision, accuracy, recall and F1. OBPRM outperforms the association rule method (F1 of 75.02% vs. 70.69%). Compared with CRFs, the strongest baseline, applying the constructed fuzzy ontology to fine-grained sentiment analysis improves P, ACC, R and F1 by 3.01, 7.61, 6.80 and 4.92 percentage points, respectively. These results confirm that the proposed methods for fuzzy product ontology mining and for fine-grained sentiment analysis supervised by the fuzzy product ontology are effective.
6. Conclusions and future work
This research extracts various aspects of product features and their associated sentiments, and further explores eWOM for products at a fine-grained level by determining the sentiment polarity of expressions in online customer reviews. First, we design a novel social analytics method for automatic fuzzy product ontology construction. An improved Apriori algorithm is used to pinpoint different aspects of product features. A sliding window method and a word divergence measure are used to extract appraisal relationships as well as the feature
Table 6 Most frequently discussed top-level features of the random selected reviews dataset.
Table 7 Benchmark test for fine-grained level sentiment analysis.
Method                   P        ACC      R        F1
Association rule         68.24%   66.18%   73.33%   70.69%
OBPRM                    72.25%   71.40%   78.01%   75.02%
CRFs                     83.44%   76.14%   81.40%   82.40%
Fuzzy product ontology   86.45%   83.75%   88.20%   87.32%
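As a consistency check on Table 7, F1 can be recomputed from the reported P and R as their harmonic mean, and the gains over CRFs quoted in Section 5.4.3 can be recomputed as percentage-point differences. The short script below transcribes the table values; the names `TABLE7`, `f1_score` and `gains` are ours. Recomputed F1 matches the reported values to within 0.01 points, the residual being rounding of P and R.

```python
# Values transcribed from Table 7 (percentages), in column order P, ACC, R, F1.
TABLE7 = {
    "Association rule": (68.24, 66.18, 73.33, 70.69),
    "OBPRM": (72.25, 71.40, 78.01, 75.02),
    "CRFs": (83.44, 76.14, 81.40, 82.40),
    "Fuzzy product ontology": (86.45, 83.75, 88.20, 87.32),
}


def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def gains(method: str, baseline: str) -> tuple:
    """Percentage-point differences (P, ACC, R, F1) of `method` over `baseline`."""
    return tuple(round(m - b, 2) for m, b in zip(TABLE7[method], TABLE7[baseline]))


if __name__ == "__main__":
    for name, (p, acc, r, f1) in TABLE7.items():
        print(f"{name:<24} reported F1 {f1:.2f}  recomputed {f1_score(p, r):.2f}")
    print(gains("Fuzzy product ontology", "CRFs"))
```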
Ghiassi, M., Skinner, J., Zimbra, D., 2013. Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network. Expert Syst. Appl. 40 (16), 6266–6282.
Hatzivassiloglou, V., McKeown, K.R., 1997. Predicting the semantic orientation of adjectives. In: Eighth Conference on European Chapter of the Association for Computational Linguistics, pp. 174–181.
Hu, M., Liu, B., 2004. Mining and summarizing customer reviews. In: Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August, pp. 168–177.
Jin, J., Ji, P., Kwong, C.K., 2016. What makes consumers unsatisfied with your products: Review analysis at a fine-grained level. Eng. Appl. Artif. Intell. 47, 38–48.
Jo, Y., Oh, A.H., 2011. Aspect and sentiment unification model for online review analysis. In: ACM International Conference on Web Search and Data Mining, pp. 815–824.
Kaji, N., Kitsuregawa, M., 2007. Building lexicon for sentiment analysis from massive collection of HTML documents. In: Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning.
Kanayama, H., Nasukawa, T., 2006. Fully automatic lexicon expansion for domain-oriented sentiment analysis. In: Conference on Empirical Methods in Natural Language Processing, pp. 355–363.
Kang, Y., Zhou, L., 2016. Rube: Rule-based methods for extracting product features from online consumer reviews. Inf. Manage.
Kobayashi, N., Inui, K., Matsumoto, Y., Tateishi, K., Fukushima, T., 2004. Collecting evaluative expressions for opinion extraction. In: International Joint Conference on Natural Language Processing, pp. 596–605.
Lau, R.Y.K., Li, C., Liao, S.S.Y., 2014. Social analytics: Learning fuzzy product ontologies for aspect-oriented sentiment analysis. Decis. Support Syst. 65 (5), 80–94.
Lau, R.Y.K., Song, D., Li, Y., Cheung, T.C.H., Hao, J., 2009. Toward a fuzzy domain ontology extraction method for adaptive e-learning. IEEE Trans. Knowl. Data Eng. 21 (6), 800–813.
Liu, B., Zhang, L., 2012. A Survey of Opinion Mining and Sentiment Analysis. Springer US, pp. 459–526.
Mao, K., Niu, J., Wang, X., Wang, L., 2015. Cross-domain sentiment analysis of product reviews by combining lexicon-based and learn-based techniques. In: IEEE International Conference on High Performance Computing and Communications, pp. 351–356.
Miao, Q., Li, Q., Zeng, D., 2010. Fine-Grained Opinion Mining by Integrating Multiple Review Sources. John Wiley and Sons, Inc., pp. 2288–2299.
Mukherjee, A., Liu, B., 2012. Aspect extraction through semi-supervised modeling. In: Meeting of the Association for Computational Linguistics: Long Papers, pp. 339–348.
Saif, H., He, Y., Alani, H., 2012. Semantic Sentiment Analysis of Twitter. Springer, Berlin, Heidelberg, pp. 508–524.
Sun, Q., Niu, J., Yao, Z., Qiu, D., 2016. Research on semantic orientation classification of Chinese online product reviews based on multi-aspect sentiment analysis. In: IEEE/ACM International Conference on Big Data Computing, Applications and Technologies, pp. 262–267.
Tan, K.L., Hong, J.L., Tan, E.X., 2012. A novel ontological technique for sentiment analysis. In: International Conference on Neural Information Processing, pp. 339–346.
Tan, S., Wang, Y., Cheng, X., 2008. Combining learn-based and lexicon-based techniques for sentiment detection without using labeled examples. In: International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 743–744.
Thakor, P., Sasi, S., 2015. Ontology-based sentiment analysis process for social media content. Procedia Comput. Sci. 53, 199–207.
Titov, I., McDonald, R., 2008. A joint model of text and aspect ratings for sentiment summarization. In: Proc. ACL-08: HLT, pp. 308–316.
Turney, P.D., Littman, M.L., 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Trans. Inf. Syst. (TOIS) 21 (4), 315–346.
Wang, J.H., Lee, C.C., 2012. Unsupervised opinion phrase extraction and rating in Chinese blog posts. In: IEEE Third International Conference on Privacy, Security, Risk and Trust, pp. 820–823.
Wang, H., Lu, Y., Zhai, C., 2010. Latent aspect rating analysis on review text data: a rating regression approach. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 783–792.
related sentiments. Then, based on the automatically constructed product ontology, we propose a fine-grained sentiment analysis method supervised by the fuzzy product ontology, which jointly groups feature aspects and their corresponding sentiments for customers. The proposed method accurately predicts the polarities of feature-level sentiments without requiring labor-intensive manual labeling of training examples. Furthermore, we have developed an eWOM exploration system that performs deep analytics for a target product using customer reviews posted to different social media. It can be deployed as an IT agent that mines opinions from online customer reviews at a fine-grained level. Benchmark evaluation on a real-world customer review data set shows that applying the proposed method in the eWOM exploration system yields better performance than the baselines. The business implication of our research is that fine-grained sentiment analysis enables designers to extract market intelligence from large volumes of opinion data effectively and efficiently. With the proposed approach, dedicated applications can be developed, evaluated and applied in real product design scenarios to reduce the burden of summarizing customer opinions from large volumes of online review data. As a result, designers can develop effective business strategies for marketing, customer relationship management and product design in a timely manner. This study also supports consumers' purchase decision making by enabling more comprehensive, feature-by-feature comparisons of products. However, this research has some limitations. Future work may focus on continuously improving the extraction of positive and negative opinion words and the computation of their polarity.
In addition, direct comparisons between the performance of our fine-grained sentiment analysis and that of other state-of-the-art sentiment analysis systems will be conducted in future work.
Acknowledgment
This paper is supported by the National Natural Science Foundation of China (NSFC, Grant Number: 71671011).
References
Amplayo, R.K., Song, M., 2017. An adaptable fine-grained sentiment analysis for summarization of multiple short online reviews. Data Knowl. Eng.
Anoop, V.S., Asharaf, S., Deepak, P., 2016. Unsupervised concept hierarchy learning: A topic modeling guided approach. Procedia Comput. Sci. 89, 386–394.
Chi, Y.L., 2007. Elicitation synergy of extracting conceptual tags and hierarchies in textual document. Expert Syst. Appl. 32 (2), 349–357.
CNNIC, 2015. The research report on online shopping market in China. http://www.cnnic.cn/hlwfzyj/hlwxzbg/dzswbg/.
Wei, W., Gulla, J.A., 2010. Sentiment learning on product reviews via sentiment ontology tree. In: ACL 2010, Proceedings of the Meeting of the Association for Computational Linguistics, July 11–16, 2010, Uppsala, Sweden, pp. 404–413.
Zadeh, L.A., 1965. Fuzzy sets. Inf. Control 8 (3), 338–353.
Zhang, W., Xu, H., Wan, W., 2012. Weakness finder: Find product weakness from Chinese reviews by using aspects based sentiment analysis. Expert Syst. Appl. 39 (11), 10283–10291.
Zhao, W.X., Jiang, J., Yan, H., Li, X., 2010. Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid. In: Conference on Empirical Methods in Natural Language Processing, pp. 56–65.
Zhou, L., Chaovalit, P., 2008. Ontology-supported polarity mining. J. Am. Soc. Inf. Sci. Technol. 59 (1), 98–110.