ReMemNN: A novel memory neural network for powerful interaction in Aspect-based Sentiment Analysis
Ning Liu 1,2, Bo Shen 1,2,*
1 School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
2 Key Laboratory of Communication and Information Systems, Beijing Municipal Commission of Education, Beijing, China
* Corresponding author: {17111016, bshen}@bjtu.edu.cn
Abstract
Deep neural networks have been employed to analyze the sentiment of text sequences and have achieved significant results. However, these models still face two issues: the weakness of pre-trained word embeddings and the weak interaction between the specific aspect and its context in the attention mechanism. Pre-trained word embeddings lack specific semantic information from the context. The weak interaction results in poor attention weights and produces a limited aspect dependent sentiment representation in aspect-based sentiment analysis (ABSA). In this paper, we propose a novel end-to-end memory neural network, termed Recurrent Memory Neural Network (ReMemNN), to mitigate the above-mentioned problems. In ReMemNN, to tackle the weakness of pre-trained word embeddings, a special module named the embedding adjustment learning module is designed to transfer the pre-trained word embeddings into adjustment word embeddings. To tackle the weak interaction in the attention mechanism, a multielement attention mechanism is designed to generate powerful attention weights and a more precise aspect dependent sentiment representation. Besides, an explicit memory module is designed to store these different representations and to generate hidden states and output representations. Extensive experimental results on all datasets show that ReMemNN outperforms typical baselines and achieves state-of-the-art performance. These experimental results also demonstrate that ReMemNN is language-independent and dataset type-independent.
Keywords: Aspect-based sentiment analysis; Natural language processing; Attention mechanism; Deep learning
1. Introduction Sentiment analysis, one of applications of natural language processing (NLP), has received widespread attention in NLP community [1-3]. One reason is that sentiment analysis is the basis of machine to understand human language deeply, the other is that sentiment analysis can enhance the effect of question and answer system
efficiently. Further, sentiment analysis is also regarded as one of key techniques to achieve strong artificial intelligence. Except in NLP community, industrial business companies also pay more and more attention to fine-grained sentiment analysis which is termed as aspect-based sentiment analysis (ABSA) [4]. Because these industrial business companies can be aware of public praise of their product or service by means of ABSA. Hence, they can improve their product or service according to the online comments of internet platform. Another benefit for industrial business companies is that ABSA can save manpower, budget and time. They usually do product research by telephone counseling and questionnaire survey in the past. There are many methods proposed in literature to analyze the sentiment contained in text sequences. Some researchers take advantage of part-of-speech, parsing tree and sentiment dictionary to determine the sentiment polarity of the documents or sentences, which belongs to computational semantics. Such methods have good performance in most cases. However, these methods lack the abilities of learning and generalization, which means that it is usually language-specific or domain related. Conventional machine learning methods are also employed to identify the sentiment, which involve unsupervised methods such as clustering algorithms or latent dirichlet allocation (LDA) and supervised methods like support vector machine (SVM) or decision tree. Supervised machine learning methods require expert knowledge to design features, named as feature engineering, which is usually labor intensive and costs a lot of time. Deep learning methods solve these problems by means of learning related features by itself. They also have better generalization ability and are more scalable than traditional machine learning method. Lots of neural architectures have been proposed, such as convolutional neural network (CNN) [5], recurrent neural network (RNN) [6], recursive neural network [7] and composite structures based on them, among which the long short-term memory (LSTM) [8, 9] network continues to be a powerful way in abundant NLP tasks. Now, end-to-end LSTM-based networks achieve the state-of-the-art performance in aspect-based sentiment analysis. However, this kind of present methods do not consider weakness of pre-trained word embeddings such as lacking of specific semantic information from the context. For example, in a review “the phone is light”, “light” means positive polarity towards the phone, however, in a review “the car is too light” may means negative polarity towards the car. The general embeddings are fixed and don‟t apply in this case. These deficiencies will result in limited performance in the conditional of specific domain vocabularies [10-12]. The reason is that the end-to-end LSTM-based networks directly employ pre-trained word embeddings, which are usually trained in a large generic corpus and do not capture domain specific knowledge or specific context information. Besides, these models ignore the role of each hop‟s inner aspect dependent sentiment representation, which leads to insufficiently utilizing abundant semantic information of each hop. They just regard inner aspect dependent sentiment representation of the last hop as final sentiment feature to decide the sentiment polarity of the aspect target. Also, they use binary interaction between the aspect term and the context, which leads to limited
attention value and less informative aspect dependent sentiment representation. For example, in a review “Not only was the food outstanding, but the little 'perks' were great”, binary interaction mechanism only uses “food” and its context words to generate attention weights towards aspect term “food”. However, by this means, the binary interaction mechanism may focus on some irrelevant words such as “was” and “little” and generate less informative aspect dependent sentiment representation. Moreover, the methods based on LSTM will take a long time in the training, because they need complex calculations along the text sequence. In this paper, we propose a novel memory neural network to perform the ABSA task, named as Recurrent Memory Neural Network (ReMemNN). In ReMemNN, a specially module named embedding adjustment learning module is designed to tackle the problem of lacking of the specific semantic information from the context occurring in pre-trained word embeddings. To deal with weak interaction of binary attention, a multielement attention mechanism is designed to generate powerful attention values and informative inner aspect dependent sentiment representation. To make full advantage of these different aspect dependent sentiment representations and generate more expressive final sentiment representation, the explicit memory module is designed to store these inner representations and generate two types of representation. The first output information is hidden state, which is used to learn inner aspect dependent sentiment representation. The second information is output representation, which is used to decide the sentiment polarity of aspect target. The main contributions from this research are the following.
We propose a novel memory neural network, named Recurrent Memory Neural Network (ReMemNN) to perform the ABSA task. We propose an embedding adjustment learning module to overcome weakness of pre-trained word embeddings. We propose a multielement attention mechanism to tackle the weak interaction in binary attention mechanism. We proposed an explicit memory module to store inner different representations and generate hidden states and more expressive representations. Abundant experiments were done on three English and four Chinese datasets from different sources and domains. ReMemNN outperforms typical baselines and achieve state-of-the-art performance. Further, these results also demonstrate that ReMemNN is language-independent and dataset type-independent.
2. Related Work Sentiment analysis has aroused widespread interest not only in academic research but also in industrial commercial corporation, the applied range of sentiment analysis can cover diverse domains such as finance, politics and businesses, health science, etc. [13-15]. Sentiment analysis can be classified into three categories such as document-level, sentence-level and aspect-level according to different research granularity. In general, document-level sentiment analysis focuses on the whole
document polarity, which has an implicit assumption that the whole document expresses only one polarity. Sentence-level sentiment analysis assumes that the sentence expresses affection toward to only one entity or aspect and different sentence can have different sentiment polarity in the document. There is no such assumption in aspect-level sentiment analysis. Aspect-level sentiment analysis aims at detecting the polarity towards different aspects (or attributes) of an entity in a sentence. In other words, sentence can have one or more aspects in aspect-level sentient analysis. In reality, people may pay more attention to one or more attributes of the product, not the overall of the product in practice [16-18]. In our work, we are concerned with the aspect-level sentiment analysis. There are also some literature reviews of sentiment analysis. A good survey and introduction into the field of sentiment analysis research in Chinese language is Peng, Cambria and Hussain‟s research from 2017 [19]. They discussed the characters of Chinese words and provided comprehensive research of Chinese sentiment analysis research from both monolingual and multilingual perspectives. Schouten et al. [20] focused on the field of aspect-based sentiment analysis. They not only discussed various evaluation measures and techniques, but also presented related and complicated issues. There are many challenges in sentiment analysis. One challenge is how to effectively represent each word with the vector, which can not only capture semantic information but also contain emotive information. In general, word is viewed as the atomic symbol. The quality of word representation can have a significant impact in sentiment analysis. Human language is recursive and have the characteristic of semantic compositionality. Words can be combined to generate one token or phrase, while the combinative phrase or token may have completely different meaning from the sum of original word, such as “kick the bucket” and “shoot the breeze”. Thus, how to combine these words to acquire the high-quality sentence or phrase representation remains a challenge. Another challenge is the domain adaptation. Because words in different domains may have quite different sentiment polarity, for example, the word „light‟ is positive in the domain of electronic products, however it is negative in automotive domain. How to process powerful interaction between each part to get powerful attention weights and expressive representations is a major problem especially in ABSA tasks. Tang et al. [17] proposed target-dependent LSTM (TD-LSTM) to deal with target-dependent sentiment classification. They take account into the target information to build the final representation by training a bidirectional LSTM network. However, TD-LSTM ignored that each word in context words plays a different role towards the aspect word. Wang et al. [21] developed attention-based LSTM (AT-LSTM) and attention-based LSTM with aspect embedding (ATAE-LSTM) and solved the problem. These two models are inspired by the attention effect in natural language inference [22]. There are no doubts that the former model is outperformed by the latter model, because AEAE-LSTM consider more semantic information, which is provided by concatenating context words and aspect terms. Ma et al. [23] explored the interaction between aspect and context words. They proposed the interactive attention networks (IAN) model which learns target and context words
representations separately and concatenates them as the final representation passed into the softmax layer. When the aspect consists of multiple words, simply adding embeddings of them up as the representation of the aspect may result in wrong meaning. Li et al. [24] proposed target-specific transformation network (TNet) to tackle the problem by modeling the relationship between the context word and each word within the aspect. Though TNet and ReMemNN are multi-layer architectures, the context-preserving mechanism could not make full use of the hidden state of each layer in transformation architecture, which may loss some important information when generating the final sentence representation. However, the memory module can store and use all hops‟ hidden states to generate the final sentence representation in ReMemNN. Besides, TNet employed CNN to overcome the weakness of simply attention when capturing local features. ReMemNN proposed multielement attention mechanism to overcome the weakness of simply attention and generate powerful attention weights. There are also some unsupervised machine learning methods to perform ABSA tasks. Yohan et al. [25] proposed the sentence-LDA (SLDA) and aspect and sentiment unification model (ASUM) which learns to automatically discover aspects and the polarity of the aspect. Fu et al. [26] used topic model and HowNet lexicon to perform multi-aspect sentiment analysis. Zhen et al. [27] proposed the supervised joint aspect and sentiment model (SJASM) which is a supervised method to model aspect terms, corresponding polarity and overall sentiments of reviews. As for Memory networks, they are proposed to deal with question answering and met with success in question answering [28, 29]. For ABSA tasks, Memory networks view the aspect as the query and regard the context words as the clues to determine the polarity of aspect term. In other words, Memory networks have the memory selection operation so as to extract the important information about the aspect. Tang et al. [30] first develops a deep memory network for aspect-level sentiment analysis (we call this model MemNN) by considering not only the content attention but also the location attention. Li et al. [31] proposed AttNet model consisting of two separate subtasks: target detection and polarity classification. However, these two subtasks provide clues for each other. Yi et al. [32] proposed dyadic memory networks (DyMemNN), taking account into the dyadic interactions between aspect and context words to generate attention weights. The model has two important components: tensor module and holographic module. They also developed the aspect fusion LSTM (AF-LSTM) where they applied the holographic module into LSTM and got the competitive results [33]. Zhang et al. [34] proposed another dynamic memory network, consisting of input module, question module, memory module and answer module. Their method can be seen as modeling ABSA into question answering system with dynamic memory networks (DMN), which was proposed to model the correlative dependent information between questions and texts [35]. Wang et al. [36] proposed target-sensitive memory networks (TMNs) to address the target-sensitive sentiment problem, which means that the sentiment polarity of the opinion word is conditioned on the aspect and cannot be directly learned from the opinion word alone. They introduced six alternative methods to deal with the problem in TMNs. In
ReMemNN, we learn the interplay between an aspect, its context, and memory state by multielement attention mechanism, which can learn better attention. Besides, our multi-layered architectures can learn different level features, which are helpful in predicting the sentiment polarity of the specific aspect in a sentence. Recently, Yang et al. [37] proposed a segment-level joint topic-sentiment model (STSM) to model sentiments and topics. They assume that each segment of a sentence only contains one sentiment. Based on this assumption, they divide each sentence into multiple segments and model the correlation between topics and sentiments. Sometimes, the sentiment is implicit, such as sarcasm where intensive emotion is expressed. Majumder et al. [38] proposed a multitask learning-based framework to handle both sentiment classification and sarcasm detection tasks. Their experimental results show that sentiment classification and sarcasm detection can improve each other by multitask learning. Besides, language model, such as bidirectional encoder representations from transformer (BERT) [39], has achieved great success across a variety of NLP tasks. For exploring BERT in the task of ABSA, sun et al. [40] proposed the method of viewing ABSA as a sentence-pair classification task. They presented four ways to construct additional auxiliary tasks. There is also one task related to ABSA, which is aspect term extraction (ATE). Aspect terms extraction aims at extracting the aspects (or attributes) of an entity upon which opinions have been expressed. Luo et al. [41] proposed dual cross-shared RNN (DOER) for the task of aspect term-polarity co-extraction. DOER mainly consists of two stacked RNN and cross-shared unit which interact between two tasks by attention mechanism. DOER is a joint sequence labeling method, which learns to extract the aspect and predict sentiment polarity simultaneously. They found that the way of joint learning can obtain better performance than pipeline or collapsed way. However, sequence labeling methods based neural networks are not good at capturing the overall meaning of a sentence and processing label dependencies. Ma et al. [42] formalized ATE as a sequence-to-sequence (Seq2Seq) task and directly utilize the previous label to tackle these problems. Besides, they designed position-aware attention to consider the position information in widely-used attention mechanism. They also proposed gated unit networks to integrate information from encoder hidden state and decoder hidden state. To improve the performance of sentiment analysis, some work focuses on word representations for sentiment analysis. Li et al. [43] explored the way how to exploit prior knowledge for sentiment analysis, such as sentiment lexicons and sentiment labels of documents from available datasets. They proposed a novel framework that incorporates various types of prior knowledge into the word representations. Xiong et al. [44] proposed a novel framework, termed multi-level sentiment-enriched word embedding (MSWE), to learn specific word embedding for sentiment analysis by exploiting sentiment lexicons and distant supervised information. They developed a hybrid multilayer perception and CNN for learning the word-level and sentence-level sentiment information simultaneously. 3. Problem Definition and Notation
The major problems that we aim to solve are the lack of specific semantic information from the context in pre-trained word embeddings and the weak interaction in the binary attention mechanism. The weak interaction between the aspect target and the context results in poor attention weights and a limited aspect dependent sentiment representation in ABSA tasks. Given a sentence s = {w_1, w_2, ..., w_a, ..., w_n}, which consists of n words including the aspect word w_a, the ABSA task aims to classify the sentiment polarity of the aspect term w_a in the specific sentence s. The sentiment polarity of an aspect term is divided into three classes: positive, negative and neutral. For example, in Figure 1, the sentiment polarity of the sentence "Not only was the food outstanding, but the little 'perks' were great." towards the aspect term "food" is positive, and the sentiment polarity towards "perks" is also positive. We use pre-trained Glove word embeddings [45] to map each word in sentence s into a low-dimensional, continuous and real-valued vector. We define a word embedding matrix E ∈ ℝ^(d×|V|), where d is the dimension of the word embeddings and |V| is the vocabulary size.
Figure 1: Example: restaurant review with aspects
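For concreteness, the example of Figure 1 yields one classification instance per aspect term. A minimal, hypothetical representation of such instances is sketched below; the field names are ours for illustration and are not the SemEval file format.

```python
# One instance per (sentence, aspect) pair; labels come from {positive, negative, neutral}.
EXAMPLE_INSTANCES = [
    {"sentence": "Not only was the food outstanding, but the little 'perks' were great.",
     "aspect": "food",
     "polarity": "positive"},
    {"sentence": "Not only was the food outstanding, but the little 'perks' were great.",
     "aspect": "perks",
     "polarity": "positive"},
]
```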
4. Recurrent Memory Neural Network ReMemNN mainly consists of embedding adjustment learning module, multielement attention module and explicit memory module. We will discuss each part separately in the next section. The overall model architecture is illustrated in Figure 2.
Figure 2: Diagram of our proposed ReMemNN architecture. This diagram illustrates t hop iteration.
4.1. Embedding Adjustment Learning Module
In the embedding adjustment learning module, two sequence encoders are designed to transfer the pre-trained word embeddings into adjustment word embeddings. The embedding adjustment learning module accepts the context words and the aspect term as inputs. Firstly, we map the context words and the aspect term from the high-dimensional discrete space into a low-dimensional dense space by looking up an embedding matrix E, obtaining the context word embedding matrix E^A and the aspect term embedding asp, where E^A ∈ ℝ^(d×n), asp ∈ ℝ^(d×1), n is the length of the context and d is the dimension of the pre-trained word embeddings. When an aspect target consists of multiple words, such as "battery life", we take the average of the word embeddings of the words within the aspect target as the word embedding of that aspect target. A simple non-linear mapping function, the hyperbolic tangent, is applied to obtain the context adjustment word embeddings (a more complicated function or model, such as an LSTM [8, 9] or GRU [46], could also be used as the sequence encoder to obtain more precise embeddings):
I^A = tanh(W_A ⋅ E^A)    (1)
where W_A ∈ ℝ^(k×d) is the parameter matrix that adapts the pre-trained context embeddings E^A, and k is a hyperparameter whose value can be chosen freely. Similarly, we use the
hyperbolic tangent function to produce the aspect adjustment word embedding:
I^as = tanh(W_as ⋅ asp)    (2)
where W_as ∈ ℝ^(k×d) is the parameter matrix that adapts the pre-trained aspect embedding asp. The dimensions of W_A and W_as are the same.
4.2. Multielement Attention Module
When talking about a certain aspect term, different context words play different roles in a sentence. For example, in the sentence "Not only was the food outstanding, but the little 'perks' were great.", the context word "outstanding" contributes more than the other words for the aspect "food", and the context word "great" contributes more than the other words for the aspect "perks". The multielement attention module accepts the hidden state of the explicit memory module (discussed in Section 4.3), the context adjustment word embeddings I^A ∈ ℝ^(k×n) and the aspect adjustment word embedding I^as ∈ ℝ^(k×1). The output of the multielement attention mechanism is the inner aspect dependent sentiment representation h ∈ ℝ^(d×1), computed as a weighted sum of the word representations in the context word embedding matrix E^A:
h = E^A ⋅ V_att    (3)
where V_att ∈ ℝ^(n×1) is the attention vector. We can also write V_att = [a_1, a_2, ..., a_p, ..., a_{n-1}, a_n]^T, where a_p ∈ [0, 1] is the weight of the p-th context word and Σ_{p=1}^{n} a_p = 1. We employ a feed-forward neural network to compute the attention value, which measures the emotional relevance between the aspect term and the context words. Supposing the current hop is t, the unnormalized attention value is calculated as follows:
g_t = tanh(W_att ⋅ [I_t^as; I_t^A; h_{t-1}] + b_att)    (4)
where W_att ∈ ℝ^(1×(2k+d)), I_t^A ∈ ℝ^(k×1), I_t^as ∈ ℝ^(k×1), h_{t-1} ∈ ℝ^(d×1) and b_att ∈ ℝ^(1×1); I_t^A is the context adjustment word embedding at hop t, I_t^as is the aspect adjustment word embedding at hop t, and h_{t-1} is the inner aspect dependent sentiment representation at hop t-1. After computing all of {g_1, g_2, ..., g_{n-1}, g_n}, we feed them into a softmax function to obtain the normalized attention values {a_1, a_2, ..., a_p, ..., a_{n-1}, a_n}:
a_p = softmax(g_p) = exp(g_p) / Σ_{j=1}^{n} exp(g_j)    (5)
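To make Eqs. (1)-(5) concrete, the following sketch implements the embedding adjustment and one hop of the multielement attention for a single (unbatched) sentence. It is an illustrative reading of the equations, not the authors' released code; the use of PyTorch, the module names, and the unbatched formulation are our assumptions.

```python
import torch
import torch.nn as nn

class EmbeddingAdjustment(nn.Module):
    """Eqs. (1)-(2): project pre-trained embeddings into k-dimensional adjustment embeddings."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.ctx_proj = nn.Linear(d, k, bias=False)   # plays the role of W_A
        self.asp_proj = nn.Linear(d, k, bias=False)   # plays the role of W_as

    def forward(self, ctx_emb: torch.Tensor, asp_emb: torch.Tensor):
        # ctx_emb: (n, d) rows of E^A; asp_emb: (d,) aspect embedding asp
        I_A = torch.tanh(self.ctx_proj(ctx_emb))      # (n, k) context adjustment embeddings
        I_as = torch.tanh(self.asp_proj(asp_emb))     # (k,)   aspect adjustment embedding
        return I_A, I_as

class MultielementAttention(nn.Module):
    """Eqs. (3)-(5): score each context word from [I_as; I_A_p; h_{t-1}] and
    return the attention-weighted sum of the original context embeddings."""
    def __init__(self, d: int, k: int):
        super().__init__()
        self.score = nn.Linear(2 * k + d, 1)          # W_att and b_att

    def forward(self, ctx_emb, I_A, I_as, h_prev):
        # ctx_emb: (n, d); I_A: (n, k); I_as: (k,); h_prev: (d,)
        n = ctx_emb.size(0)
        features = torch.cat([I_as.expand(n, -1), I_A, h_prev.expand(n, -1)], dim=-1)
        g = torch.tanh(self.score(features)).squeeze(-1)   # (n,) unnormalized scores, Eq. (4)
        v_att = torch.softmax(g, dim=0)                    # (n,) attention vector, Eq. (5)
        h = v_att @ ctx_emb                                # (d,) weighted sum, Eq. (3)
        return h, v_att
```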
The key advantage of the multielement attention module is that it produces precise attention weights and a more expressive inner aspect dependent sentiment representation. The multielement attention mechanism can capture informative sentiment information and a more precise semantic relation between the aspect term and the context because it exploits not only the semantic relation between the aspect term and the context, but also the association between the aspect and the hidden state of the explicit memory module, and the semantic relation between the context and the hidden state of the explicit memory module.
There are some important differences between the attention mechanism in MemNN [30] and that in our proposed model. Firstly, the attention mechanism in MemNN is binary attention. Binary attention captures the binary interaction between the context words and the aspect term, which can lead to limited attention values and a less informative aspect dependent sentiment representation for ABSA, because binary attention only takes the context words and the aspect term as inputs. In contrast, the attention mechanism in our proposed model is the multielement attention mechanism, which captures the multielement interaction between the context words, the aspect term and the memory state. Multielement attention considers more information than the binary attention in MemNN. Meanwhile, multielement attention can learn more powerful attention weights and generate a more informative aspect dependent sentiment representation for ABSA. Secondly, in multi-layered MemNN, one of the inputs of the binary attention mechanism is the sum of the attention output of the previous layer and the aspect information. Simply adding them up may produce confusing information, which reduces the performance of the model. In multi-layered ReMemNN, the multielement attention mechanism directly exploits the aspect information and the attention output of the previous layer instead of adding them up. That is to say, the multielement attention mechanism in our proposed model can capture more informative and precise information from the attention output of each layer and the aspect term.
4.3. Explicit Memory Module
As discussed above, the explicit memory module is an important component of the ReMemNN architecture. It is designed to store useful sentiment information and to provide a hidden state and an output state, which are used to learn the inner aspect dependent sentiment representation and to decide the sentiment polarity of the aspect target, respectively. The inner aspect dependent sentiment representation of each hop is stored in the memory block M ∈ ℝ^(d×(t+1)), where t is the number of iteration hops. The memory block is made up of a series of inner aspect dependent sentiment representations h. Each memory unit vector of the memory block captures a different level of semantic and sentiment information: higher-level memory unit vectors capture more abstract information, such as semantic and pragmatic information, while lower-level ones may capture some lexical information of the sentence. The hidden state of the explicit memory module at hop t is h_{t-1}. The output of the explicit memory module is given as follows:
Output_t = [h_1; ...; h_i; ...; h_t]    (6)
where h_i is the inner aspect dependent sentiment representation at hop i. The initial state of the explicit memory module h_0 is initialized from a normal distribution with mean 0 and standard deviation 0.01. Our proposed model is a multi-layer architecture; the number of iteration hops determines the number of layers and the size of the explicit memory module. The number of iteration hops is a hyperparameter. The larger the number of iteration hops, the more output representations of the multielement attention module are stored in the explicit memory module. Hence, the explicit memory module can store more informative content and improve the model's performance.
The word "explicit" in the explicit memory module means that the attention output
of each layer is stored in the computer's memory. The memory state in the explicit memory module is more like an array, which stores the different vectors of each layer at different indexes. In MemNN [30], there is no such "explicit" memory module. In fact, the embeddings of the context words are viewed as the memory list in MemNN. The attention output of each layer is directly sent to the next layer, without storing these attention output representations. Hence, the memory module in MemNN can be regarded as "implicit".
4.4. SoftMax Layer
The output of the explicit memory module and the aspect word embedding, which is obtained from the pre-trained embeddings, are fed into a SoftMax layer:
ŷ = softmax(W_y ⋅ [Output; asp])    (7)
where W_y ∈ ℝ^(k×l) is the parameter of the SoftMax layer, l = (t+1) × d, t is the number of iteration hops, d is the dimension of the pre-trained embeddings, and k here denotes the number of labels.
4.5. Objective Function and Learning
We adopt cross entropy as the model's loss function. The model is trained in a supervised manner by minimizing the cross-entropy loss:
ℒ = − Σ_{i=1}^{N} y_i log ŷ_i    (8)
where N is the number of training sentences, ŷ is the class predicted by our proposed model, and y is the ground truth. In our experiments, we found that the performance of the model showed no significant change when L2 regularization was adopted.
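Continuing the sketch from Section 4.2, the fragment below wires the modules into the multi-hop loop of Eqs. (6)-(8): every hop's representation h_i is kept in an explicit memory list, the concatenation of all hops plus the aspect embedding feeds a linear/softmax classifier, and training minimizes cross entropy. It assumes the EmbeddingAdjustment and MultielementAttention classes from the earlier sketch are in scope; whether the adjustment embeddings are recomputed at every hop and whether h_0 is trainable are not fully specified, so the choices below (shared adjustment, learnable h_0) are assumptions.

```python
import torch
import torch.nn as nn

class ReMemNNSketch(nn.Module):
    """Minimal multi-hop forward pass in the spirit of Eqs. (6)-(8)."""
    def __init__(self, d: int, k: int, hops: int, num_labels: int = 3):
        super().__init__()
        self.adjust = EmbeddingAdjustment(d, k)        # Section 4.1 sketch
        self.attend = MultielementAttention(d, k)      # Section 4.2 sketch
        self.hops = hops
        # Initial memory state h_0 ~ N(0, 0.01); treated as learnable here (assumption).
        self.h0 = nn.Parameter(torch.randn(d) * 0.01)
        # W_y of Eq. (7): input is [h_1; ...; h_t; asp], i.e. (hops + 1) * d values.
        self.classifier = nn.Linear((hops + 1) * d, num_labels)

    def forward(self, ctx_emb: torch.Tensor, asp_emb: torch.Tensor) -> torch.Tensor:
        # ctx_emb: (n, d) pre-trained context embeddings; asp_emb: (d,) aspect embedding
        I_A, I_as = self.adjust(ctx_emb, asp_emb)      # adjustment embeddings, shared across hops
        h, memory = self.h0, []                        # "memory" is the explicit memory block
        for _ in range(self.hops):
            h, _ = self.attend(ctx_emb, I_A, I_as, h)  # one hop of multielement attention
            memory.append(h)                           # store every hop's representation
        output = torch.cat(memory + [asp_emb], dim=0)  # Eq. (6) concatenation plus the aspect
        return self.classifier(output)                 # logits; softmax is folded into the loss

# Training objective of Eq. (8): cross-entropy over {positive, negative, neutral}, e.g.
# loss = nn.functional.cross_entropy(logits.unsqueeze(0), gold_label.unsqueeze(0))
```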
5. Experiments
In this section, we report our experimental results and give some reasons why our model achieves strong performance on all datasets.
5.1. Experimental Setting
Table 1: Statistics of three English datasets and four Chinese datasets.
Dataset       Split   Overall   Positive   Negative   Neutral
Restaurant    Train   3578      2148       800        630
Restaurant    Test    1110      721        195        194
Laptop        Train   2315      991        865        459
Laptop        Test    637       341        128        168
Twitter       Train   6257      1567       1563       3127
Twitter       Test    694       174        174        346
Camera        Train   1635      1157       478        -
Camera        Test    408       263        145        -
Car           Train   885       668        217        -
Car           Test    221       166        55         -
Notebook      Train   485       331        154        -
Notebook      Test    121       72         49         -
Phone         Train   1885      1260       625        -
Phone         Test    471       303        168        -
In order to evaluate the performance of our proposed ReMemNN, we use two datasets obtained from SemEval 2014 [47], the twitter dataset collected by Dong et al. [48], and four Chinese datasets from Peng et al. [49] covering the camera, car, notebook and phone domains. The restaurant dataset and laptop dataset in SemEval 2014 consist of reviews with the corresponding aspect terms and polarities. In both the restaurant and laptop datasets, we remove the data where the sentiment polarity of the aspect target is "conflict" or the aspect target is "NULL". The statistics of all datasets are shown in Table 1. We use 300-dimension pre-trained Glove embeddings [45] as the initial word embeddings on the three English datasets, as previous work did [30]. These English word embeddings are fine-tuned during training. On the four Chinese datasets, we use ICTCLAS [50] for word segmentation. As we have limited resources, we randomly initialize the Chinese word embeddings with the uniform distribution U(-0.01, 0.01). These Chinese word embeddings are learned during training. The pre-trained Glove embedding is viewed as the general-purpose embedding. For the domain-specific embeddings, we use the Laptop embedding and the Restaurant embedding [51] on the three English datasets. These domain-specific embeddings are fine-tuned during training. The Laptop embedding is trained on the Amazon review dataset and the Restaurant embedding is trained on the Yelp review dataset; both are trained with fastText [52] and have dimension 100. The Laptop embedding is used as the domain-specific embedding for the laptop dataset, and the Restaurant embedding for the restaurant dataset. Because the Restaurant embedding is larger and contains more words, we also use it as the domain-specific embedding for the twitter dataset. All trainable parameters of the model are initialized with the uniform distribution U(-0.01, 0.01) in our experiments. The initial state of the explicit memory module is initialized from a normal distribution with mean 0 and standard deviation 0.01. Adagrad is adopted as the optimizer and the learning rate is 0.001.
5.2. Compared Models
In our experiments, we conduct two types of baseline comparisons. The first type includes the self-variants of ReMemNN, which are used to verify the validity of each module of our proposed model. The second type includes state-of-the-art methods for the ABSA task, which are used to examine the overall performance of our proposed model.
5.2.1. Variants of ReMemNN
Because there are three major modules in ReMemNN, namely the embedding adjustment learning module, the multielement attention module and the explicit memory
module, we design different variants of ReMemNN for one of three modules to verify its superiority. ReMemNN-v1 is designed to test the embedding adjustment learning module. ReMemNN-v2 is designed to examine the multielement attention module. ReMemNN-v3 are designed to examine the explicit memory module. (1) ReMemNN-v1: The first variant of ReMemNN. It eliminates the embedding adjustment learning module. Instead, it directly takes the initial word embeddings as the input to multielement attention module. (2) ReMemNN-v2: The second variant of ReMemNN. It discards the multielement attention mechanism. It generates attention values by employing the binary attention interaction between the aspect adjustment word embedding and the context adjustment word embeddings, not considering the hidden state of the explicit memory module. In other words, the second variant of ReMemNN only exploit aspect target and its context to learn the attention weights in each hop, not utilizing the inner aspect dependent sentiment representation of previous hop. (3) ReMemNN-v3: The third variant of ReMemNN. It eliminates the explicit memory module. The current hop‟s output representation of attention module is used to as the one of next hop‟s inputs. Concatenating the last hop‟s output representation of attention module and aspect terms embeddings as the final aspect dependent sentiment representation to predict the sentiment polarity of aspect target. 5.2.2 State-of-the-art methods We compare our proposed model with following baseline models on English and Chinese datasets in our experiments. (1) LSTM: the stand LSTM [8] model accepts the words of sentence in order. The last state‟s output vector is regarded as sentence representation that is used to produce the probability of the label. (2) AdaRNN: AdaRNN [48] is a recursive neural network. It models syntactic relationships between context and aspect target to decide the sentiment polarity of each composition of the sentence by employing more than one composition functions and learning the adaptive sentiment propagations. (3) TD-LSTM: TD-LSTM [17] is a bidirectional LSTM towards aspect. The hidden output representation of forward and backward network is concatenated as the final sentence representation. (4) AT-LSTM: AT-LSTM [21] models the semantic relationship between the aspect target and context by employing attention mechanism. It concatenates the output state of lstm and aspect embedding to generate the attention weight value. (5) ATAE-LSTM: ATAE-LSTM [21] model also extends LSTM by not only taking into aspect embedding but also considering the connection between aspect embedding and context words vector to generate attention weights. (6) MemNN: Memory Network [30] is firstly used for aspect-level sentiment analysis task. Our proposed model is the variant of MemNN. Hence, this model is our primary comparison model. (7) IAN: IAN [23] employs two attention mechanism and lstm network to encode the context and aspect terms respectively. It employs the max-pooling mechanism and interaction between context and aspect target to model the semantic
relation between them.
(8) RAM: RAM [53] takes advantage of a multiple-attention mechanism to produce an informative aspect dependent sentiment representation. It uses a bidirectional LSTM to model word sequence information and views the output states of the bidirectional LSTM as memory.
(9) ATAM-S: ATAM-S [49] explicitly learns the adaptive embedding and the aspect target sequence in three granularities: word, character and radical; the postfix S stands for single granularity. We report the experimental results at the word granularity.
(10) ATAM-F: ATAM-F [49] is the fusion of ATAM over the three granularities listed above. There are two ways to implement the fusion: early fusion and late fusion. We report the best results over either fusion and the various combinations of the three granularities.
(11) ReMemNN-B: ReMemNN-B exploits the pre-trained word embeddings from Bert [39] as the initial word embeddings in ReMemNN. The type of Bert is bert-base-uncased. We use the code from huggingface's transformers [54] to generate the pre-trained word embeddings. In the process of obtaining word embeddings from Bert, special tokens such as [CLS] and [SEP] are added at both ends of the word when the word is fed into Bert. We take the average of the hidden states of the last layer of Bert over the time axis as the word embedding of the word.
5.3. Experiment Results and Discussion
5.3.1. Comparison with variants
Table 2: Experimental results of the variants of ReMemNN on three English datasets.
Methods        Restaurant            Laptop                Twitter
               Accuracy   Macro-F1   Accuracy   Macro-F1   Accuracy   Macro-F1
ReMemNN-v1     79.01      66.61      70.64      64.08      69.65      66.57
ReMemNN-v2     79.28      68.11      71.43      65.32      71.10      68.17
ReMemNN-v3     77.39      62.58      69.07      60.08      69.08      63.56
ReMemNN        79.64      68.36      71.58      65.41      71.39      68.88
In this section, we compare the different variants of ReMemNN in experiments on the English datasets. The results are shown in Table 2. We can observe that ReMemNN achieves the highest accuracy and macro-F1 on all datasets, which generally proves the effectiveness of our proposed neural network architecture. To elaborate on the details of our experiments, we conduct comparisons with the variants of our proposed model. The only difference between ReMemNN-v1 and ReMemNN is that the former does not learn the adaptive embedding. The decrease in performance of ReMemNN-v1 shows that ReMemNN successfully learns the adaptive embedding. However, even if the adaptive embedding has been learned, the overall performance cannot be maintained without powerful attention. This is illustrated by ReMemNN-v2, which learns the adaptive embedding but does not learn powerful attention values. In other words, a more precise attention is acquired in ReMemNN, which results in a more informative inner aspect dependent sentiment representation. Besides, it further demonstrates that the concatenation mechanism can reduce attention noise in comparison with the addition mechanism. ReMemNN-v3 omits the explicit memory module, which is designed to store the inner aspect dependent sentiment representation of each hop. From Table 2, we can observe that ReMemNN-v3 obtains the worst performance on all datasets. The reason is that ReMemNN-v3 only takes advantage of the last hop's inner aspect dependent sentiment representation without utilizing and concatenating the inner aspect dependent sentiment representations of each hop. The explicit memory module is designed to store and utilize these inner aspect dependent sentiment representations. This further proves our assumption that the inner aspect dependent sentiment representations contain informative content and play a significant role in the ABSA task.
5.3.2. Comparison with state-of-the-art methods
Table 3: Experimental results of different methods on three English datasets.
Methods        Restaurant        Laptop            Twitter
               Acc.     F1       Acc.     F1       Acc.     F1
LSTM           74.28    61.94    66.45    60.53    64.84    63.58
AdaRNN         -        -        -        -        66.30    65.90
TD-LSTM        75.63    -        68.13    -        66.62    64.01
AT-LSTM        76.60    64.84    68.90    62.02    68.01    64.09
ATAE-LSTM      77.20    66.28    68.70    63.50    70.03    66.63
MemNN          78.38    66.70    71.11    64.79    70.38    67.00
IAN            78.60    67.33    71.78*   64.97    70.52    68.61
RAM            78.93*   68.12*   71.81    66.13    68.30    67.52
ReMemNN-B      76.96    66.42    65.05    60.83    94.09    93.85
ReMemNN        79.64    68.36    71.58    65.49*   71.39*   68.88*
From Table 3, we can see that ReMemNN beats other state-of-the-art methods which don‟t contain Bert in restaurant and twitter datasets and acquires comparable result in laptop dataset. In Table 3, the bold denotes the best result and the asterisk denotes the second-best result. We use „Acc.‟ to represent accuracy and use „F1‟ to represent macro-F1. Because the original paper didn‟t report accuracy or macro-F1 results of AdaRNN and TD-LSTM on restaurant and laptop datasets, thus we use the symbol „-‟ to denote them. We believe the first reason why ReMemNN obtains the state-of-the-art results is that we explicitly learned adjustable or adaptive embedding of each context word and aspect target word. The adjustable embedding of each context word and aspect target word not only carries semantic and grammar information from general pre-trained word embedding that was trained and learned from a huge general corpus but also learns semantic information within the context and domain. For example, in the baseline model ReMemNN-v1 ignores the embedding adjustment learning and, hence, results in a less domain information embedding and acquire poor results in all datasets. The embedding adjustment learning module plays an important role in learning contextual embeddings of context words and aspect target. If we directly exploit word embedding from pre-trained word embedding, which will reduce the performance of
model when the word is polysemous. The embedding adjustment learning module can mitigate this problem. The second reason is that we capture more precise semantic relation between the context and aspect target by means of the multielement attention mechanism which makes use of context words, aspect target and the inner aspect dependent sentiment representation. The multielement interaction of the multielement attention mechanism can generate powerful attention values. Other state-of-the-art methods use binary interaction, they either ignore context words or inner aspect dependent sentiment representations. The multielement attention mechanism plays a crucial part in improving the performance of the model. The biggest difference of attention mechanism between MemNN and our proposed model is that the former is binary attention mechanism, while the latter is multielement attention mechanism. In binary attention, it only takes context words and aspect target as inputs. In multielement attention, it takes the inner aspect dependent sentiment representation, context words, and aspect target as inputs. Multielement attention considers more information than binary attention of MemNN. Meanwhile, multielement attention can learn more powerful attention weights and generate more informative inner aspect dependent sentiment representation for ABSA. Besides, in multiple-layered MemNN, binary attention adds up attention output of previous layer and aspect target as one of inputs of current layer. Hence, binary attention will result in an uncorrelated representation about modeling the semantic relation between context and aspect target. Interaction between such an uncorrelated representation and context word embeddings will lead to poor attention values. In ReMemNN, multielement attention mechanism can directly exploit the aspect information and attention output of previous layer instead of using the way of adding them up. The multielement attention mechanism in our proposed model can capture more informative and precise information from attention output of each layer and aspect term. Thus, the model with multielement attention module can generate more powerful attention weights than binary memory. To further verify the importance of multielement attention mechanism, we designed the second variant of ReMemNN, which is ReMemNN-v2. The reduction of performance from ReMemNN to ReMemNN-v2 validated our assumption. The last reason is that we explicitly make use of aspect dependent sentiment representation of each hop to form the final representation to determine the sentiment polarity of aspect target. Explicit memory module also plays an important role. The aspect dependent sentiment representations of different hops are stored in explicit memory module. Hence, the model can exploit any previous states by explicit memory module. The aspect dependent sentiment representations of different hops have important informative semantic information. We argue that the lower hop mainly captures the low-level features, such as syntactic information, the higher hop primarily captures the high-level features, such as semantic information and pragmatic information. Other state-of-the-art methods only utilize the last hop‟s output state as the final aspect dependent sentiment representation, they ignored the inner output state and, hence, resulted a poor performance. To further verify the importance of explicit memory module and the inner aspect dependent sentiment
representation of different hops, we designed the third variant of ReMemNN, ReMemNN-v3. The sharp decrease in performance from ReMemNN to ReMemNN-v3 verified our assumption and the importance of the explicit memory module and the aspect dependent sentiment representations of different hops.
From Table 3, we can observe that ReMemNN-B acquires the best results on the twitter dataset. However, it acquires poor results on the restaurant and laptop datasets. We believe that the main reason is the size of the dataset. From Table 1, we can see that twitter is twice the size of restaurant and almost three times the size of laptop. The dimension of the word embeddings from Bert is large, which means the model needs more data to adjust the word embeddings in the embedding adjustment learning module, and it also makes the model easy to overfit on small datasets. Besides, how to combine a pre-trained model like Bert with the existing model effectively is also significant future work.
Table 4: Domain-specific embedding vs. Glove embedding on three English datasets.
Embeddings     Restaurant        Laptop            Twitter
               Acc.     F1       Acc.     F1       Acc.     F1
G-100          78.14    64.94    66.72    60.02    67.34    64.13
Ds             79.50*   67.58*   69.40*   62.09*   83.90*   83.01*
G-100+Ds       79.68    67.68    70.98    65.17    86.32    85.36
In order to investigate the effect of domain-specific embeddings, we conduct extra experiments on the three English datasets. We use the pre-trained Laptop embedding as the initial word embeddings on the laptop dataset and the pre-trained Restaurant embedding as the initial word embeddings on the restaurant and twitter datasets. Because the dimension of the domain-specific embeddings is 100, for a fair comparison we use the 100-dimension pre-trained Glove embedding as the baseline. ReMemNN is employed as the backbone network. Experimental results on the three English datasets are shown in Table 4. In Table 4, "G-100" denotes the 100-dimension pre-trained Glove embedding and "Ds" denotes the domain-specific embedding. "G-100+Ds" denotes concatenating the Glove embedding and the domain-specific embedding as the initial word embeddings. From Table 4, we can see that "Ds" achieves better performance than "G-100". That is to say, the domain-specific embedding obviously improves the performance of the model. The main reason is that the domain-specific embedding contains more domain information than the general embedding. Besides, we observe that combining the domain-specific embedding and the general embedding further improves the performance of the model. We believe the main reason is that, although some words are domain-dependent, other words are common to all domains, and for these the general embedding can provide more informative knowledge. Therefore, combining the two types of embeddings can significantly improve performance.
We conduct further experiments to test whether our proposed model is language-independent, using four Chinese datasets covering the camera, car, notebook and phone domains. Since our time and resources are limited, we did not train Chinese word embeddings as Peng et al. [49] did; we randomly initialize them from a uniform distribution U(-0.01, 0.01).
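As a concrete illustration of the "G-100+Ds" setting in Table 4, a hypothetical helper for building the initial embedding table might concatenate a 100-dimension Glove vector with a 100-dimension domain-specific (fastText) vector per word. The dictionary-based lookup and the U(-0.01, 0.01) fallback for out-of-vocabulary words are our assumptions, not the authors' preprocessing code.

```python
import numpy as np

def build_initial_embeddings(vocab, glove_100, domain_100, seed=0):
    """Concatenate general (Glove) and domain-specific (fastText) vectors per word."""
    rng = np.random.default_rng(seed)
    dim_g = dim_d = 100
    table = np.empty((len(vocab), dim_g + dim_d), dtype=np.float32)
    for i, word in enumerate(vocab):
        g = glove_100.get(word, rng.uniform(-0.01, 0.01, dim_g))   # general embedding
        d = domain_100.get(word, rng.uniform(-0.01, 0.01, dim_d))  # domain embedding
        table[i] = np.concatenate([g, d])                          # one "G-100+Ds" row
    return table
```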
Table 5: Experimental results of different methods on four Chinese datasets.
Methods      Camera            Car               Notebook          Phone             Average
             Acc.     F1       Acc.     F1       Acc.     F1       Acc.     F1       Acc.     F1
LSTM         78.31    68.72    81.99    58.83    74.63    62.32    81.38    72.13    79.08    65.50
TD-LSTM      70.48    51.46    76.53    46.67    67.10    40.58    69.17    53.40    70.82    48.03
AT-LSTM      85.05    83.44*   80.09    72.34*   79.34*   77.99    86.41    84.46*   82.73    79.56
ATAE-LSTM    85.54    84.09    81.90    76.88    83.47    82.14*   85.77    83.87    84.17    81.74*
MemNN        70.59    55.13    75.55    51.01    69.10    53.51    70.29    55.93    71.38    53.90
ATAM-S       82.88    72.50    82.94    64.18    75.59    60.09    84.86    75.35    81.57    68.03
ATAM-F       88.30    -        82.94*   -        77.52    -        88.46    -        84.31*   -
ReMemNN      87.60*   86.29    85.52    79.02    87.60    86.40    88.11*   87.01    87.21    84.68
Experimental results on these four Chinese datasets are shown in Table 5 in comparison with the strongest state-of-the-art methods, namely ATAM-F, ATAM-S, MemNN, ATAE-LSTM, AT-LSTM, TD-LSTM and LSTM. In Table 5, 'Acc.' represents accuracy and 'F1' represents macro-F1. Because the original paper did not report the F1 results of ATAM-F, we use the symbol '-' to denote them. From Table 5, we can see that our proposed model achieves the best macro-F1 on most datasets, by 2.19%-4.26%, and the highest accuracy on both the car and notebook datasets, by around 0.3%-4.3%. In addition, ReMemNN obtains comparable results on both the camera and phone datasets. These experimental results further validate the contribution of the modules designed in ReMemNN and verify the effectiveness of our proposed model. The main reason is that our model explicitly learns adjustment embeddings, employs the explicit multielement attention mechanism and utilizes the informative inner aspect dependent sentiment representations of different hops, where the last is crucial.
5.4. Effects of the number of hops
In this section, we explore the effect of the number of hops in our proposed model. The experimental results for accuracy and macro-F1 on the different datasets are shown in Figure 3 and Figure 4, respectively. These datasets consist of restaurant, laptop, twitter, camera, car, notebook and phone. From the two figures, we can see that the model's performance consistently improves as the number of hops increases on the different datasets. This is consistent with the results observed by Tang et al. [30].
Figure 3. Accuracy of different number of hops from 2 to 7 on all datasets.
Figure 4. Macro-F1 of different number of hops from 2 to 7 on all datasets.
The main reason is that the model can capture and learn increasingly high-order semantic features as the number of hops increases. Making full use of these informative low- and high-order semantic features provides the model with more comprehensive semantic information than only utilizing the high-order semantic features to decide the sentiment polarity of the aspect target. This is validated by a variant of our proposed model, ReMemNN-v3.
5.5. Effects of the value of k
In this section, we explore the effect of the value of k, which is a hyperparameter of the embedding adjustment learning module. The experimental results for accuracy and macro-F1 on the restaurant dataset are shown in Figure 5 and Figure 6, respectively. From these two figures, we can see that the model achieves the best accuracy and macro-F1 when the value of k is 150. Besides, we observe that the model performs worse when the value of k becomes larger. We believe the main reason is that a larger value of k makes the model more prone to overfitting.
Figure 5. Accuracy of different value of k on the restaurant dataset.
Figure 6. Macro-F1 of different value of k on the restaurant dataset.
5.6. Time complexity As the time complexity in deep neural networks is very difficult to exactly analyze, we take advantage of the runtime of top few state-of-the-art methods, namely AT-LSTM, ATAE-LSTM, IAN, RAM and ReMemNN with seven hops to reflect time complexity of each model. These methods are all trained on the same Nvidia GTX 1080Ti GPU. The running time of each iteration on the restaurant dataset is shown in Figure 7.
ReMemNN is almost 10 times faster than ATAE-LSTM. The reason is that these state-of-the-art methods, except MemNN, are all based on LSTM, which performs complex operations in each LSTM unit along the sequence. In contrast, ReMemNN does not need recurrent computation over the sequence length. Our proposed model needs extra adjustment embedding learning and multielement interaction calculations compared to MemNN, so it is slightly slower than MemNN.
Figure 7. Runtime (seconds) of each training epoch of different models on the restaurant dataset.
5.7. Qualitative Analysis
Table 6: Attention weights towards the aspect "bread" in ReMemNN and MemNN.
             the      is       top      notch    as       well
MemNN        0.0925   0.1221   0.1284   0.3219   0.1214   0.2136
ReMemNN      0.0583   0.0587   0.0740   0.4208   0.0642   0.3241
In this section, we show the attention weights towards the aspect term that are used to generate the inner aspect dependent sentiment representation. Table 6 shows the attention values towards the aspect term "bread". We select the sentence "the bread is top notch as well." and the aspect "bread" from the restaurant dataset as a case study. We apply ReMemNN with seven hops and MemNN with seven hops to model the semantic relationships between the context words and the aspect term/phrase. From Table 6, we can observe that our proposed model ReMemNN generates more precise and powerful attention weights than MemNN. The bold value denotes the word the model pays the most attention to.
Table 7: Predictions of ReMemNN towards different aspect targets in the sentence "Not only was the food outstanding, but the little 'perks' were great".
Aspect target   Ground truth   Prediction
food            positive       positive
perks           positive       positive
To further study how ReMemNN behaves on complicated sentences such as the one in Figure 1, we report the predictions of ReMemNN for this sentence in Table 7 and show the attention weights of ReMemNN towards the different aspect targets in Figure 8 and Figure 9. From Table 7, we can see that ReMemNN correctly predicts the sentiment polarity of the two different aspect targets in the sentence. In Figure 8 and Figure 9, a darker color indicates a greater attention weight. From Figure 8, we can see that the model focuses more on the context word "outstanding" when the aspect target is "food". From Figure 9, we can observe that the model focuses more on the context word "great" when the aspect target is "perks".
Figure 8. Attention weights of ReMemNN towards the aspect target “food”.
Figure 9. Attention weights of ReMemNN towards the aspect target “perks”.
6. Conclusions
In this paper, we proposed a novel neural network for the ABSA task, named the Recurrent Memory Neural Network (ReMemNN). In ReMemNN, two sequence encoders are designed to tackle the lack of specific contextual semantic information in pre-trained word embeddings. The multielement attention mechanism is designed to tackle the weak interaction between the aspect target and its context, and it generates powerful attention weights and more expressive inner aspect dependent sentiment representations. An explicit memory module is designed to learn the inner aspect dependent sentiment representations and to generate the final aspect dependent sentiment representation. Extensive experiments on the Chinese and English datasets show that ReMemNN achieves state-of-the-art performance on most datasets and obtains comparable results on the others. These results also verify that ReMemNN is language-independent and dataset type-independent. Moreover, our proposed model is more efficient than other state-of-the-art methods according to the runtime of each training epoch. In future work, it is interesting to explore the specific role of each hop's aspect dependent sentiment representation. Besides, how to
combine pre-trained embeddings such as BERT with the existing model effectively is also an interesting problem. It should be pointed out that our model is not sensitive to word order, whereas word order is important in sentiment analysis; hence, incorporating word-order information into our model is another promising direction for future work.

Declaration of interests
We declare that we have no conflict of interest related to this work.

Acknowledgments
This research was funded by the Fundamental Research Funds for the Central Universities (grant number 2019YJS022).

References
[1] B. Liu, Sentiment analysis: Mining opinions, sentiments, and emotions, Cambridge University Press, 2015.
[2] X. Fu, W. Liu, Y. Xu, L. Cui, Combine HowNet lexicon to train phrase recursive autoencoder for sentence-level sentiment analysis, Neurocomputing, 241 (2017) 18-27.
[3] Z. Cui, X. Shi, Y. Chen, Sentiment analysis via integrating distributed representations of variable-length word sequence, Neurocomputing, 187 (2016) 126-132.
[4] K. Schouten, F. Frasincar, Survey on aspect-level sentiment analysis, IEEE Transactions on Knowledge and Data Engineering, (2016) 1-1.
[5] Y. Kim, Convolutional neural networks for sentence classification, arXiv preprint arXiv:1408.5882, (2014).
[6] J.L. Elman, Finding structure in time, Cognitive Science, 14 (1990) 179-211.
[7] R. Socher, C.C. Lin, C. Manning, A.Y. Ng, Parsing natural scenes and natural language with recursive neural networks, Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 129-136.
[8] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation, 9 (1997) 1735-1780.
[9] F.A. Gers, J. Schmidhuber, F. Cummins, Learning to Forget: Continual Prediction with LSTM, Neural Computation, 12 (2000) 2451-2471.
[10] J. Firth, J. Torous, J. Nicholas, R. Carney, S. Rosenbaum, J. Sarris, Can smartphone mental health interventions reduce symptoms of anxiety? A meta-analysis of randomized controlled trials, Journal of Affective Disorders, 218 (2017) 15-22.
[11] P.K. Sarma, Y. Liang, W.A. Sethares, Domain Adapted Word Embeddings for Improved Sentiment Classification, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 37-42.
[12] J. Blitzer, M. Dredze, F. Pereira, Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 2007, pp. 440-447.
[13] M.-Y. Day, C.-C. Lee, Deep learning for financial sentiment analysis on finance news providers, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2016, pp. 1127-1134.
[14] L. Zhang, S. Wang, B. Liu, Deep learning for sentiment analysis: A survey, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, (2018) e1253.
[15] W. Medhat, A. Hassan, H. Korashy, Sentiment analysis algorithms and applications: A survey, Ain Shams Engineering Journal, 5 (2014) 1093-1113.
[16] N. Liu, B. Shen, Aspect-based sentiment analysis with gated alternate neural network, Knowledge-Based Systems, (2019) 105010.
[17] D. Tang, B. Qin, X. Feng, T. Liu, Effective LSTMs for target-dependent sentiment classification, Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics, 2016, pp. 3298-3307.
[18] N. Liu, B. Shen, Z. Zhang, Z. Zhang, K. Mi, Attention-based Sentiment Reasoner for aspect-based sentiment analysis, Human-centric Computing and Information Sciences, 9 (2019) 35.
[19] H. Peng, E. Cambria, A. Hussain, A review of sentiment analysis research in Chinese language, Cognitive Computation, 9 (2017) 423-435.
[20] K. Schouten, F. Frasincar, Survey on aspect-level sentiment analysis, IEEE Transactions on Knowledge and Data Engineering, 28 (2015) 813-830.
[21] Y. Wang, M. Huang, L. Zhao, Attention-based LSTM for aspect-level sentiment classification, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 606-615.
[22] T. Rocktäschel, E. Grefenstette, K.M. Hermann, T. Kočiský, P. Blunsom, Reasoning about entailment with neural attention, arXiv preprint arXiv:1509.06664, (2015).
[23] D. Ma, S. Li, X. Zhang, H. Wang, Interactive Attention Networks for Aspect-Level Sentiment Classification, arXiv preprint arXiv:1709.00893, (2017).
[24] X. Li, L. Bing, W. Lam, B. Shi, Transformation Networks for Target-Oriented Sentiment Classification, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 946-956.
[25] Y. Jo, A.H. Oh, Aspect and sentiment unification model for online review analysis, Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, 2011, pp. 815-824.
[26] F. Xianghua, L. Guo, G. Yanyan, W. Zhiqiang, Multi-aspect sentiment analysis for Chinese online social reviews based on topic modeling and HowNet lexicon, Knowledge-Based Systems, 37 (2013) 186-195.
[27] Z. Hai, G. Cong, K. Chang, P. Cheng, C. Miao, Analyzing Sentiments in One Go: A Supervised Joint Topic Modeling Approach, IEEE Transactions on Knowledge and Data Engineering, 29 (2017) 1172-1185.
[28] S. Sukhbaatar, J. Weston, R. Fergus, End-to-end memory networks, Advances in Neural Information Processing Systems, 2015, pp. 2440-2448.
[29] C. Xiong, S. Merity, R. Socher, Dynamic memory networks for visual and textual question answering, International Conference on Machine Learning, 2016, pp. 2397-2406.
[30] D. Tang, B. Qin, T. Liu, Aspect level sentiment classification with deep memory network, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 214-224.
[31] C. Li, X. Guo, Q. Mei, Deep memory networks for attitude identification, Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017, pp. 671-680.
[32] Y. Tay, L.A. Tuan, S.C. Hui, Dyadic memory networks for aspect-based sentiment analysis, Proceedings of the 2017 ACM Conference on Information and Knowledge Management, 2017, pp. 107-116.
[33] Y. Tay, A.T. Luu, S.C. Hui, Learning to Attend via Word-Aspect Associative Fusion for Aspect-based Sentiment Analysis, Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[34] Z. Zhang, L. Wang, Y. Zou, C. Gan, The optimally designed dynamic memory networks for targeted sentiment classification, Neurocomputing, (2018).
[35] A. Kumar, O. Irsoy, P. Ondruska, M. Iyyer, J. Bradbury, I. Gulrajani, V. Zhong, R. Paulus, R. Socher, Ask me anything: Dynamic memory networks for natural language processing, International Conference on Machine Learning, 2016, pp. 1378-1387.
[36] S. Wang, S. Mazumder, B. Liu, M. Zhou, Y. Chang, Target-sensitive memory networks for aspect sentiment classification, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 957-967.
[37] Q. Yang, Y. Rao, H. Xie, J. Wang, F.L. Wang, W.H. Chan, Segment-level joint topic-sentiment model for online review analysis, IEEE Intelligent Systems, 34 (2019) 43-50.
[38] N. Majumder, S. Poria, H. Peng, N. Chhaya, E. Cambria, A. Gelbukh, Sentiment and Sarcasm Classification with Multitask Learning, IEEE Intelligent Systems, 34 (2019) 38-43.
[39] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, (2018).
[40] C. Sun, L. Huang, X. Qiu, Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019, pp. 380-385.
[41] H. Luo, T. Li, B. Liu, J. Zhang, DOER: Dual Cross-Shared RNN for Aspect Term-Polarity Co-Extraction, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 591-601.
[42] D. Ma, S. Li, F. Wu, X. Xie, H. Wang, Exploring Sequence-to-Sequence Learning in Aspect Term Extraction, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 3538-3547.
[43] Y. Li, Q. Pan, T. Yang, S. Wang, J. Tang, E. Cambria, Learning word representations for sentiment analysis, Cognitive Computation, 9 (2017) 843-851.
[44] S. Xiong, H. Lv, W. Zhao, D. Ji, Towards Twitter sentiment classification by multi-level sentiment-enriched word embeddings, Neurocomputing, 275 (2018) 2459-2466.
[45] J. Pennington, R. Socher, C. Manning, GloVe: Global vectors for word representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1532-1543.
[46] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078, (2014).
[47] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, SemEval-2014 Task 4: Aspect Based Sentiment Analysis, Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 2014, pp. 27-35.
[48] L. Dong, F. Wei, C. Tan, D. Tang, M. Zhou, K. Xu, Adaptive recursive neural network for target-dependent twitter sentiment classification, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014, pp. 49-54.
[49] H. Peng, Y. Ma, Y. Li, E. Cambria, Learning multi-grained aspect target sequence for Chinese sentiment analysis, Knowledge-Based Systems, 148 (2018) 167-176.
[50] H.-P. Zhang, H.-K. Yu, D.-Y. Xiong, Q. Liu, HHMM-based Chinese lexical analyzer ICTCLAS, Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, 2003, pp. 184-187.
[51] H. Xu, B. Liu, L. Shu, S.Y. Philip, Double Embeddings and CNN-based Sequence Labeling for Aspect Extraction, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 592-598.
[52] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics, 5 (2017) 135-146.
[53] P. Chen, Z. Sun, L. Bing, W. Yang, Recurrent attention network on memory for aspect sentiment analysis, Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017, pp. 452-461.
[54] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, Transformers: State-of-the-art Natural Language Processing, arXiv preprint arXiv:1910.03771, (2019).
Ning Liu received the B.Sc. degree in Electronic and Information Engineering from Shandong University of Technology, Zibo, China, in 2015. He is currently pursuing the Ph.D. degree in Communication and Information System with the Institute of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China. His research interests include sentiment analysis, deep learning and NLP techniques.
Bo Shen received the B.S. degree in communication and control engineering from Northern Jiaotong University, Beijing, China, in 1995, and the Ph.D. degree in communication and information system from Beijing Jiaotong University, Beijing, China, in 2006. From 2006 to 2007, he was a post-doctoral researcher with the Institute of System Science, Beijing Jiaotong University, Beijing, China. Since 2008, he has been a researcher and teacher with the School of Electronic and Information Engineering, Beijing Jiaotong University. He is currently a Professor in the School of Electronic and Information Engineering, Beijing Jiaotong University, and the Director of the Key Laboratory of Communication and Information Systems. His research interests include complex system theory, opinion evolution, data mining and NLP techniques.