Emotional editing constraint conversation content generation based on reinforcement learning


Journal Pre-proof

Xiao Sun, Jia Li, Xing Wei, Changliang Li, Jianhua Tao

PII: S1566-2535(19)30223-4
DOI: https://doi.org/10.1016/j.inffus.2019.10.007
Reference: INFFUS 1168

To appear in: Information Fusion

Received date: 17 March 2019
Revised date: 8 October 2019
Accepted date: 9 October 2019

Please cite this article as: Xiao Sun, Jia Li, Xing Wei, Changliang Li, Jianhua Tao, Emotional Editing Constraint Conversation Content Generation Based on Reinforcement Learning, Information Fusion (2019), doi: https://doi.org/10.1016/j.inffus.2019.10.007

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier B.V.

Highlights
1. A collaborative emotion vector is proposed to guide the generation of replies.
2. Replies are constrained from three aspects by introducing reinforcement learning.
3. Multi-task learning is adopted to enhance the model's effectiveness.
4. Several experiments show that the proposed model outperforms previous models.

Xiao Sun, School of Computer and Information Science, HeFei University of Technology

Jia Li, School of Computer and Information Science, HeFei University of Technology

Xing Wei, School of Computer and Information Science, HeFei University of Technology

Changliang Li, Kingsoft Institute of Artificial Intelligence

Jianhua Tao, National Laboratory of Pattern Recognition, Chinese Academy of Sciences

Abstract

In recent years, the generation of conversation content based on deep neural networks has attracted many researchers. However, traditional neural language models tend to generate generic replies that lack logical and emotional factors. This paper proposes a conversation content generation model that combines reinforcement learning with emotional editing constraints to generate more meaningful replies with customizable emotions. The model divides a reply into three clauses based on pre-generated keywords and uses an emotional editor to further optimize the final reply. The model combines multi-task learning with rewards from multiple indicators to comprehensively optimize reply quality. Experiments show that our model not only improves the fluency of the replies but also significantly enhances their logical relevance and emotional relevance.

Keywords: Emotional conversation generation, affective computing, emotional editing, reinforcement learning, multi-task learning

∗ Corresponding author. Email address: [email protected] (Xiao Sun)
1 X. Sun and J. Li contributed equally to this study and share the first authorship.

Preprint submitted to Journal of LATEX Templates, October 8, 2019

1. Introduction

In recent years, with the development of artificial intelligence and robotics, affective computing has become increasingly critical in research on human-computer interaction. Emotional, natural interaction can not only provide a friendlier interface but is also the only way to achieve strong artificial intelligence. Artificial intelligence with both emotion and intelligence has higher practical value and significance [1, 2]. To achieve genuine artificial intelligence, it is necessary to facilitate natural human-computer interactions that integrate intelligence and emotion.

Emotional interactions are multimodal, spanning vision, speech, and text, and different modalities carry different levels of emotional information. Besides visual and spoken expression, text is a basic and essential mode of emotional expression and is widely used in daily life. Affective computing on text includes text emotion recognition and emotional text generation. Text emotion recognition has been studied extensively, whereas generating emotional text remains very challenging: it is difficult to express emotions naturally and coherently because generation must balance grammaticality against expressiveness [3]. Existing emotional text generation is largely rule-based and task-oriented, which limits the domain adaptability and scalability of the models.

In recent years, data-driven neural network models have made significant progress in text generation and can produce relatively high-quality text [4, 5]. Most research efforts focus on improving the quality of conversational content (e.g., fluency, diversity) [6, 7] while ignoring fine-grained emotional factors in the text. In [8], the researchers first introduced emotions into a neural network language model and showed that emotional sentences perform better than sentences generated by traditional models that do not consider emotions. Other researchers have proposed reinforcement learning methods to generate emotional text [?, 9], further verifying the feasibility of this direction. In [10], the researchers used reinforcement learning to minimize penalty terms, which strengthened the constraints on text emotion and enhanced the emotional factors in the text.

Previous work has two shortcomings. First, existing neural network models generate a whole sentence word by word from left to right, which does not fully match how humans habitually express themselves in natural language. This approach also limits the variety and fluency of the generated text and leads to meaningless, generic "safe" replies [11, 12]. Second, existing models cannot fully consider the emotional elements contained in a conversation, nor the lexical, syntactic, grammatical and other information related to emotional factors. Furthermore, the emotional strength of replies is uncontrollable, and in some cases no emotion can be detected at all.

Given these deficiencies, this paper proposes an emotional editing constraint conversation content generation model based on reinforcement learning. The model has the following characteristics:
(1) The model incorporates topic and emotion information into replies. At the same time, templates from the training set are used to guide generation, so that the model can express richer and more accurate emotions while ensuring the quality of the generated text.
(2) Unlike traditional language models that generate text from left to right, the model adopts an asynchronous generation method, which divides a complete sentence into three parts and generates them separately to improve the overall fluency of the sentence. A reinforcement learning strategy is introduced to further constrain the generated content semantically in terms of coherence, topic, and emotion.
(3) Based on different types of templates and indicators, the model can customize the content, emotional intensity, and style of the generated text, which allows it to be flexibly migrated to other related fields.

Overall, this paper makes the following contributions:

• The proposed framework selects the most appropriate template sentence based on topic and emotion. A collaborative emotion vector guides the generation of the reply, so that the generated reply carries more accurate and detailed emotions.
• The proposed model comprehensively constrains reply generation from three aspects, namely coherence, topic, and emotion, by introducing reinforcement learning.
• The proposed model introduces multi-task learning to enhance its effectiveness and to learn the coherence, topic, and emotion of a reply jointly, so that these indicators coordinate and constrain each other.
• Experiments show that the proposed model outperforms previous models that consider only one-sided factors, in terms of reply fluency, emotional relevance and topic relevance.

The rest of the paper is organized as follows: Section 2 briefly reviews related work, Section 3 introduces the proposed model framework, Section 4 presents the experimental results with analysis and discussion, and Section 5 summarizes the study and offers recommendations for future work.

2. Related Work

2.1. Emotional Conversation Generation

Recently, the sequence-to-sequence model, which treats generation as a sequence prediction problem and can be applied to large-scale datasets, has been widely used in machine translation [13] and conversation generation, for example in the neural responding machine [14] and the multi-layer recurrent neural network model [15]. Subsequently, many variants of this model were proposed to improve the grammar and sentence patterns of the generated text, including increasing its diversity [5], introducing additional prior knowledge to generate more meaningful text [6, 16], and using existing corpora as templates to improve fluency [7, 17]. The work in [18, 19] verifies that machines able to generate meaningful, emotional text enhance the user experience and lead to smarter interaction. However, the above work pays little attention to emotional factors in the text generation process. In [20], the authors propose a model that can generate text with a specific emotion or intonation. The emotional language model proposed in [3] generates text from a given keyword and sentiment category. The model in [21] generates comments on documents by integrating grammatical information from topical and emotional perspectives.

The above work considers emotional factors but does not balance the grammatical soundness and the emotional subtlety of the text; thus, it cannot generate more appropriate and controllable emotions while ensuring the quality of the text content. In [8], the researchers considered the emotional elements of conversation content in large-scale data and proposed the ECM model, which introduces an emotion category vector and two memory mechanisms to generate responses with the corresponding emotions, improving the quality of the generated text over earlier approaches. In [22], the researchers introduce topic and emotion information: emotional keywords and topic keywords are predicted to guide reply generation, so that replies have higher topic relevance and emotional relevance.

Although the above work acknowledges the importance of emotional factors, the results are not satisfactory. These methods cannot effectively mine the emotional elements in conversation content, and the emotional strength of the generated replies is uncontrollable and inconspicuous. Emotion therefore cannot play its full role in the conversation, and the resulting sentences appear blunt and rigid. This is because the previous models lack prior knowledge to guide the generation of emotional text, and a single constraint cannot meet actual needs. It is therefore necessary to introduce more precise grammatical, semantic, and emotional constraints to comprehensively guide the generation of emotional text.

2.2. Text Generation Using Reinforcement Learning

A standard recurrent neural network language model [23] predicts each word of a sentence conditioned on the previous word and an evolving hidden state. However, this approach has three obvious shortcomings. First, the recurrent neural network model is usually trained by maximum likelihood, which suffers from exposure bias [24]: during training the model is conditioned on ground-truth prefixes, whereas at generation time it is conditioned on its own predictions, which seriously affects generalization. Second, the loss function used in recurrent neural network training is defined at the word level, whereas performance is typically evaluated at the sentence level. Third, because of its one-sided constraint, the recurrent neural network model tends to generate meaningless, generic "safe" replies such as "I don't know." regardless of the post. This happens because the training set contains many similar generic answers, and such a reply is compatible with almost any kind of post.

Unsupervised text generation is an important research field in natural language processing, and researchers have recently used reinforcement learning to generate text. In [25], the researchers trained an end-to-end, task-oriented dialogue system that maps posts to replies in key-value form. In [9], the researchers combined the traditional sequence-to-sequence model with a reinforcement learning strategy and proposed a model that uses informativity, coherence, and ease of answering as rewards, improving the content quality of the text. In addition, the generative adversarial network (GAN) [26] is a novel unsupervised generation model that is similar in nature to reinforcement learning and has many applications in text generation. In [27], the researchers combined the GAN with a reinforcement learning strategy to overcome the problem that the GAN cannot be differentiated with respect to discrete sequence data. In [28], the researchers replaced the binary discriminator with a discriminator that ranks real text; it learns the difference between real and generated text faster and better guides the generator. The model proposed in [29] feeds intermediate feature information from the discriminator into the generator to address the difficulty of generating long text. In [10], the researchers replaced the traditional discriminator with a multi-category sentiment discriminator, used Monte Carlo search to calculate a penalty term during generation, and took the minimization of the expected overall penalty as the objective function, which strengthens the emotional constraint on the generated text.

These methods use reinforcement learning and GANs to strengthen the constraints on text generation, but two shortcomings remain. First, because natural language is a high-level semantic encoding, it is difficult to find perfect objective metrics to measure it. Using only a simple combination of several prior-knowledge indicators as rewards is not enough, because the indicators remain separate and cannot guide text generation well. Second, these models do not fully consider the emotional elements contained in a conversation: the emotions in the generated text are not precise enough and do not match the text content well, and the lexical, syntactic, grammatical and other information related to emotional factors is ignored.

In addition, there are unsupervised deep generative models, including variational autoencoders [30] and semi-supervised variational autoencoders [31].

A variational autoencoder (VAE) consists of an encoder and a generator, which respectively encode a data example into a latent representation and generate samples from the latent space. Although a VAE does not suffer from the difficulty of generating discrete data, it has many more constraints and limitations than a GAN. In [5], the researchers proposed an autoencoder-based model in which two autoencoders learned the input and output sentences in an unsupervised manner during training, and a matching model learned the alignment between the two. Although the model improved the fluency and diversity of the generated text, it ignored emotional factors.

In general, many problems remain in the field of emotional conversation content generation. For example, the emotion of generated text is uncontrollable and inconspicuous, or even absent. Generic replies are easily generated, while deeper lexical, syntactic, grammatical and other aspects of text generation are neglected. We propose the following solutions to these problems. First, the most appropriate template is selected by emotion and topic to guide text generation, ensuring accurate and fine-grained constraints during generation and effectively addressing the lack of emotional control. Second, reinforcement learning is used to consider fluency, topic relevance, and emotional relevance together. Multi-task learning and asynchronous generation are introduced so that these three indicators promote each other and are closely combined. The originally separate indicators are thus deeply integrated at the lexical, syntactic and grammatical levels, which addresses the generation of generic replies and further strengthens the fluency, diversity and emotional relevance of the text.

3. Emotional Conversation Generation Model

This section describes the proposed emotional conversation generation model in detail. The model as a whole acts as an agent. The post and the reply represent the question and the answer in a conversation, respectively. We use x to denote a post received from the external environment and y to denote the reply given by the agent. As the environment keeps providing posts, the agent gives corresponding replies. We regard the process by which the neural network language model generates a reply as the action of the agent. The parameters of the network are optimized to maximize the expected future reward using policy search, as described in Section 3.5. Policy gradient methods are more appropriate for our scenario than Q-learning because we can initialize the network with MLE parameters that already produce plausible replies before changing the objective and turning towards a policy that maximizes long-term rewards. Q-learning, on the other hand, directly estimates the future expected reward of each action, which can differ from the MLE objective by orders of magnitude, making MLE parameters inappropriate for initialization. The components of our model (states, actions, rewards, etc.) are summarized in the following subsections.

3.1. Action in Model

An action is the process by which the agent generates a reply to an input post. The action space is infinite, since sequences of arbitrary length can be generated.

3.1.1. The Overview of Action

Figure 1: An overview of the emotional conversation generation framework.

As shown in Figure 1, decoding is divided into five cases based on the result of the Structure Detector. OE denotes generation constrained only by an emotion keyword, and OT denotes generation constrained only by a topic keyword. The keywords are drawn from pre-prepared dictionaries, and the decoders used in the different cases do not share parameters. Given post x, the encoder is used to obtain the encoded vector. After that, emotional conversation generation consists of the following four steps:

Step I: The structure detector first predicts whether an emotion keyword and/or a topic keyword needs to be included in the reply and predicts the positional relationship between them.

Step II: Based on the result of Step I, a keyword predictor generates the corresponding keywords (emotion keywords or topic keywords), which serve as prior knowledge to guide reply generation.

Step III: The reply is generated with the asynchronous generation method. The model considers two cases. First, when only one keyword exists, an asynchronous decoder similar to that of [11] is used to generate the reply. Second, when the reply requires two keywords, the sentence is divided into three consecutive clauses with the keywords as boundaries. The three clauses are then combined into a complete reply according to the positional relationship determined above.

Step IV: A suitable template sentence is selected from the training set based on the emotion keyword and the topic keyword generated in Step II, and the template is then used to produce a corresponding emotion editing vector. The template and the emotion editing vector are used to edit and optimize the reply generated in Step III, further improving its emotional accuracy and content quality.

3.1.2. Post Encoding

The encoder uses gated recurrent units (GRUs). Given a sequence $x = (x_1, x_2, x_3, \ldots, x_T)$, the recurrent hidden state is updated by:

$$h_t = \mathrm{GRU}(h_{t-1}, x_t) \tag{1}$$

where $x_t$ is the $t$-th word index and $h_{t-1}$ is the hidden state at time $t-1$.

3.1.3. Structure Detector

This module detects whether an emotion keyword and a topic keyword from our dictionaries should appear in reply $y$ and determines the positional relationship between the keywords. As shown in Fig. 1, we define the following five cases:

$z_s = 0$: No keyword. A general forward decoder generates the reply.
$z_s = 1$: Only an emotion keyword. An asynchronous decoder generates the reply starting from the emotion keyword.
$z_s = 2$: Only a topic keyword. The same kind of asynchronous decoder generates the reply starting from the topic keyword.
$z_s = 3$: Both an emotion keyword and a topic keyword, with the topic keyword first. The reply is divided into three parts with the two keywords as boundaries, and the three clauses are generated in turn by the asynchronous decoder.
$z_s = 4$: Both an emotion keyword and a topic keyword, with the emotion keyword first. The reply is divided into three parts with the two keywords as boundaries, and the three clauses are generated in turn by the asynchronous decoder.

Formally, given post $x$, we first obtain the hidden state sequence $h$ from the encoder. Then the case number $z_s$ is determined by a fully connected layer:

$$p(z_s = i \mid x) = \mathrm{softmax}(W_s \cdot \tilde{h}) \tag{2}$$

where $i \in \{0, 1, 2, 3, 4\}$ and $\tilde{h} = \sum_{t=1}^{T} h_t$. For each post $x$, this module is always called first. The multi-class classifier described by Eq. (2) predicts the structure of reply $y$, which determines the subsequent steps.
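To make Eq. (2) concrete, the following minimal Python sketch scores the five structure cases from a sum-pooled encoder state. The weight matrix `w_s` and the toy hidden states are hypothetical placeholders rather than trained parameters, and the GRU encoder of Eq. (1) is assumed to have already produced `hidden`.

```python
import math
from typing import List

# Reply structures from Section 3.1.3 (list index = value of z_s).
CASES = [
    "no keyword: forward decoder",
    "emotion keyword only: asynchronous decoder",
    "topic keyword only: asynchronous decoder",
    "topic keyword first, then emotion keyword: three clauses",
    "emotion keyword first, then topic keyword: three clauses",
]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def detect_structure(hidden: List[List[float]], w_s: List[List[float]]) -> List[float]:
    """Return p(z_s = i | x) for i = 0..4, following Eq. (2).

    hidden: encoder states h_1..h_T, each a vector of length d.
    w_s:    a 5 x d weight matrix (hypothetical values in this sketch).
    """
    d = len(hidden[0])
    # h_tilde = sum over time steps of h_t (sum pooling)
    h_tilde = [sum(h[j] for h in hidden) for j in range(d)]
    # One logit per structure case: row_i . h_tilde
    logits = [sum(w * h for w, h in zip(row, h_tilde)) for row in w_s]
    return softmax(logits)


# Usage with toy numbers: T = 2 encoder steps, d = 2.
hidden = [[0.5, -1.0], [0.5, 2.0]]
w_s = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
probs = detect_structure(hidden, w_s)
best = max(range(len(CASES)), key=probs.__getitem__)
print(best, CASES[best])
```

Sum pooling makes the detector's input independent of the order of the hidden states; in a trained model, word-order information is already carried inside each GRU state.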

3.1.4. Keywords Predictor

The keyword predictor determines which keywords from our dictionaries should appear in the reply. The dictionaries comprise an emotion dictionary and a topic dictionary. The emotion dictionary is taken from [32]; it contains 27,466 emotion keywords divided into seven categories: Happy, Like, Sad, Angry, Fear, Disgust and Surprise. The topic dictionary is obtained from a pre-trained Latent Dirichlet Allocation (LDA) model [33] and contains 10 categories with 1000 keywords each. As shown in Figure 2, we first use the pre-trained LDA model to analyse the input post and predict the topic category of the reply, while the emotion category is designated manually as one of the seven categories listed above. However, both the emotion category and the topic category are highly abstract knowledge representations. Therefore, following [8], we introduce a topic category vector and an emotion category vector and use these category vectors to further predict the keywords, so that the prior knowledge obtained above can be exploited.

Figure 2: The process of keyword generation.

In addition, instead of predicting keywords directly from the encoder's hidden sequence, we apply a sequence attention mechanism based on prior knowledge to complement the insufficient information in the encoder. To integrate the prior knowledge, we compute the correlation between each kind of knowledge embedding $k = \{k^{et}, k^{tp}\}$ and each unit of the encoder's hidden sequence, expressed as an attention weight. Formally:

$$c^{k,*} = \sum_{i=1}^{T} \alpha_i^{k,*} h_i \tag{3}$$

$$\alpha_i^{k,*} = \frac{\exp(e_i^{k,*})}{\sum_{t=1}^{T} \exp(e_t^{k,*})} \tag{4}$$

$$e_i^{k,*} = (v_\alpha^{k,*})^{\top} \tanh(W_\alpha^{k,*} k^{*} + U_\alpha^{k,*} h_i) \tag{5}$$

where $* \in \{et, tp\}$ denotes the emotion or topic aspect, and $v_\alpha^{k,*}$, $W_\alpha^{k,*}$ and $U_\alpha^{k,*}$ are trainable parameters. The information is concentrated in the weighted vector $c^{k,*}$, and the conditional probabilities of the keywords are calculated by:

$$p(w_{et}^{k} \mid x, k^{et}) = \mathrm{softmax}(W_{et}^{w} c^{k,et}) \tag{6}$$

$$p(w_{tp}^{k} \mid x, k^{tp}) = \mathrm{softmax}(W_{tp}^{w} c^{k,tp}) \tag{7}$$

where $c^{k,et}$ and $c^{k,tp}$ are the attention vectors computed by Eq. (3). Each of these equations can be viewed as a multi-class classifier that produces a probability distribution over all emotion keywords or all topic keywords.

3.1.5. Asynchronous Decoder

Figure 3: The structure of asynchronous decoder.

After the keywords are selected and the structure of the reply is determined, the next step is to generate the reply from the keywords. When only one keyword is included, we use it as the starting point and generate the remaining parts of the reply backward and forward from it. When two keywords are included, there are two possible orders; for ease of discussion we describe one of them in detail, namely the emotion keyword first and the topic keyword second. The other case is analogous.

Formally, the input post is $x = (x_1, x_2, \ldots, x_T)$ and the reply is $y = (w_s, y^{et}, w_{et}^{k}, y^{md}, w_{tp}^{k}, y^{tp}, w_e)$, where $w_s$ and $w_e$ denote the start word ⟨GO⟩ and the terminator ⟨EOS⟩, respectively, and $w_{et}^{k}$ and $w_{tp}^{k}$ denote the emotion keyword and the topic keyword, respectively. $y^{et}$ is the portion between ⟨GO⟩ and the emotion keyword, $y^{md}$ is the portion between the emotion keyword and the topic keyword, and $y^{tp}$ is the portion between the topic keyword and ⟨EOS⟩.

As shown in Figure 3, the reply is divided into three clauses. First, we generate $y^{et}$ with ⟨GO⟩ and $w_{et}^{k}$ as the starting and ending words, respectively. Second, based on $y^{et}$, we generate $y^{md}$ with $w_{et}^{k}$ and $w_{tp}^{k}$ as the starting and ending words, respectively. Third, based on $y^{et}$ and $y^{md}$, we generate $y^{tp}$ from $w_{tp}^{k}$ to ⟨EOS⟩. The clauses and keywords are then combined in the previously determined order to obtain the complete reply. The process is as follows:

$$p(y^{et} \mid x, w_1^{k}) = p(y^{et} \mid g(x), \langle w_s, w_{et}^{k} \rangle) = \prod_{i=1}^{M} p(y_i^{et} \mid y_{i-1}^{et}, s_i^{et}) \tag{8}$$

$$p(y^{md} \mid y^{et}, w_2^{k}) = p(y^{md} \mid g([w_{et}^{k}, y^{et}, w_{tp}^{k}])) = \prod_{i=1}^{L} p(y_i^{md} \mid y_{i-1}^{md}, s_i^{md}) \tag{9}$$

$$p(y^{tp} \mid y^{et}, y^{md}, w_3^{k}) = p(y^{tp} \mid g([w_s, y^{et}, w_{et}^{k}, y^{md}, w_{tp}^{k}])) = \prod_{i=1}^{N} p(y_i^{tp} \mid y_{i-1}^{tp}, s_i^{tp}) \tag{10}$$

where $w_i^{k} \in \{(w_s, w_{et}^{k}), (w_{et}^{k}, w_{tp}^{k}), (w_{tp}^{k}, w_e)\}$ denotes the keyword pairs, and $s_i^{et}$, $s_i^{md}$ and $s_i^{tp}$ denote the intermediate decoder states of the three clauses, respectively.
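The clause-by-clause procedure of Eqs. (8)-(10) can be illustrated with a short Python sketch under simplifying assumptions: the three clauses are supplied as token lists that stand in for the outputs of the three decoders (which are not implemented here), and per-token probabilities are given directly. Because the three clauses are generated conditionally one after another, the chain rule makes the joint probability of the clauses the product of Eqs. (8)-(10), so the log-probability of the assembled reply is the sum of the clause log-probabilities. The helper names are hypothetical.

```python
import math
from typing import List, Sequence


def assemble_reply(case: int, emo_kw: str, topic_kw: str,
                   first: Sequence[str], middle: Sequence[str],
                   last: Sequence[str]) -> List[str]:
    """Join three generated clauses and two keywords into one reply.

    case 4: emotion keyword first (the case detailed in Section 3.1.5);
    case 3: topic keyword first. The <GO>/<EOS> markers are omitted.
    """
    if case == 4:
        kw1, kw2 = emo_kw, topic_kw
    elif case == 3:
        kw1, kw2 = topic_kw, emo_kw
    else:
        raise ValueError("three-clause assembly applies only to z_s = 3 or 4")
    return [*first, kw1, *middle, kw2, *last]


def clause_log_prob(step_probs: Sequence[float]) -> float:
    """log of prod_i p(y_i | y_{i-1}, s_i) for one clause."""
    return sum(math.log(p) for p in step_probs)


def reply_log_prob(first: Sequence[float], middle: Sequence[float],
                   last: Sequence[float]) -> float:
    """Chain rule over the three clauses: their log-probabilities add up."""
    return clause_log_prob(first) + clause_log_prob(middle) + clause_log_prob(last)


# Usage with hand-written clauses standing in for decoder outputs.
reply = assemble_reply(4, "happy", "weather",
                       ["i", "am"], ["that", "the"], ["is", "nice"])
print(" ".join(reply))  # i am happy that the weather is nice
lp = reply_log_prob([0.5, 0.5], [0.5], [1.0])
```

Keeping assembly separate from scoring mirrors the text: the structure detector fixes the keyword order, and the clause probabilities are computed independently of how the final string is stitched together.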

Figure 4: Emotional content editing optimization process.

3.1.6. Emotional Editor

(1) Picking a template: As shown in Figure 4, single-underlined words denote keywords and wavy-underlined words denote the emotionally edited part. We select template $y'$ from training set $\chi$ based on the keywords and their positional relationship. Templates are prioritized as follows, in decreasing order: a sentence containing the same keywords in the same positional relationship; a sentence containing the same keywords in a different positional relationship; a sentence containing only the same topic keyword; and a sentence containing only the same emotion keyword. Sentences with the same priority are distinguished by lexical-level similarity:

$$L(y, y') = d_J(y, y') \tag{11}$$

We treat sentences as sets of word tokens, and $d_J(y, y')$ is the Jaccard distance between template sentence $y'$ and primary reply $y$. According to these rules, the sentence with the highest priority and the highest similarity to the candidate reply (that is, the smallest Jaccard distance) is selected as template sentence $y'$.

(2) Calculating the emotion editing vector: After obtaining template sentence $y'$, the next step is to find the mapping between the sentence pair $(y', y)$, that is, the emotion editing vector. The work of [17] supposes that $y'$ and $y$ differ by only a single word $w$; one might then propose that edit vector $z$ should equal the word vector of $w$. Generalizing this intuition to multi-word edits, multi-word insertions and deletions are represented as the sum of the inserted word vectors. In contrast, to strengthen the effect of the edit vector on emotion, we introduce an emotion coefficient for each word in a sentence: the closer a word is to the emotion keyword, the larger its emotion coefficient. We multiply the word vector of each edited word by its emotion coefficient and sum the results to obtain the final emotion editing vector, instead of simply summing the word vectors of the edited words.

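Formally, define $I = y \setminus y'$ as the set of words added and $D = y' \setminus y$ as the set of words deleted. We represent the difference between $y'$ and $y$ with the following vector:

$$\alpha_w = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(l_w - \mu)^2}{2\sigma^2}\right) \tag{12}$$

$$f(y, y') = \sum_{w \in I} \alpha_w \Phi(w) \oplus \sum_{w \in D} \alpha_w \Phi(w) \tag{13}$$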

where $l_w$ is the distance between word $w$ and the emotion keyword, $\alpha_w$ is the emotion coefficient of word $w$, $\Phi(w)$ is the word vector of word $w$, and $\oplus$ denotes a join (concatenation) operation. Following the work of [17], we design $q$ to add noise that perturbs the direction of vector $f$. Let $f_{norm} = \|f\|$ and $f_{dir} = f / f_{norm}$, and let $vMF(v; \mu, \kappa)$ denote a von Mises-Fisher distribution over points $v$ on the unit sphere with mean vector $\mu$ and concentration parameter $\kappa$. We define:

$$q(z_{dir} \mid y', y) = vMF(z_{dir}; f_{dir}, \kappa) \tag{14}$$

$$q(z_{norm} \mid y', y) = \mathrm{Unif}(z_{norm}; [\tilde{f}_{norm}, \tilde{f}_{norm} + \varepsilon]) \tag{15}$$

where $\tilde{f}_{norm} = \min(f_{norm}, 10 - \varepsilon)$ is the truncated norm. The resulting edit vector is $z = z_{dir} \cdot z_{norm}$.
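Steps (1) and (2) of the emotional editor can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the toy corpus and two-dimensional word vectors are hypothetical; $l_w$ is taken as the token distance to the emotion keyword within the sentence that contains $w$, so the keyword is assumed to occur in both $y$ and $y'$; the Gaussian parameters `mu` and `sigma` are arbitrary; and the vMF direction sampling of Eq. (14) is omitted, which corresponds to the limit of a very large concentration $\kappa$.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Tokens = Sequence[str]
Embeddings = Dict[str, List[float]]


def jaccard_distance(a: Tokens, b: Tokens) -> float:
    """Eq. (11): Jaccard distance between two sentences viewed as token sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 0.0 if not union else 1.0 - len(sa & sb) / len(union)


def emotion_first(tokens: Tokens, emo_kw: str, topic_kw: str) -> bool:
    """True if the emotion keyword precedes the topic keyword."""
    return tokens.index(emo_kw) < tokens.index(topic_kw)


def priority(tokens: Tokens, emo_kw: str, topic_kw: str, emo_first: bool) -> Optional[int]:
    """Step (1) priority: 0 is best; None means the sentence is not a candidate."""
    has_emo, has_topic = emo_kw in tokens, topic_kw in tokens
    if has_emo and has_topic:
        return 0 if emotion_first(tokens, emo_kw, topic_kw) == emo_first else 1
    if has_topic:
        return 2
    if has_emo:
        return 3
    return None


def pick_template(reply: Tokens, corpus: List[Tokens], emo_kw: str, topic_kw: str) -> Tokens:
    """Best priority first, then smallest Jaccard distance to the primary reply."""
    order = emotion_first(reply, emo_kw, topic_kw)
    scored = []
    for idx, cand in enumerate(corpus):
        p = priority(cand, emo_kw, topic_kw, order)
        if p is not None:
            scored.append((p, jaccard_distance(reply, cand), idx))
    return corpus[min(scored)[2]]  # min() raises ValueError if nothing qualifies


def emotion_coeff(distance: int, mu: float = 0.0, sigma: float = 2.0) -> float:
    """Eq. (12): Gaussian weight alpha_w; mu and sigma are hypothetical values."""
    return math.exp(-((distance - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def weighted_sum(words: Sequence[str], tokens: Tokens, emo_kw: str,
                 emb: Embeddings, dim: int) -> List[float]:
    """Sum of alpha_w * Phi(w), with l_w measured inside `tokens`."""
    kw_pos = tokens.index(emo_kw)  # assumes the emotion keyword occurs in `tokens`
    out = [0.0] * dim
    for w in words:
        alpha = emotion_coeff(abs(tokens.index(w) - kw_pos))
        out = [o + alpha * e for o, e in zip(out, emb[w])]
    return out


def edit_vector(template: Tokens, reply: Tokens, emo_kw: str, emb: Embeddings,
                eps: float = 0.1, rng: Optional[random.Random] = None) -> List[float]:
    """Eqs. (12), (13) and (15); the vMF step of Eq. (14) is left out."""
    rng = rng or random.Random()
    dim = len(next(iter(emb.values())))
    inserted = sorted(set(reply) - set(template))  # I = y \ y'
    deleted = sorted(set(template) - set(reply))   # D = y' \ y
    # Eq. (13): Python list concatenation plays the role of the join operator
    f = (weighted_sum(inserted, reply, emo_kw, emb, dim)
         + weighted_sum(deleted, template, emo_kw, emb, dim))
    f_norm = math.sqrt(sum(v * v for v in f))
    if f_norm == 0.0:
        return f  # identical sentences: nothing to edit
    f_dir = [v / f_norm for v in f]
    trunc = min(f_norm, 10.0 - eps)           # truncated norm
    z_norm = rng.uniform(trunc, trunc + eps)  # Eq. (15)
    return [v * z_norm for v in f_dir]


corpus = [
    "the weather is so nice today i am happy".split(),
    "i am happy because the weather is sunny".split(),
    "the weather is bad".split(),
]
reply = "i am happy that the weather is nice".split()
template = pick_template(reply, corpus, "happy", "weather")  # corpus[1]
emb = {"that": [1.0, 0.0], "nice": [0.0, 1.0], "because": [0.5, 0.5], "sunny": [0.0, 2.0]}
z = edit_vector(template, reply, "happy", emb, rng=random.Random(0))
```

Sorting the inserted and deleted words only makes the floating-point sums reproducible; it does not change the edit vector mathematically.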

(3) Editing optimization: We implement the emotional editor with an encoder-decoder architecture in which prototype $y'$ is the input sequence and revised sentence $y$ is the output sequence, extended to condition on edit vector $z$ by concatenating $z$ to the decoder input at each time step:

$$p(y \mid y', z) = \prod_{i=1}^{K} p(y_i \mid y_{i-1}, [s_i, z]) \tag{16}$$

where $s_i$ is the hidden state of the decoder at time $i$ and $z$ is the emotion edit vector. This completes the emotional editing optimization of the reply, yielding final reply $y$.

3.2. State

The state is represented by post $x$ received from the external environment. The post is fed to the model and converted into a vector representation, so that the agent updates its state and performs the corresponding action.

3.3. Policy Note that we use a stochastic representation of the policy (a probability distribution over actions given states). A deterministic policy would result in a discontinuous objective that would be difficult to optimize using gradient-based methods.


3.4. Rewards Calculation
$R$ denotes the reward obtained for each action. In this subsection, we discuss the major factors that contribute to the success of a reply and describe how approximations to these factors can be operationalized as computable reward functions.


Coherence: First, we need to ensure that the generated replies are coherent and contain no grammatical errors or mismatches. In addition, we need to control the length of the replies, because replies that are too short or too long are often meaningless or redundant and fail to meet actual needs. $p_{\mathrm{seq2seq}}(y \mid x)$ denotes the probability of generating reply $y$ given post $x$, and $p^{\mathrm{backward}}_{\mathrm{seq2seq}}(x \mid y)$ denotes the backward probability of generating post $x$ given reply $y$. $p^{\mathrm{backward}}_{\mathrm{seq2seq}}$ is trained in the same way as a standard sequence-to-sequence model, with sources and targets swapped. To control the influence of length, each log-probability is normalized by the length of the sequence it generates: $N_y$ and $N_x$ denote the lengths of the reply and the post, respectively. We calculate the coherence of reply $y$ as follows:

$$r_1 = \frac{1}{N_y} \log p_{\mathrm{seq2seq}}(y \mid x) + \frac{1}{N_x} \log p^{\mathrm{backward}}_{\mathrm{seq2seq}}(x \mid y) \tag{17}$$
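Eq. (17) can be written as a short function. This sketch assumes the forward and backward seq2seq models expose per-token log-probabilities (the argument names are hypothetical), so that each sequence log-probability is the sum of its token terms. Because each term is a per-token average, the first term equals the negative log of the reply's perplexity under $p_{\mathrm{seq2seq}}$, and a longer reply is not penalized merely for having more tokens.

```python
import math


def coherence_reward(fwd_token_logprobs, bwd_token_logprobs):
    """Eq. (17): r1 = (1/N_y) log p_seq2seq(y|x) + (1/N_x) log p_backward(x|y).

    fwd_token_logprobs: log p(y_t | y_<t, x) for each of the N_y tokens of the reply y
    bwd_token_logprobs: log p(x_t | x_<t, y) for each of the N_x tokens of the post x
    """
    n_y = len(fwd_token_logprobs)
    n_x = len(bwd_token_logprobs)
    if n_y == 0 or n_x == 0:
        raise ValueError("reply and post must be non-empty")
    # A sequence log-probability is the sum of its token terms; dividing by length averages it.
    return sum(fwd_token_logprobs) / n_y + sum(bwd_token_logprobs) / n_x
```

For example, a reply whose tokens each have probability 0.5 contributes $\log 0.5$ to $r_1$ whether it has 4 or 10 tokens.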

Topic relevance: We want the generated reply to closely follow the topic of the given post so that the reply is more reasonable. We use the pretrained LDA model mentioned earlier to predict the topic category of the reply. We define $k^{tp}$ as the topic category of the post, $\mathrm{LDA}(y)$ as the topic probability distribution predicted by the LDA model for the reply, and $N_{tp}$ as the total number of topic categories. The topic relevance of reply $y$ is calculated as follows:

$$r_2 = -\sum_{i=1}^{N_{tp}} k^{tp}_i \log\big(\mathrm{LDA}_i(y)\big) \tag{18}$$

Emotion relevance: To ensure that a reply has rich emotions, we introduce emotion relevance to evaluate the reply. As with topic relevance, we use a convolutional neural network to classify the reply into sentiment categories and use its predictions to check whether the reply matches the required sentiment category. We define $k^{et}$ as the specified sentiment category, $D^{et}(y)$ as the probability distribution predicted by the classifier, and $N_{et}$ as the total number of sentiment categories. We calculate the emotion relevance of reply $y$ as follows:

$$r_3 = -\sum_{i=1}^{N_{et}} k^{et}_i \log\big(D^{et}_i(y)\big) \tag{19}$$
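Eqs. (18) and (19) share one form, so a single function covers both. This is a sketch assuming $k^{tp}$ and $k^{et}$ are one-hot lists and that the LDA model and the emotion classifier each return a probability list over their categories; the `eps` floor that avoids $\log 0$ is an added assumption. Read literally, the expression is the cross-entropy between the target and the predicted distribution: it is 0 when the target category receives probability 1 and grows as that probability falls. The sketch reproduces the formula as written.

```python
import math


def relevance_reward(target, predicted, eps=1e-12):
    """Eqs. (18)-(19): r = -sum_i k_i * log(P_i(y)).

    target:    one-hot list, k^tp (topic of the post) or k^et (specified emotion)
    predicted: LDA(y) topic distribution or D^et(y) classifier output for the reply
    eps:       floor on probabilities to avoid log(0) (an assumption of this sketch)
    """
    if len(target) != len(predicted):
        raise ValueError("target and prediction must cover the same categories")
    return -sum(k * math.log(max(p, eps)) for k, p in zip(target, predicted))
```

With a one-hot target, the sum reduces to $-\log$ of the probability assigned to the target category, e.g. $-\log 0.5$ for `relevance_reward([0, 1, 0], [0.2, 0.5, 0.3])`.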

To strengthen the constraints on the reply generation process, a reward is calculated for each clause as a weighted sum of the indicators proposed above. Because each clause has a different focus, its weights in the reward calculation differ. After repeated experiments, the following weights gave the best fit to the corpus:

$$r_{et} = 0.2\,r_1 + 0.2\,r_2 + 0.6\,r_3 \tag{20}$$

$$r_{md} = 0.2\,r_1 + 0.4\,r_2 + 0.4\,r_3 \tag{21}$$

$$r_{tp} = 0.2\,r_1 + 0.6\,r_2 + 0.2\,r_3 \tag{22}$$

$$r = 0.5\,r_1 + 0.25\,r_2 + 0.25\,r_3 \tag{23}$$


where $r_{et}$, $r_{md}$ and $r_{tp}$ represent the rewards of the three clauses $y^{et}$, $y^{md}$ and $y^{tp}$, respectively, and $r$ represents the reward of the spliced and edited reply. The process of calculating the reward is shown in Figure 5. Therefore, the final reward $R$ for generating a reply is

$$R(a, [x, y]) = r_{et} + r_{md} + r_{tp} + r \tag{24}$$

Figure 5: Reward calculation process.
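The clause-level weighting of Eqs. (20)-(23) and the total reward of Eq. (24) can be sketched as below. The weights are copied from the equations; the dictionary keys and function names are illustrative, and the $(r_1, r_2, r_3)$ triple for each clause is assumed to have been computed already with Eqs. (17)-(19). Each weight row sums to 1, so the four terms of Eq. (24) share a comparable scale.

```python
# Weights (w1, w2, w3) applied to (r1, r2, r3), copied from Eqs. (20)-(23).
WEIGHTS = {
    "et": (0.2, 0.2, 0.6),      # clause y^et: emphasizes emotion relevance r3
    "md": (0.2, 0.4, 0.4),      # clause y^md: balances r2 and r3
    "tp": (0.2, 0.6, 0.2),      # clause y^tp: emphasizes topic relevance r2
    "full": (0.5, 0.25, 0.25),  # spliced and edited reply: emphasizes coherence r1
}


def weighted_reward(part, r1, r2, r3):
    """One of Eqs. (20)-(23), selected by part."""
    w1, w2, w3 = WEIGHTS[part]
    return w1 * r1 + w2 * r2 + w3 * r3


def total_reward(scores):
    """Eq. (24): R = r_et + r_md + r_tp + r.

    scores maps each part ("et", "md", "tp", "full") to its (r1, r2, r3) triple.
    """
    return sum(weighted_reward(part, *scores[part]) for part in ("et", "md", "tp", "full"))
```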

The model uses multiple indicators to evaluate the reply comprehensively. To promote learning across these indicators, the model adopts a multi-task learning strategy based on parameter sharing [34]: the encoder is shared throughout reply generation, in particular when generating each clause. Using the same encoder lets the indicators complement one another, which makes it easier to measure the quality of the reply as a whole.

3.5. Optimization
We initialize the model with parameters trained by maximum likelihood estimation (MLE), so it can already generate plausible replies. We then use policy gradient methods to find parameters that lead to a larger expected reward. The objective to maximize is the expected future reward:

$$J_{RL}(\theta) = \mathbb{E}_{p(a_{1:T})}\left[\sum_{i=1}^{T} R(a_i, [x_i, y_i])\right] \tag{25}$$

where $R(a_i, [x_i, y_i])$ denotes the reward resulting from action $a_i$. We use the likelihood ratio trick [35] for gradient updates:

$$\nabla J_{RL}(\theta) \approx \sum_{i} \nabla \log p(a_i \mid x_i, y_i) \sum_{j=1}^{T} R(a_j, [x_j, y_j]) \tag{26}$$
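To make Eq. (26) concrete, here is a toy REINFORCE sketch in plain Python. It is a deliberate simplification, not the paper's setup: the policy is a single softmax over a few discrete actions that ignores the state $(x_i, y_i)$, the zero initial logits stand in for MLE-initialized parameters, and the reward function in the usage example is hypothetical. For a softmax policy, $\nabla \log p(a)$ with respect to the logits is one_hot(a) minus softmax(logits).

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # subtract the max for numerical stability
    s = sum(exps)
    return [e / s for e in exps]


def grad_log_prob(logits, action):
    """Gradient of log softmax(logits)[action] w.r.t. the logits: one_hot(action) - softmax(logits)."""
    p = softmax(logits)
    return [(1.0 if k == action else 0.0) - p[k] for k in range(len(logits))]


def reinforce_gradient(logits, actions, rewards):
    """Eq. (26) for one sampled episode: (sum_i grad log p(a_i)) * (sum_j R_j)."""
    total = sum(rewards)
    grad = [0.0] * len(logits)
    for a in actions:
        for k, g in enumerate(grad_log_prob(logits, a)):
            grad[k] += g * total
    return grad


def train(reward_fn, n_actions=3, steps=2, episodes=1000, lr=0.1, seed=0):
    """Toy policy-gradient loop; the state is ignored, so one logit vector is the whole policy."""
    rng = random.Random(seed)
    logits = [0.0] * n_actions  # stand-in for MLE-initialized parameters
    for _ in range(episodes):
        probs = softmax(logits)
        actions = [rng.choices(range(n_actions), weights=probs)[0] for _ in range(steps)]
        rewards = [reward_fn(a) for a in actions]
        grad = reinforce_gradient(logits, actions, rewards)
        logits = [t + lr * g for t, g in zip(logits, grad)]  # gradient ascent on J_RL
    return softmax(logits)
```

With `train(lambda a: 1.0 if a == 2 else 0.0)`, the policy shifts most of its probability mass to action 2. In practice, a baseline is often subtracted from the summed reward to reduce the variance of this estimator; Eq. (26) as written does not use one.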

4. Experiments and Results

4.1. Dataset Description
The experiments used the emotional conversation dataset NLPCC2017² to train and test the proposed model. The dataset was screened and counted in advance, and the statistics are shown in Table 1. After the entire dataset was filtered to remove meaningless sentences, 1,118,341 post-reply pairs remain.

² http://tcci.ccf.org.cn/conference/2017/

| Type | Number | Percentage |
|---|---|---|
| All | 1,118,341 | 100.0% |
| B-H | 486,478 | 43.5% |
| O-E | 150,976 | 13.5% |
| O-T | 413,787 | 37.0% |
| N-H | 67,100 | 6.0% |

Table 1: Dataset statistics.

Here, we focus on experiments with replies containing two keywords; approximately 43.5% of the conversation replies contain two keywords. We used 8,000 pairs for validation, 3,000 for testing and the rest for training. The dataset is imbalanced: for example, the surprised emotional category accounts for only approximately 1.1% of the data, and the angry category for only 0.7%. We therefore used over-sampling to construct additional training data: we replaced the emotion keywords in part of the training data with words from the infrequent categories, treated as synonyms, and used the resulting conversations as supplemental training data. A total of 80,000 pairs were randomly selected from the training set to train the LDA model, and 100,000 pairs were randomly selected to train the emotion classifier used for calculating rewards.

4.2. Implementation Details
The encoders and decoders in our framework use 2-layer GRU networks with 256 hidden units in each layer. The word embeddings (of size 256) are pre-trained using Word2Vec on the Chinese Weibo Corpus³. The dictionary contains 40,000 words, including both generic words and keywords. We optimize with the Adam algorithm, with an initial learning rate of 0.005 and a decay rate of 0.99; the other hyperparameters are set to their default values. The whole process took about four weeks on two GTX1080 GPU machines. Our framework is implemented with the TensorFlow⁴ deep learning framework.

³ https://github.com/Embedding/Chinese-Word-Vectors

4.3. Baseline Models
To evaluate the proposed model comprehensively, we select baselines from high-quality text generation and emotional dialogue generation and compare them with the proposed model.

Seq2Seq: A standard encoder-decoder model for text generation that can produce fluent text. We compare with it on the quality of the generated text.

ECM [8]: It introduces emotion embedding vectors and two memory mechanisms (internal and external memory) to generate emotional replies. We compare with it on the emotional intensity and emotional accuracy of the replies.

SentiGAN [10]: It uses a GAN together with a reinforcement learning strategy to generate emotional text. We compare with it on the emotional intensity and emotional accuracy of the replies.

E-SCBA [22]: It introduces both emotion and topic knowledge into the generation process to optimize the quality of replies comprehensively. We compare with it on both the content quality and the sentiment of the text.

W/O Edit: To verify the effect of the proposed emotion editor, we remove the editor from the complete model and compare the two.

4.4. Manual Evaluation


We asked four annotators to evaluate the replies of our model and the baselines. In total, we used 700 conversations, 100 for each emotion category, sampled randomly from the test set. The annotators scored each reply on the following metrics.

Consistency: It measures the fluency and grammaticality of the reply on a three-point scale (0, 1, 2). A score of 0 means that the reply is completely unreasonable; 1 means that the reply is basically fluent but may be missing the subject, a conjunction or another component; 2 means that the reply is coherent and smooth and its structure is intact.

Logic: It measures the degree to which the post and the reply match logically on a three-point scale (0, 1, 2). A score of 0 means that the reply is confusing and illogical; 1 means that the reply is related to the post but may have problems such as mismatching; 2 means that the reply shares the topic of the post and is logical. Note that overly short or overly frequent replies, such as “Me too” or “I think so”, were scored 0, or 1 if the annotator judged the reply to be related to the post.

Emotion: It measures whether the reply expresses the right emotion. A score of 0 means that the emotion in the reply is wrong or absent; 1 means that the reply has the correct emotion but its intensity is weak; 2 means that the reply has the correct emotion and its intensity is strong.

⁴ https://github.com/tensorflow/tensorflow/

Table 2 and Table 3 (2-tailed t-test: p < 0.05 for Consistency and Logic, p < 0.01 for Emotion) compare our model with the baselines.

| Model | Overall C | Overall L | Overall E | Happy C | Happy L | Happy E | Like C | Like L | Like E | Surprise C | Surprise L | Surprise E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Seq2Seq | 1.288 | 0.764 | 0.430 | 1.299 | 0.924 | 0.571 | 1.332 | 0.785 | 0.435 | 1.182 | 0.723 | 0.152 |
| SentiGAN | 1.347 | - | 1.068 | 1.425 | - | 1.285 | 1.395 | - | 1.128 | 1.200 | - | 0.687 |
| E-SCBA | 1.335 | 1.122 | 0.955 | 1.421 | 1.286 | 1.230 | 1.336 | 1.168 | 1.100 | 1.197 | 0.901 | 0.500 |
| W/O Edit | 1.336 | 1.136 | 0.994 | 1.426 | 1.285 | 1.219 | 1.325 | 1.185 | 1.114 | 1.194 | 0.897 | 0.614 |
| Ours | 1.390 | 1.170 | 1.135 | 1.484 | 1.302 | 1.316 | 1.410 | 1.199 | 1.224 | 1.214 | 0.900 | 0.742 |
| Ground Truth | 1.739 | 1.615 | 1.312 | 1.867 | 1.728 | 1.562 | 1.910 | 1.530 | 1.164 | 1.782 | 1.627 | 1.074 |

Table 2: The results of manual evaluation (C = Consistency, L = Logic, E = Emotion).

| Model | Sad C | Sad L | Sad E | Fear C | Fear L | Fear E | Angry C | Angry L | Angry E | Disgust C | Disgust L | Disgust E |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Seq2Seq | 1.384 | 0.928 | 0.481 | 1.245 | 0.786 | 0.430 | 1.206 | 0.526 | 0.226 | 1.369 | 0.678 | 0.712 |
| SentiGAN | 1.500 | - | 1.167 | 1.260 | - | 1.020 | 1.120 | - | 0.886 | 1.531 | - | 1.306 |
| E-SCBA | 1.495 | 1.267 | 1.050 | 1.264 | 1.124 | 0.906 | 1.110 | 0.824 | 0.694 | 1.521 | 1.289 | 1.204 |
| W/O Edit | 1.504 | 1.205 | 1.101 | 1.254 | 1.129 | 0.924 | 1.116 | 0.909 | 0.702 | 1.532 | 1.294 | 1.287 |
| Ours | 1.548 | 1.349 | 1.174 | 1.332 | 1.197 | 1.120 | 1.135 | 0.905 | 0.982 | 1.605 | 1.335 | 1.387 |
| Ground Truth | 1.808 | 1.547 | 1.186 | 1.725 | 1.583 | 1.314 | 1.605 | 1.638 | 1.306 | 1.803 | 1.643 | 1.574 |

Table 3: The results of manual evaluation (C = Consistency, L = Logic, E = Emotion).

As we can see, the average performance of our model on the three indicators is better than that of the other models, indicating that our model effectively improves the topic and emotional relevance of the reply while preserving text quality. The experimental results are analysed further below.

In terms of consistency and emotional relevance, our model outperforms the other models in the overall scores. The model without the emotion editor, however, is not outstanding on these two indicators; its overall scores on both are even lower than those of SentiGAN. This shows that the proposed emotion editor improves the fluency and emotional relevance of the reply.

In terms of logic, our model does not achieve the best results for the surprised and angry emotions: it ranks second, behind E-SCBA for surprise and behind the model without the emotion editor for anger. We attribute this mainly to the relatively small amount of data for these two emotion categories, which leaves fewer related sentences in the training set. As a result, the selected template sentence may ignore the topic constraint, so the optimized reply deviates from the topic and the logic score drops.

To further show that the replies generated by the proposed model are relevant not only to the topic but also to the emotion, we calculate the distribution of each model's logic and emotion scores, as shown in Table 4. The baseline models have a small proportion of 2-2 replies, which indicates that they cannot balance emotion and topic. In contrast, the proportion of 2-2 replies for our model reaches 41.7%, and the proportion of replies with an emotion score of 2 reaches 67.3% (41.7% + 25.6%), which shows that the proposed model compensates for the weak emotional expression of previous models.

| Model (%) | 2-2 | 1-2 | 1-1 | 1-0 | 0-1 |
|---|---|---|---|---|---|
| Seq2Seq | 10.5 | 5.4 | 15.1 | 42.6 | 10.0 |
| SentiGAN | 24.8 | 21.7 | 36.9 | 5.6 | 11.3 |
| E-SCBA | 28.6 | 15.8 | 30.0 | 16.9 | 3.3 |
| W/O Edit | 27.5 | 14.9 | 31.7 | 15.0 | 9.7 |
| Ours | 41.7 | 25.6 | 20.4 | 1.1 | 4.4 |

Table 4: Logical and sentiment scores in the manual assessment (percentage of replies receiving each Logic-Emotion score pair).

4.5. Automatic Evaluation
As argued in [14], BLEU is not suitable for evaluating conversation generation because of its low correlation with human judgment. We adopted perplexity to evaluate the model at the content level (whether the content is relevant and grammatical). To evaluate the model at the emotion level, we adopted emotion accuracy: the agreement between the expected emotion category (given as input to the model) and the emotion category that the emotion classifier predicts for the generated reply. The results of the experiment are shown in Table 5.

| Model | Perplexity (Exp. 1) | Perplexity (Exp. 2) | Perplexity (Exp. 3) | Accuracy (Exp. 1) | Accuracy (Exp. 2) | Accuracy (Exp. 3) |
|---|---|---|---|---|---|---|
| Seq2Seq | 67.4 | 69.0 | 68.0 | 0.164 | 0.188 | 0.175 |
| SentiGAN | 65.1 | 69.6 | 66.7 | 0.768 | 0.790 | 0.792 |
| E-SCBA | 64.8 | 65.9 | 66.1 | 0.774 | 0.769 | 0.772 |
| W/O Edit | 65.0 | 66.0 | 66.5 | 0.775 | 0.759 | 0.776 |
| Ours | 62.2 | 61.0 | 61.4 | 0.871 | 0.870 | 0.869 |

Table 5: The results of objective evaluation.

Our model achieves the best results in terms of both perplexity and emotion accuracy. This is mainly because the rewards take into account perplexity (the coherence reward $r_1$ is built on length-normalized log-likelihoods, the same quantity that determines perplexity) as well as the emotional elements of the reply. In addition, our model outperforms the other models in all three experiments, which suggests that the improvement is robust. In practice, emotion accuracy is more important than perplexity, considering that the generated sentences are already fluent and grammatical at a perplexity of 68.0.

We also compare the model with the variant without the emotion editor. The latter does not perform well in terms of perplexity or emotion accuracy, whereas adding the emotion editor greatly improves performance. This shows that relying only on prior knowledge such as keywords is not enough to improve the overall quality of the reply, which explains the lower scores of E-SCBA and W/O Edit.
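The two automatic metrics can be computed roughly as sketched below. The paper does not state how perplexity is aggregated, so this sketch assumes corpus-level perplexity, the exponential of the average per-token negative log-likelihood pooled over all replies; the emotion labels passed to `emotion_accuracy` are assumed to come from the emotion classifier. Both function names are hypothetical.

```python
import math


def perplexity(replies_token_logprobs):
    """Corpus-level perplexity: exp of the mean negative log-likelihood per token over all replies."""
    n_tokens = sum(len(seq) for seq in replies_token_logprobs)
    if n_tokens == 0:
        raise ValueError("no tokens to evaluate")
    nll = -sum(sum(seq) for seq in replies_token_logprobs)
    return math.exp(nll / n_tokens)


def emotion_accuracy(expected, predicted):
    """Fraction of replies whose classifier-predicted emotion equals the emotion requested as input."""
    if len(expected) != len(predicted) or not expected:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(e == p for e, p in zip(expected, predicted)) / len(expected)
```

If every token is assigned probability 0.25, the perplexity is 4 regardless of how many replies or tokens there are, so lower perplexity means the model finds its own replies more predictable.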


To this end, the emotion editor in our model addresses these problems by integrating the keyword prior knowledge into the reply naturally. Smoothing, optimization and other editing operations are performed on the generated replies according to the template, which not only makes the reply more fluent but also, guided by the template sentences, makes its emotions more prominent.

In Figure 6, we visualize the diversity distribution of words at different positions (1-10) of the generated replies. Our model aims to solve the problem of generic replies, which shows up as certain replies recurring with high frequency across posts and as many identical words produced at the same position. In other words, rich word diversity in replies reflects a model that generates meaningful replies. We define the diversity of a word position as the number of different words at that position. The results shown in the figure have been normalized.

Compared with the other models, our model has a much deeper colour at the same positions, which means that its replies have richer content. The worst model is the general Seq2Seq, whose diversity is low at every position. Besides receiving insufficient information from the posts, it is limited by its fixed sequential structure, which results in generic replies. In contrast, the model proposed in this paper obtains sufficient information during decoding by introducing prior knowledge of keywords. The editor then optimizes the replies to further improve text quality and emotional relevance without falling back on a single safe reply.

Figure 6: The visualization of word distribution, where positions with deeper colour have a higher diversity.

Overall, the colour of our model fades more slowly and stays deep over more positions (1-7) than that of the other models, showing that our model improves not only the quality of content but also the capacity of memory. First, the asynchronous decoder passes information to the next clause during generation, which shortens the transmission path and implicitly enhances the network's ability to memorize information. Second, the knowledge in the emotion editing vector further strengthens the information contained in the reply, indirectly expanding the storage capacity of the model. In short, this expanded storage capacity may help the model generate better replies, because it gives the model more room to determine how to generate meaningful ones.
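The position-wise diversity measure behind Figure 6 can be sketched as follows. The text defines diversity as the number of different words at each position but does not say how the counts are normalized, so this sketch assumes, hypothetically, that each count is divided by the number of replies long enough to reach that position; replies are assumed to be pre-tokenized.

```python
def positional_diversity(replies, max_pos=10, normalize=True):
    """Number of distinct words at each position 1..max_pos across a model's replies.

    replies: list of tokenized replies (lists of words).
    With normalize=True, each count is divided by the number of replies that reach that
    position (an assumed normalization; the paper only says the results are normalized).
    """
    scores = []
    for pos in range(max_pos):
        words = [r[pos] for r in replies if len(r) > pos]
        distinct = len(set(words))
        if normalize:
            scores.append(distinct / len(words) if words else 0.0)
        else:
            scores.append(distinct)
    return scores
```

A model that opens most replies with the same word scores low at position 1, which is the pattern the text attributes to Seq2Seq.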


Figure 7: Sampled conversations with different emotions from the test data.

4.6. Case Study

Even for the same post, generation with different emotions has different features, and there may be multiple suitable replies. Therefore, we provide some examples with multiple emotions in Figure 7 and Figure 8. Words with single underlines denote the keywords, and words with wavy underlines denote the emotionally edited parts. As we can see, the general Seq2Seq model tends to generate short and meaningless replies. In these examples, it extracts information from the posts correctly, but it handles that information inflexibly: its replies read more like summaries of the posts than like conversation. In contrast, the replies generated by our model are highly relevant to the posts and accurately match the post topics, mainly because of the topic keywords introduced in the early stage of the model.

Figure 8: Sampled conversations with different emotions from the test data.

The ECM model improves greatly on the Seq2Seq model and can generate smooth and emotional replies. However, the examples show that the emotions generated by ECM are not obvious: most of its sentences have weak or even no emotion, such as the disgusted reply “Where is it here? Ask for explanation”. This is because ECM guides generation only through an emotion embedding vector, which lacks detailed emotional guidance at generation time and results in vague emotional replies. In contrast, the replies generated by our model, after emotional editing, not only have rich and varied sentence patterns but also show much stronger emotional relevance and intensity. For example, the happy reply “I have never seen such a cute cake!” and the angry reply “What terrible weather!” express more realistic and detailed emotions.

4.7. Conclusion and Future Work


This paper proposes an emotional conversation generation model based on reinforcement learning. Our model generates replies in three iterations and introduces an emotional editing mechanism that refers to existing sentences to further strengthen the content quality and emotional relevance of the replies. Subjective and objective experiments show that the proposed model can generate logical and emotional replies while ensuring fluency, and that the emotion it expresses is more prominent and delicate. In the future, we will enhance the flexibility of the model by introducing other knowledge (such as tone) and customize a personalized framework to meet the specific needs of actual applications.


References

[1] C.-S. Wong, K. S. Law, The effects of leader and follower emotional intelligence on performance and attitude: An exploratory study, The Leadership Quarterly 13 (3) (2002) 243–274.
[2] M. N. Khuong, V. N. B. Tram, The effects of emotional marketing on consumer product perception, brand awareness and purchase decision - a study in Ho Chi Minh City, Vietnam, Journal of Economics, Business and Management 3 (5) (2015) 524–530.
[3] S. Ghosh, M. Chollet, E. Laksana, L.-P. Morency, S. Scherer, Affect-lm: A neural language model for customizable affective text generation, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017, pp. 634–642.
[4] Z. Tian, R. Yan, L. Mou, Y. Song, Y. Feng, D. Zhao, How to make context more useful? an empirical study on context-aware neural conversational models, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2017, pp. 231–236.
[5] L. Luo, J. Xu, J. Lin, Q. Zeng, X. Sun, An auto-encoder matching model for learning utterance-level semantic dependency in dialogue generation, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 702–707.
[6] C. Xing, W. Wu, Y. Wu, J. Liu, Y. Huang, M. Zhou, W.-Y. Ma, Topic aware neural response generation, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI Press, 2017, pp. 3351–3357.
[7] G. Pandey, D. Contractor, V. Kumar, S. Joshi, Exemplar encoder-decoder for neural conversation generation, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 1329–1338.
[8] H. Zhou, M. Huang, T. Zhang, X. Zhu, B. Liu, Emotional chatting machine: Emotional conversation generation with internal and external memory, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018, pp. 1–8.
[9] J. Li, W. Monroe, A. Ritter, D. Jurafsky, M. Galley, J. Gao, Deep reinforcement learning for dialogue generation, in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 1192–1202.
[10] K. Wang, X. Wan, Sentigan: generating sentimental texts via mixture adversarial networks, in: Proceedings of the 27th International Joint Conference on Artificial Intelligence, AAAI Press, 2018, pp. 4446–4452.
[11] L. Mou, Y. Song, R. Yan, G. Li, L. Zhang, Z. Jin, Sequence to backward and forward sequences: A content-introducing approach to generative short-text conversation, in: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 2016, pp. 3349–3358.
[12] M. Galley, C. Brockett, A. Sordoni, Y. Ji, M. Auli, C. Quirk, M. Mitchell, J. Gao, B. Dolan, deltableu: A discriminative metric for generation tasks with intrinsically diverse targets, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), 2015, pp. 445–450.
[13] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using rnn encoder-decoder for statistical machine translation, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724–1734.
[14] C.-W. Liu, R. Lowe, I. Serban, M. Noseworthy, L. Charlin, J. Pineau, How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation, in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 2122–2132.
[15] I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, J. Pineau, Building end-to-end dialogue systems using generative hierarchical neural network models, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI Press, 2016, pp. 3776–3783.
[16] S. Liu, H. Chen, Z. Ren, Y. Feng, Q. Liu, D. Yin, Knowledge diffusion for neural dialogue generation, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 1489–1498.
[17] K. Guu, T. B. Hashimoto, Y. Oren, P. Liang, Generating sentences by editing prototypes, Transactions of the Association for Computational Linguistics 6 (2018) 437–450.
[18] T. Partala, V. Surakka, The effects of affective interventions in human-computer interaction, Interacting with computers 16 (2) (2004) 295–309.
[19] H. Prendinger, M. Ishizuka, The empathic companion: A character-based interface that addresses users' affective states, Applied Artificial Intelligence 19 (3) (2005) 267–285.
[20] Z. Hu, Z. Yang, X. Liang, R. Salakhutdinov, E. P. Xing, Toward controlled generation of text, in: Proceedings of the 34th International Conference on Machine Learning-Volume 70, JMLR.org, 2017, pp. 1587–1596.
[21] T. Cagan, S. L. Frank, R. Tsarfaty, Data-driven broad-coverage grammars for opinionated natural language generation (onlg), in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017, pp. 1331–1341.
[22] J. Li, X. Sun, A syntactically constrained bidirectional-asynchronous approach for emotional conversation generation, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, pp. 678–683.
[23] T. Mikolov, S. Kombrink, L. Burget, J. Černocký, S. Khudanpur, Extensions of recurrent neural network language model, in: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2011, pp. 5528–5531.
[24] S. Bengio, O. Vinyals, N. Jaitly, N. Shazeer, Scheduled sampling for sequence prediction with recurrent neural networks, in: Advances in Neural Information Processing Systems, 2015, pp. 1171–1179.
[25] T. Wen, D. Vandyke, N. Mrkšić, M. Gašić, L. Rojas-Barahona, P. Su, S. Ultes, S. Young, A network-based end-to-end trainable task-oriented dialogue system, in: 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017-Proceedings of Conference, 2017, pp. 438–449.
[26] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[27] L. Yu, W. Zhang, J. Wang, Y. Yu, Seqgan: sequence generative adversarial nets with policy gradient, in: AAAI-17: Thirty-First AAAI Conference on Artificial Intelligence, Vol. 31, Association for the Advancement of Artificial Intelligence (AAAI), 2017, pp. 2852–2858.
[28] K. Lin, D. Li, X. He, Z. Zhang, M.-T. Sun, Adversarial ranking for language generation, in: Advances in Neural Information Processing Systems, 2017, pp. 3155–3165.
[29] J. Guo, S. Lu, H. Cai, W. Zhang, Y. Yu, J. Wang, Long text generation via adversarial training with leaked information, in: 32nd AAAI Conference on Artificial Intelligence, AAAI, 2018, pp. 5141–5148.
[30] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training gans, in: Advances in neural information processing systems, 2016, pp. 2234–2242.
[31] D. P. Kingma, S. Mohamed, D. J. Rezende, M. Welling, Semi-supervised learning with deep generative models, in: Advances in neural information processing systems, 2014, pp. 3581–3589.
[32] X. L. L. H. P. Yu, R. H. C. Jianmei, Constructing the affective lexicon ontology, Journal of the China Society for Scientific and Technical Information 2 (2008) 6.
[33] T. Hofmann, Probabilistic latent semantic indexing, in: ACM SIGIR Forum, Vol. 51, ACM, 2017, pp. 211–218.
[34] D. Dong, H. Wu, W. He, D. Yu, H. Wang, Multi-task learning for multiple language translation, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015, pp. 1723–1732.
[35] R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine learning 8 (3) (1992) 229–256.
