Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 145 (2018) 564–571
www.elsevier.com/locate/procedia
Postproceedings of the 9th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2018 (Ninth Annual Meeting of the BICA Society)
Virtual pet powered by a socially-emotional BICA
Anastasia A. Bogatyreva a, Andrei D. Sovkov a, Svetlana A. Tikhomirova a, Anna R. Vinogradova a, Alexei V. Samsonovich a,b,*
a National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoye shosse, 31, Moscow, 115409, Russia
b George Mason University, Fairfax, VA 22030, USA
Abstract

Cognitive architectures are used to build intelligent agents, and nowadays special attention in this area is drawn to emotion modelling. The purpose of this study is to compare two models describing social-emotional behavior, one of which is based on a traditional machine learning algorithm, and the other on a cognitive architecture supporting social emotionality. It is hypothesized that the second model will be more efficient in eliciting the user's empathy toward a virtual cobot based on this model. Here the object of study is a virtual pet: a penguin. Two models controlling the pet were compared: a reinforcement learning model (a Q-learning algorithm) and the emotional cognitive architecture eBICA (Samsonovich, 2013). The second approach was based on a semantic map of the pet's emotional states, which was constructed based on human rankings. It is found that the eBICA model scores higher in participants' empathy than the model based on reinforcement learning. This article compares strengths and weaknesses of both methods. In conclusion, the findings indicate advantages of the approach based on eBICA compared to more traditional techniques. Results will have broad implications for building intelligent social agents.

© 2018 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the 9th Annual International Conference on Biologically Inspired Cognitive Architectures.
Keywords: virtual character; semantic space; emotional intelligence; reinforcement learning; cognitive architecture
* Corresponding author.
E-mail addresses: [email protected], [email protected], [email protected], [email protected], [email protected]
1877-0509 © 2018 The Authors. Published by Elsevier B.V.
10.1016/j.procs.2018.11.101
1. Introduction

Virtual environment simulators, such as virtual and augmented reality systems with autonomous agents embedded in them, are becoming common platforms in a variety of research and application domains. To make the environment easier for the human user to perceive, it is reasonable to embed a virtual assistant in it, giving it the form of a living being, such as a pet with humanlike properties. In this context, the challenge for developers is to create a virtual intelligent agent whose behavior elicits empathy in users and is as believable as possible from the point of view of a person who expects, e.g., to encounter a friendly pet.

Many research and development efforts have been made in this area recently. Some applications targeted smartphone interfaces [1], others augmented videogames that were designed not only to entertain the user (e.g., virtual robot toys [2-4]), but also to assist them (intellectual advisers, expert systems or tutoring agents, as well as robotic assistants [5-7]). Here, a similar approach is used to develop and study a virtual companion: a pet embedded in a virtual environment.

The present work compares two approaches to solving the challenge: one based on reinforcement learning and the other based on an emotional Biologically Inspired Cognitive Architecture (eBICA) [8]. The main hypothesis underlying this study is that behavior generated by a reinforcement-trained system will appear to the user less natural than behavior produced by an eBICA-based model. In other words, it is expected that the model based on eBICA will behave more realistically and, as a result, will be more popular among participants and therefore more socially acceptable.

2. Materials and methods

2.1. Description of the paradigm

The task in this study is to implement and compare two models controlling the behavior of a virtual cobot (collaborative robot). The comparison should clarify the relative effectiveness of the models in eliciting human empathy toward the cobot.
Fig. 1. Game window: the penguin in its environment. The red line indicates the graph along which the penguin can move. Leaf vertices are objects of the environment with which the penguin can interact; the remaining vertices are intermediate points on the way to an object.
To achieve this goal, it was decided to implement a virtual pet: a penguin embedded in a virtual interior containing a cave, a pool, a bowl with fish, a tower, several toy balls, and a mattress (Fig. 1). The user's avatar is a bare hand that can perform various actions with respect to the penguin. The virtual animal is able to move around the area and interact with objects on its own. For example, the penguin can swim in the pool, sleep in the cave, eat from the bowl, juggle balls, and so on. In addition, it can interact with the hand by responding to the hand's actions. For example, the penguin can run away or show aggression in response to tickling, as opposed to enjoying it. The repertoire of the hand's actions is equally diverse: the hand can tickle the animal, hit it with a slipper, feed it by giving fish directly or by placing fish into the bowl, kick a ball away from the penguin, etc.

The penguin moves along the graph shown by red lines in Fig. 1. Vertices of the graph are linked to objects and special locations in the environment, such as crossings. In addition, the penguin is able to express a variety of emotional states using body language: some examples are shown in Fig. 2. Their emotional values were calculated based on the ranking done by participants of this study (see below). The task for participants in the main experiment was to interact with the penguin as naturally as they would interact with a real pet, for as long as they pleased, up to several minutes.
Fig. 2. Examples of penguin’s emotional states expressed by body language, putatively labeled Aggression, Tenderness, Love, and Respect.
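To make the setup concrete, the following is a minimal sketch of how the movement graph described above could be represented. It is an illustration only: the names (Vertex, link, can_interact) are hypothetical and are not taken from the actual implementation.

```python
# A minimal, hypothetical sketch of the environment graph: leaf vertices are
# interactable objects, inner vertices are intermediate waypoints (crossings).
from dataclasses import dataclass, field

@dataclass
class Vertex:
    name: str
    is_object: bool = False          # True for leaf vertices (pool, bowl, cave, ...)
    neighbors: list = field(default_factory=list)

def link(a: Vertex, b: Vertex) -> None:
    """Add an undirected edge of the movement graph."""
    a.neighbors.append(b)
    b.neighbors.append(a)

# Build a tiny fragment of the interior shown in Fig. 1.
crossing = Vertex("crossing")
pool = Vertex("pool", is_object=True)
bowl = Vertex("bowl_with_fish", is_object=True)
cave = Vertex("cave", is_object=True)
for obj in (pool, bowl, cave):
    link(crossing, obj)

def can_interact(position: Vertex) -> bool:
    """The penguin can only attempt an interaction while standing on a leaf vertex."""
    return position.is_object
```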
2.2. Reinforcement learning

In order to start learning with reinforcement, it is necessary to define the parameters of the model, such as the agent's embodiment and available actions, reward values for particular targets, etc., and to select a learning algorithm. In our case, the animal itself is the agent that learns; the available actions are its movements along the edges of the graph and attempts to interact with the environmental objects. The state of the penguin is represented by a vector whose components correspond to three basic needs: to eat (achieving satiety), to sleep (restoring vigor), and to play (attaining joy). Affinity to the user (the hand) is another useful scale; it was not associated with a drive, but was used as a parameter subject to optimization.

The penguin attempts to interact with an object when the distance between the animal and the object is sufficiently small. With each successful interaction, the penguin receives a reward. The value of the reward can be either positive or negative, depending on the action performed. The reward calculation takes into account the fact that the hand, representing the user embedded in the environment, can move and initiate contact with the animal, performing various actions of variable valence: from negative to positive. The goal for the participant in the interaction paradigm is to keep the pet's three well-being characteristics above zero for as long as possible, thus prolonging the game time.

One of the well-known algorithms of reinforcement learning, Q-learning [12], was used here. For any finite Markov decision process, Q-learning finds a policy that is optimal in the sense that it maximizes the expected value of the total reward over all successive steps. Starting from the current state, Q-learning can identify an optimal action-selection policy for any given decision process. Here "Q" stands for the utility function that returns the reward used to provide the reinforcement; it can be considered a synonym of the quality of the action outcome in the given state. The goal of learning is a policy that tells the agent which action to choose in given circumstances: once a situation becomes familiar, the learned Q-values guide that decision.
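The state and reward structure described above could look roughly as follows. This is an illustrative sketch: the concrete need names, reward values, and decay rate are assumptions, not the values used in the study.

```python
import numpy as np

# Hypothetical state vector: three drives plus affinity to the user.
NEEDS = ["satiety", "vigor", "joy"]

class PenguinState:
    def __init__(self):
        self.needs = np.zeros(len(NEEDS))   # each component decays over time
        self.affinity = 0.0                 # attitude toward the hand (optimized, not a drive)

    def step_decay(self, rate=0.01):
        """Needs slowly decrease, pushing the agent to seek food, sleep, or play."""
        self.needs -= rate

# Illustrative reward table for interactions (assumed values, sign reflects valence).
REWARDS = {
    "eat_from_bowl":  +1.0,   # raises satiety
    "sleep_in_cave":  +1.0,   # restores vigor
    "juggle_balls":   +0.5,   # raises joy
    "hit_by_slipper": -1.0,   # negative-valence action performed by the hand
}

def reward(action: str) -> float:
    return REWARDS.get(action, 0.0)
```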
The deep Q-learning algorithm (DQN) was selected here, which combines Q-learning with deep neural networks. A deep Q-network (DQN) [9] is represented by a multilayer neural network with three convolutional and two fully connected layers. For the current state s_t, the output of this network is the vector of Q-values corresponding to the actions in the action set. The agent interacts with the environment and updates its value estimates from the experience collected along the way (Q-learning is an off-policy algorithm, so the update does not depend on the policy that generated the data). The Q-value of a state-action pair is updated with an error term scaled by the learning rate α: it combines the reward received at the next time step for acting in state s_t with the future reward obtainable from the next observed state, discounted by γ. At time t + 1, the reward r_{t+1} becomes available and the current estimate of Q is updated according to [12]

Q(s_t, a_t) ← Q(s_t, a_t) + α [ r_{t+1} + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t) ],   (1)

A minimal code sketch of this update rule is given after the symbol definitions below.
where
• s_t and s_{t+1} are the current and next state of the animal, respectively;
• a_t is the current action of the animal;
• r_{t+1} is the reward;
• α, γ are positive coefficients.
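The sketch below shows the tabular form of update (1). The study itself uses a DQN, where the table is replaced by a neural network trained with experience replay, but the update rule is the same in spirit; the hyperparameter values are assumptions.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1     # assumed learning rate, discount, exploration rate

Q = defaultdict(float)                    # Q[(state, action)] -> estimated value

def choose_action(state, actions):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """One application of equation (1)."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```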
The learning process is based on the experience replay technique [10], which stores experience (state transitions, rewards, and actions) providing the data for Q-learning, and samples mini-batches from it for updating the neural network.

2.3. Model based on eBICA

The basic model of pet behavior in this paper is based on eBICA (emotional Biologically Inspired Cognitive Architecture [8]), in which the emotional state of the agent is described using a semantic map. This model implies that the agent state vector has emotional components (such as "sadness", "anger", "joy", etc.) in addition to the well-being scales described above. Appraisals of the actor A and of the action a were calculated according to [8]. The semantic map was constructed in the form of a complex plane, where the valence value corresponds to the real part and the dominance value to the imaginary part of the complex number. A constant parameter r controls the rate of appraisal updating.

In this work, appraisals of the actors (the penguin A_p and the human A_h) and of the results of their interactions (actions of the penguin a_p and of the human a_h) are introduced as two-dimensional vectors on the semantic map, represented by complex numbers. Voluntary interaction of the pet with the environment affects only the state of the penguin, while the interaction of the person with the penguin changes both appraisals. These changes are described by equations (2), (3):

A_p^{t+1} = (1 − r) A_p^t + r a_p,   (2)
A_h^{t+1} = (1 − r) A_h^t + r a_h.   (3)

The penguin's preferences for action selection, when interacting with the environment, are initially determined at random, with preference given to nearby objects. Later on, the likelihood of an action becomes determined by the developing emotional state (more precisely, the self-appraisal) of the penguin. The likelihoods of the penguin's responses to human actions are determined by the appraisals of the human, the penguin, and the action, according to formula (4) [8]:

L ∼ Re[ a (A_p* + A_h) ].   (4)
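A compact sketch of how the appraisal dynamics (2)-(4) could be implemented with complex numbers follows. The update rate value, the example map coordinates, and the softmax-style choice rule are illustrative assumptions rather than details taken from the paper.

```python
import math, random

R = 0.3  # assumed value of the appraisal update rate r in equations (2), (3)

# Appraisals are complex numbers on the semantic map:
# real part = valence, imaginary part = dominance.
A_p = 0 + 0j   # appraisal of the penguin
A_h = 0 + 0j   # appraisal of the human

def update_on_interaction(a_p: complex, a_h: complex) -> None:
    """Equations (2) and (3): an interaction shifts both appraisals toward
    the appraisals of the actions just performed."""
    global A_p, A_h
    A_p = (1 - R) * A_p + R * a_p
    A_h = (1 - R) * A_h + R * a_h

def response_likelihood(a: complex) -> float:
    """Equation (4): unnormalized likelihood of a candidate penguin response a."""
    return (a * (A_p.conjugate() + A_h)).real

def choose_response(candidates: dict) -> str:
    """Pick a response; the softmax choice rule itself is an assumption."""
    weights = {name: math.exp(response_likelihood(a)) for name, a in candidates.items()}
    x = random.uniform(0.0, sum(weights.values()))
    for name, w in weights.items():
        x -= w
        if x <= 0:
            return name
    return name

# Example: the penguin chooses between enjoying or rejecting a tickle.
print(choose_response({"enjoy": 0.8 + 0.2j, "snap": -0.6 + 0.9j}))
```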
The axes of the semantic map were determined from the participants' rankings processed with principal component analysis, which identifies the most informative axes. The numerical values of the rankings were collected using a survey form. The procedure and its outcome are described in the Results section.
2.4. Participants

Participants in the experiments of this study were 25 college students (16 men and 9 women, aged between 18 and 28), most of whom were students of NRNU MEPhI, originally from Russia. All participants were fluent in English, while Russian was their native language.

3. Results and analysis

3.1. Experiment №1

To build a semantic map, each participant was asked to rate 13 animated penguin images by the following criteria: interest, serenity, trust, anxiety, agitation, anger, resentment, aggression, humility, pleasure, love, disappointment and compassion. Each criterion was rated on a scale from 0 to 10, where 0 is the absence of the emotion in the animation and 10 is its maximum expression. Principal component analysis was carried out on the collected data. As a result, the following two-dimensional semantic map was obtained (Fig. 3). The two axes were approximately interpreted as "valence" and "arousal". The third principal component had no clearly interpretable semantics and therefore was not used in the model.
Fig. 3. Semantic map of emotional expressions of the penguin (for examples, see Fig. 2).
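The construction of the map amounts to a principal component analysis of the rating matrix (13 animations rated on 13 emotion criteria, averaged over participants). A minimal sketch using scikit-learn is shown below, with synthetic ratings standing in for the real data:

```python
import numpy as np
from sklearn.decomposition import PCA

EMOTIONS = ["interest", "serenity", "trust", "anxiety", "agitation", "anger",
            "resentment", "aggression", "humility", "pleasure", "love",
            "disappointment", "compassion"]

# Synthetic stand-in for the participant-averaged ratings:
# rows = 13 animated penguin images, columns = 13 emotion criteria, values 0..10.
rng = np.random.default_rng(0)
ratings = rng.uniform(0, 10, size=(13, len(EMOTIONS)))

# Keep the two most informative axes (interpreted in the paper as valence and arousal);
# the third component carried no clear semantics and is dropped.
pca = PCA(n_components=2)
map_coords = pca.fit_transform(ratings)     # 13 points on the 2-D semantic map
print(pca.explained_variance_ratio_)        # variance explained by each axis
```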
The result of this experiment (Fig. 3) confirmed the hypothesis that, when building a two-dimensional semantic map, it is natural to expect valence and dominance dimensions on top of the others (consistent with previous studies, e.g., [11]). In our case they provide the greatest variation in the set of perceived emotional states of the pet.

3.2. Experiment №2

To facilitate the construction of a mathematical model, an experiment was conducted in which 12 students took part (10 men and 2 women). Each subject was asked to evaluate different situations on two scales, valence and dominance, ranging from -5 to +5. Here +5 valence meant a positive reaction and -5 a negative reaction; +5 dominance corresponded to high dominance and -5 to submission. The survey was divided into two parts: in the first section, participants were asked to pretend to be the pet and evaluate the various actions of their master in relation to them; in the second section, participants evaluated actions of the penguin from the master's perspective. In this survey, the subjects were shown 27 recorded sessions of the game. The outcomes of the data analysis are shown in Fig. 4. These results allowed us to calculate the appraisal vectors used in the eBICA model.
Fig. 4. Appraisal vectors (map locations) of the emotional states of the penguin relative to the neutral state.
The points indicated on the semantic map by numbers (Fig. 4) correspond to the actions in the contexts described in the list below.

0. The master hits the penguin with a slipper.
1. The master feeds the penguin with fish from the hands.
2. The master lets the penguin play with the balls.
3. The master tickles the penguin.
4. The master pets the penguin on the head.
5. The master scolds the penguin for bad behavior.
6. The master orders the penguin to go down the slide.
7. The master orders the penguin to swim on the mattress in the pool.
8. The master sends the penguin to sleep.
9. The master forcibly wakes the penguin up.
10. The master sends the penguin to eat from the bowl.
11. You hit the penguin with a slipper. He dutifully accepts the punishment.
12. You hit the penguin with a slipper. He responds aggressively.
13. You offer the penguin fish from your hands. He eats it with pleasure.
14. You offer the penguin fish from your hands. He continues to go about his business / runs away.
15. You offer the penguin to play with balls. He eagerly starts playing.
16. You offer the penguin to play with balls. He continues to go about his business / runs away.
17. You begin to tickle the penguin. He enjoys the tickling.
18. You begin to tickle the penguin. He snaps at you / runs away.
19. You start petting the penguin on the head. He enjoys it.
20. You start petting the penguin on the head. He snaps at you / runs away.
21. You chastise the penguin for bad behavior. He dutifully accepts the punishment.
22. You chastise the penguin for bad behavior. He snaps at you.
23. You order the penguin to go down the slide / swim on the mattress / go to sleep. He obeys.
24. You order the penguin to go down the slide / swim on the mattress / go to sleep. He continues to go about his business.
25. You cannot wake the penguin up: he continues to sleep.
26. You cannot find the penguin, as he has hidden in his cave.
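The appraisal vectors in Fig. 4 are essentially participant-averaged (valence, dominance) ratings of these situations, expressed as complex numbers relative to the neutral state. A small illustrative sketch with made-up ratings follows:

```python
import numpy as np

# Made-up example: each situation gets a list of (valence, dominance) ratings,
# one pair per participant, each value in the range -5..+5.
ratings = {
    "master_hits_with_slipper": [(-5, 4), (-4, 5), (-5, 3)],
    "master_feeds_from_hands":  [(4, -1), (5, 0), (3, -2)],
}

def appraisal_vector(pairs) -> complex:
    """Average (valence, dominance) and map it to a point on the complex semantic map:
    real part = valence, imaginary part = dominance."""
    v, d = np.mean(pairs, axis=0)
    return complex(v, d)

appraisals = {name: appraisal_vector(p) for name, p in ratings.items()}
print(appraisals)
```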
Fig. 5. A screenshot of the game. The penguin uses the slide.
3.3. Experiment №3

The main experiment involved 13 participants, who were asked to interact with two pet models, one after another (they received no information as to which model was presented first). One model was based on the reinforcement learning algorithm, the other on the cognitive architecture eBICA. Each participant was asked to interact with the penguin as naturally as they would interact with their own pet. Participants were not informed about the kind and features of each model prior to the experiment. All user interactions, as well as the interactions of the penguin with the environment, were recorded in a log file. After the experiment, participants completed a survey in which the quality and believability of each model were evaluated on a scale from 1 to 10.

3.4. Results

The following average values were obtained as a result of processing the survey data (Table 1).

Table 1. Main results: average human rankings of two implementations of the virtual penguin.

Question                                                                       BICA    RL
Rate impressions about the model, 1 - did not like it, 10 - really liked it    7.92    6.31
How realistic was this model? 1 - unrealistic, 10 - realistic                  7.15    6.08
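The variance analysis reported in the next paragraph could be reproduced along the following lines. The per-participant scores below are placeholders, and the choice of a one-way ANOVA is an assumption, since the paper does not name the exact test used.

```python
import numpy as np
from scipy import stats

# Placeholder per-participant scores (1..10) for the two models; the real data are in the log files.
bica_scores = np.array([8, 9, 7, 8, 9, 8, 7, 8, 9, 7, 8, 7, 8])
rl_scores   = np.array([6, 7, 5, 7, 6, 6, 7, 6, 5, 7, 6, 7, 7])

# One-way ANOVA across the two groups (equivalent to an unpaired t-test for two groups).
f_stat, p_value = stats.f_oneway(bica_scores, rl_scores)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```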
Statistical variance analysis yielded a p-value of 0.0436 for the first survey question and 0.1613 for the second. Thus, the first difference (Row 1, Table 1) was found significant. The lack of significance in the second row may be due to limitations of the implementation, which did not allow participants to bring the penguin far from its average state. In addition, it could be due to the choice of parameters of the eBICA model, which caused the penguin to stop executing user commands upon
reaching a threshold of low valence, while most users did not reach such low values of the penguin's valence during the experiment. In general, however, the results confirm the hypothesis, indicating significant superiority of the eBICA model over the conventional model based on reinforcement learning in eliciting participants' empathy toward the penguin.

4. Discussion

In this paper, a prototype of an emotional pet penguin embedded in a virtual environment was implemented and studied empirically. With the help of the rankings by 25 participants and subsequent principal component analysis of the collected data, a semantic map of penguin actions and emotional states was constructed, which was then used to implement an eBICA model controlling the penguin's behavior and its interaction with the user. Comparison of two models controlling the virtual pet, one based on reinforcement learning and another on eBICA, revealed superiority of the eBICA model in its ability to elicit the user's empathy toward the penguin, which can be considered a measure of social acceptability. The difference in believability of the two models, however, was not found significant in this study.

In this work, one of the simplest levels of emotional interaction between a user and a virtual pet was investigated. Improvements of this simple model should be made in future work. For example, the pet could respond differently depending on the personal relationship between it and the master, using moral schemas of eBICA [8]. This would make the pet more realistic in the eyes of the user who interacts with it. In addition, when empowered with moral schemas, the penguin will be able to show complex social emotions such as resentment, disappointment, remorse, compassion, and more.

5. Acknowledgments

This work was supported by the Russian Science Foundation, Grant # 18-11-00336. The authors are grateful to the students of the NRNU MEPhI for lively discussions and participation in the experiments.

References

[1] Chella, A., Sorbello, R., Pilato, G., Vassallo, G., and Giardina, M. (2013) "A new humanoid architecture for social interaction between human and a robot expressing human-like emotions using an android mobile device as interface." Biologically Inspired Cognitive Architectures 2012. Advances in Intelligent Systems and Computing, vol. 196: 95–103. Springer: Cham, Switzerland.
[2] Azarnov, D.A., Chubarov, A.A., and Samsonovich, A.V. (2018) "Virtual actor with social-emotional intelligence." Procedia Computer Science, 123: 76–85.
[3] Augello, A., Pilato, G., and Gaglio, S. (2009) "A conversational agent to support decisions in SimCity like games." Proceedings of IEEE ICSC 2009, pp. 367–372.
[4] Monceaux, J., Becker, J., Boudier, C., and Mazel, A. (2009) "Demonstration – First steps in emotional expression of the humanoid robot Nao." Proceedings of ICMI-MLMI'09, November 2–4, 2009, Cambridge, MA, pp. 235–236. ACM 978-1-60558-772-1/09/11.
[5] Chella, A., Pilato, G., Sorbello, R., Vassallo, G., Cinquegrani, F., and Anzalone, S.M. (2008) "An emphatic humanoid robot with emotional latent semantic behavior." Simulation, Modeling, and Programming for Autonomous Robots: Proceedings of the First International Conference SIMPAR 2008, Venice, Italy, November 3–6, pp. 234–245.
[6] Pilato, G., Vassallo, G., Augello, A., Vasile, M., and Gaglio, S. (2005) "Expert chat-bots for cultural heritage." Intelligenza Artificiale, 2: 25–31.
[7] Augello, A., Pilato, G., and Gaglio, S. (2011) "Intelligent advisor agents in distributed environments." In: Soro, A., Vargiu, E., Armano, G., and Paddeu, G. (Eds.) Information Retrieval and Mining in Distributed Environments. Studies in Computational Intelligence, 324, pp. 109–124. Berlin: Springer.
[8] Samsonovich, A.V. (2013) "Emotional biologically inspired cognitive architecture." Biologically Inspired Cognitive Architectures, 6: 109–125.
[9] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015) "Human-level control through deep reinforcement learning." Nature, 518: 529–533.
[10] Lin, L.-J. (1992) Reinforcement Learning for Robots Using Neural Networks (Doctoral Dissertation). Carnegie Mellon University: Pittsburgh, PA.
[11] Osgood, C.E., Suci, G.J., and Tannenbaum, P.H. (1957) The Measurement of Meaning. Urbana, IL: University of Illinois Press.
[12] Sutton, R.S., and Barto, A.G. (1998) Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.