Artificial Intelligence 281 (2020) 103235
Story embedding: Learning distributed representations of stories based on character networks

O-Joun Lee (a), Jason J. Jung (b,*)

(a) Future IT Innovation Laboratory, Pohang University of Science and Technology, 77 Cheongam-ro, Nam-gu, Pohang-si, Gyeongsangbuk-do, 37673, Republic of Korea
(b) Department of Computer Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
Article history: Received 5 April 2019; Received in revised form 31 December 2019; Accepted 7 January 2020; Available online 13 January 2020

Keywords: Character network; Story analytics; Computational narrative; Story embedding; Story2Vec
Abstract

This study aims to learn representations of stories in narrative works (i.e., creative works that contain stories) as fixed-length vectors. Vector representations of stories enable us to compare narrative works regardless of their media or formats. To computationally represent stories, we focus on social networks among characters (character networks). We assume that the structural features of character networks reflect the characteristics of stories. By extending substructure-based graph embedding models, we propose models to learn distributed representations of character networks in stories. The proposed models consist of three parts: (i) discovering substructures of character networks, (ii) embedding each substructure (Char2Vec), and (iii) learning vector representations of each character network (Story2Vec). We find substructures around each character at multiple scales based on proximity between characters. We suppose that a character's substructures signify its 'social roles.' Subsequently, a Char2Vec model is designed to embed a social role based on co-occurring social roles. Since character networks are dynamic social networks that evolve temporally, we use temporal changes and adjacency of social roles to determine their co-occurrence. Finally, Story2Vec models predict occurrences of social roles in each story for embedding the story. To predict the occurrences, we apply two approaches: (i) considering temporal changes in social roles, as in the Char2Vec model, and (ii) focusing on the final social roles of each character. We call the embedding model with the first approach 'flow-oriented Story2Vec.' This approach can reflect the context and flow of stories if the dynamics of character networks are well understood. Second, based on the final states of social roles, we can emphasize the denouement of stories, which is an overview of the static structure of the character networks. We name this model 'denouement-oriented Story2Vec.' In addition, we suggest 'unified Story2Vec' as a combination of these two models. We evaluated the quality of the vector representations generated by the proposed embedding models using real-world movies.

© 2020 Elsevier B.V. All rights reserved.
1. Introduction

Various studies have been conducted to represent and analyze the stories found in the large number of narrative works (i.e., creative works that contain stories) regularly distributed through the web (e.g., Netflix and YouTube) [1-3]. 'Story,' as
* Corresponding author.
E-mail addresses: [email protected] (O.-J. Lee), [email protected] (J.J. Jung).
https://doi.org/10.1016/j.artint.2020.103235 0004-3702/© 2020 Elsevier B.V. All rights reserved.
used in these studies, indicates a creative product of humans: a combination of various connotative features (e.g., subject matters, backgrounds, metaphorical expressions, props, and so on), rather than a series of events (e.g., enumerated facts in news articles). Therefore, most of the existing models for representing stories [4-7] have consisted of semantics that is too 'high-level' to extract from the narrative work automatically. To resolve this issue, our previous studies [8-15] have focused on societies in stories. These studies have proposed methods for automatically measuring the intensity of social relationships between characters that appear in the narrative work [8,9,16,3]. Then, the previous studies constructed social networks among the characters, called 'character networks.' The character network focuses only on the interactions between characters for representing stories. It is based on an assumption: 'characters' interact with each other in the 'background,' and these interactions compose 'events.' Thus, we suppose that the character network reflects three major components of the story: the character, event, and background. According to John Truby [17], the interactions between the protagonist and its surrounding characters (mainly the antagonist) develop the story. Moreover, previous studies have demonstrated that we can use the character network to extract narrative elements (e.g., roles of characters [8,9], communities among characters [8,14], scene boundaries [10,11], major events [15], and so on) from narrative works with high accuracy. Although the character network discards high-level story semantics, its performance shows that it can reflect the narrative characteristics of stories, at least partially. Obviously, we have to improve the semantic richness of the character network model, and we have made such efforts [15,3]. However, this study does not cover this issue.
To provide explainable services [18] based on the story, our previous studies [4,12-15,19] have attempted to compare stories with each other. Here, the most abstruse problems involved difficulties in building a generalized approach with mathematical foundations, since (i) the narrative work is made and consumed by human beings and (ii) character networks are streams of arbitrary-sized graphs [10]. Therefore, our previous studies on the character network [9,16,10-15] have proposed heuristic features according to their tasks. For example, the centrality of characters has been used for discriminating character roles [9,16]. Emotional states of characters have been applied in discovering the emotional relationships between characters and measuring the degree of conflict among them [12,15]. Compactness and adjacency between communities of characters have been adopted to explain the structural similarity between character networks [14]. Although these features exhibited reliable performance, they are difficult to generalize and expand to the various formats, genres, and media types of narrative work. These heuristic approaches make our studies not only difficult to reuse but also discontinuous. Therefore, in this study, we adopt methods for learning distributed representations to construct 'media-independent' and 'task-agnostic' features of stories. We build vector representations of the character network using Graph2Vec [20], which is a well-known methodology for representing arbitrary-sized graphs as fixed-length feature vectors. One of the most well-known methods for learning distributed representations is the Word2Vec model [21]. Its skip-gram and continuous bag-of-words (CBOW) algorithms have simple procedures for data represented in word-document relationships (i.e., multi-sets and their elements) and perform powerfully. Also, the Doc2Vec model [22] extends Word2Vec to represent arbitrary-length sequences of words.
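As context for the "fixed number of neighborhoods" limitation discussed below, the standard skip-gram turns a token sequence into (target, context) training pairs using a fixed window. A minimal sketch (the toy tokens are illustrative, not from the paper):

```python
def skipgram_pairs(tokens, window=2):
    """Enumerate (target, context) pairs within a fixed-size window.

    This is the fixed-window context that the radial and temporal-radial
    variants discussed in this paper generalize for graph substructures.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs
```

For graph substructures, the number of neighbors per target varies, which is why a radial neighborhood replaces the fixed window in Subgraph2Vec-style models.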
However, it is difficult to directly apply these models to character networks because the character network is graphically represented. Therefore, we refer to the studies of Yanardag et al. [23] and Narayanan et al. [24,20]. They transformed graphically represented data to a word-document representation by applying the Weisfeiler-Lehman (WL) relabeling process [25]. Then, they adopted the Word2Vec model for learning distributed representations of substructures within graphs and represented each graph with a vector through the Doc2Vec model. Besides WL relabeling-based studies, various methods have been proposed for graphical data. Nevertheless, most of the existing graph embedding methods have focused on representing each node in the graph [26–30]. Also, since these methods mostly assign more similar representations to nodes with higher proximity, they are inappropriate for analyzing the story and character network. As opposed to social networks in the real world, higher proximity between characters does not indicate that the characters have more similar roles. Most of the embedding methods for structures of graphs are also inadequate for arbitrary-sized graphs [31]. Therefore, we apply the Subgraph2Vec [24] and Graph2Vec [20] models proposed by Narayanan et al., for learning vector representations of substructures of character networks and the character networks themselves, respectively. The substructures extracted by the WL relabeling reflect character connections at multiple scales. Thus, we suppose that these substructures represent the social roles of characters in each narrative work. Subsequently, based on the representations of the social roles, we obtain representations of character networks based on which social roles compose each character network. Effectively, a representation of a structure of each character network reflects a society described in a story. 
Thus, we call the vector representations of the social role and the character network the 'character vector' and 'story vector,' respectively. To represent each character and each story with a vector, we propose the Char2Vec and Story2Vec models by modifying the Subgraph2Vec and Doc2Vec models. First, based on the WL relabeling process, we discover the substructures (social roles) of the character network, which are rooted in each character. By using the social roles, we represent the character network as a multi-set of social roles. However, we modify the WL relabeling to consider the proximity between characters, since the original process does not. We call this modification the 'proximity-aware WL relabeling process.' Subsequently, we aim to represent the social roles of each character with a vector. Subgraph2Vec applies a modification of the skip-gram algorithm [21], which we call 'radial skip-gram.' The skip-gram learns a fixed number of neighborhoods of a prediction target; thus, it is inappropriate for substructures of graphs that do not have a fixed number of
neighborhoods. Therefore, the radial skip-gram enables us to consider an unfixed number of surrounding contexts of substructures. We extend it further to consider temporal changes in social roles according to the flows of stories; we call this extension the 'temporal-radial skip-gram (TR-SkipGram).' Based on the TR-SkipGram algorithm, we propose a model for vectorizing the social roles of characters, called 'Char2Vec.' Although Char2Vec is, in this study, a prerequisite for representing the stories of narrative works, it can also be used to compare characters with each other. Through Char2Vec, we can represent the story of each narrative work with a vector by applying the Graph2Vec and Doc2Vec models. The Graph2Vec model uses only the distributed bag-of-words version of the paragraph vector (PV-DBOW) method of Doc2Vec. However, in this study, we apply both methods proposed in Doc2Vec: the distributed memory model of paragraph vectors (PV-DM) and PV-DBOW. By extending the PV-DM, which is similar to the CBOW method in the Word2Vec model, we train representations of stories by considering temporal changes in social roles. We call this method 'temporal-radial CBOW (TR-CBOW),' since we extend the range of neighborhoods of the PV-DM to consider the temporal and radial adjacency of social roles. Through the PV-DBOW, we focus on the kinds of social roles that appear in the denouement of each story. Based on these two methods, we propose a model for learning distributed representations of stories, called 'Story2Vec.' The contributions of this study are noted as follows:
• Proximity-aware WL relabeling process (Sect. 3.1.2): The WL relabeling discovers substructures of graphs by reassigning labels on nodes based on the labels of adjacent nodes. However, this method cannot reflect the proximity between nodes. Therefore, we assign more specific labels by considering the degree of proximity between characters.
• Temporal-radial skip-gram/CBOW (Sect. 3.1.3 and Sect. 3.2.1): These models are extensions of the skip-gram and PV-DM methods, respectively. Since the Word2Vec and Doc2Vec models deal with textual data, they consider only words sequentially connected with subjects of learning as neighborhoods of each subject. In this study, we extend the range of a neighborhood in three directions: (i) adjacency, (ii) scale, and (iii) temporal changes in social roles.
• Char2Vec (Sect. 3.1): We vectorize social roles discovered by the proximity-aware WL relabeling through predicting the temporal and radial neighborhoods of each social role (the TR-SkipGram). Thus, a character vector describes the social roles rooted in a character while considering the dynamicity of a story.
• Story2Vec (Sect. 3.2): To learn distributed representations of stories, we apply the two methods proposed in the Doc2Vec model: PV-DM and PV-DBOW. By using the TR-CBOW method, which is an extension of the PV-DM, a story vector is trained by predicting a social role based on its neighborhoods and the story vector. In the PV-DBOW method, we predict all the social roles that appear in a story from its vector representation. These two versions provide the story vector with dynamic and static perspectives on the story, respectively.

To verify the efficiency of the proposed models, we conducted vector embedding for 142 real movies. We evaluated the accuracy of the generated character vectors and story vectors by comparing them with user questionnaires. Based on the experimental results, we attempted to verify the following research questions:
• RQ 1-1. A similarity between the substructures of character networks is related to a similarity between the social roles of characters (Sect. 4.1).
• RQ 1-2. Proximity among characters is significant for discovering the social roles of the characters (Sect. 4.1), and temporal changes in these roles are effective for representing characters and stories (Sect. 4.2).
• RQ 2-1. Structural similarity between character networks is relevant to a similarity between corresponding stories (Sect. 4.3).
• RQ 2-2. A static structure and dynamic changes in the structure of the character network are significantly useful for representing stories (Sect. 4.3).

RQ 1-1 and RQ 1-2 examine whether the proposed modifications of the existing algorithms are effective for learning distributed representations of the story. The remaining two research questions examine whether our fundamental assumptions for representing stories are reasonable and practically useful.

The remainder of this paper is organized as follows. In Sect. 2, we introduce the underlying concepts and definitions of this study. We present the proposed methods for learning distributed representations of the story in Sect. 3. In Sect. 4, we evaluate the proposed methods step by step, based on real-world movies. In Sect. 5, we introduce related studies and compare them with the proposed models. Finally, we present the limitations of this study and future research directions in Sect. 6.

2. Problem description

In our previous studies [16,10-15,3,32,33], we have attempted to represent and analyze stories based on social networks among the characters contained in the stories. These studies share a common assumption: social relationships between the characters can represent characteristics of the story, since they reflect all three major components of the story: the character, event, and background. Interaction between characters in the background causes events.
We make the fundamental assumption that the social network between characters (the character network) consists of three elements: characters (nodes), their relationships (edges), and the degrees of proximity among the characters (edge weights) [8,9,16,15]. Various methods measure the degree of proximity, e.g., the number of dialogues exchanged between characters [9] and the length of co-occurrence time between characters [8,16]. Moreover, Tran et al. [10,11] have defined a character network for each scene¹ to consider the time-sequential characteristics of the story. We can generally define the character network as follows:

Definition 1 (Character network [15,3,19]). Suppose that n is the number of characters that appear in a narrative work Cα, and Cα consists of L scenes from sα,1 to sα,L. When N(Cα) indicates the character network of Cα, N(Cα) can be described as a matrix in R^{n×n}. Each element of N(Cα) denotes the degree of proximity between the two corresponding characters. To consider the time-sequential aspect of the story, N(sα,l) refers to the character network of the l-th scene, sα,l. While N(Cα) contains the interactions among the characters that occurred in the entire narrative work, N(sα,l) only includes interactions up to sα,l. Thus, the character networks of scenes represent a growth process of N(Cα), and N(sα,L) = N(Cα). This can be formulated as:
$$N(\mathcal{C}_\alpha) = N(s_{\alpha,L}) = \begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,n} \end{bmatrix}, \tag{1}$$
where a_{i,j} indicates the proximity of c_i to c_j, C̄α is the universal set of characters that appear in Cα, and c_i is the i-th element of C̄α.

Based on the character network, our previous studies have proposed various methods for comparing stories with each other [12-15]. Some of them applied social network analysis (SNA) techniques [13,14], and the others used external data [12,15]. Although these methods are based on logical foundations and intuitions from narratology studies, they are too heuristic to constitute a generalized model for the various kinds and diverse formats of stories and narrative works. Therefore, in this study, we propose a model for learning distributed representations of a story based on the Word2Vec [21] and Doc2Vec [22] models. This enables us to calculate the similarity between two characters by merely taking an inner product of their vector representations. Also, we can expect meaningful results from additive operations between the vector representations without knowing the semantics of each vector component. Word2Vec and Doc2Vec are efficient and powerful models for data represented in terms of word-document relationships (i.e., multi-sets of discrete entities). However, the character network is a graphical model that consists of characters and their relationships. Therefore, we apply the WL relabeling process [25] to transform the character network into a multi-set of its substructures. As displayed in Fig. 1 (b) to (d), a substructure initially represents the first-degree proximity of a rooted character. While it first describes only local information, its coverage becomes broader over iterations as updated neighborhood information is gathered. Eventually, we obtain substructures of the character network with multi-scaled coverage. We call the substructures of each character the 'social roles' of the character, because they provide the position and stance of the character in a particular scene at multiple scales.
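The growth process in Definition 1 — each N(sα,l) accumulating all interactions up to scene l, with N(sα,L) = N(Cα) — can be sketched as follows. This is a minimal illustration assuming symmetric proximity; the (i, j, weight) interaction tuples are hypothetical stand-ins for measured dialogue frequency or co-occurrence time:

```python
def scene_networks(n_chars, scenes):
    """Build one character-network matrix per scene (cf. Definition 1).

    scenes: list over scenes; each scene is a list of (i, j, weight)
    interactions. Returns a list of n x n matrices; the last snapshot
    equals N(C_alpha) since interactions accumulate over scenes.
    """
    acc = [[0.0] * n_chars for _ in range(n_chars)]
    snapshots = []
    for interactions in scenes:
        for i, j, w in interactions:
            acc[i][j] += w
            acc[j][i] += w  # proximity assumed symmetric in this sketch
        snapshots.append([row[:] for row in acc])  # N(s_{alpha,l})
    return snapshots
```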
Suppose that c_i, c_j, and c_k are the protagonist, antagonist, and a minor character in a story, respectively. If c_k has high proximity to both c_i and c_j, we can conjecture that c_k has a role connecting c_i and c_j, although we do not know the exact semantics of the role (e.g., a negotiator or spy). Further, this enables us to represent a story as a multi-set of social roles. Thus, each character has multiple social roles according to each scene and each scale (degree). The social role is defined as follows:

Definition 2 (Social role). Let $c_{i,l}^{(d)}$ be a social role of $c_i$ on $s_{\alpha,l}$ at a particular degree $d \in [0, D]$. $c_{i,l}^{(d)}$ is expressed by the one-hop connectivity of $c_i$ at degree $d-1$. To consider the intensity of proximity, we categorize the characters adjacent to $c_i$ into three groups based on the proximity of $c_i$ to them. This is formulated in the following manner:

$$c_{i,l}^{(d)} = \left( c_{i,l}^{(d-1)};\ H\big(c_{i,l}^{(d)}\big),\ M\big(c_{i,l}^{(d)}\big),\ L\big(c_{i,l}^{(d)}\big) \right), \tag{2}$$

where $H(c_{i,l}^{(d)})$, $M(c_{i,l}^{(d)})$, and $L(c_{i,l}^{(d)})$ indicate sets of social roles rooted in the neighborhoods of $c_i$ at degree $d-1$. These sets include neighborhoods that receive high, medium, and low proximity from $c_i$, respectively.

Therefore, we can obtain distributed representations of social roles through the Char2Vec model. By using these representations, we can compare characters with each other based on their social roles, which can take many forms in the
1 The scene indicates the smallest unit of a story that has narrative significance. A scene mostly describes an event that occurs among a group of characters. We used the term ‘scene’ since movies are our major experimental subject; it can be replaced with any appropriate unit according to the format and genres of the narrative work (e.g., a chapter in a novel, an episode of a TV series, etc.).
Fig. 1. Relationships between a narrative work, characters, scenes, character networks, and social roles. In all the subfigures, rectangular nodes refer to scenes, circular nodes indicate characters, and arrow lines denote social relationships between the characters. In (a), the narrative work (Cα ) consists of multiple scenes, and each scene contains characters and their interactions. Also, the character network of Cα (N (Cα )) is an accumulative representation for all scenes in Cα . In (b), a character network at a scene (N (sα ,l )) presents an accumulation of characters’ interactions from sα ,1 to sα ,l . Finally, (c) and (d) describe social roles of c i and c j , respectively, at sα ,l on d + 1 degree. Although the social role of c i on d + 1 degree only includes one-hop connectivity information on d degree, surroundings of c i on d degree also contain their one-hop connectivity on d − 1 degree. Thus, we can obtain broader substructures as the degree becomes higher.
progression of a single narrative work. Thus, we call the vector representation of the social role a ‘character vector.’ It is defined as follows:
Definition 3 (Character vector). Let $\Phi(c_{i,l}^{(d)})$ be a distributed representation of $c_{i,l}^{(d)}$, where $\Phi(\cdot)$ denotes a projection function for vectorization. We can compare the social roles of characters by using conventional distance/similarity measurements for vectors, without understanding the components of the vector representations. This can be formulated as:

$$k\big(c_{i,l}^{(d)}, c_{j,l}^{(d)}\big) = \Big\langle \Phi\big(c_{i,l}^{(d)}\big), \Phi\big(c_{j,l}^{(d)}\big) \Big\rangle, \tag{3}$$
where k (·, ·) denotes a kernel function. According to Robert McKee [34,35], characters in a story are composed to cause events and escalate conflicts around its protagonist. Mostly, stories present processes of how their protagonists deal with or react to conflicts that jeopardize the normality of everyday life. Therefore, we assume that the composition of the social roles of characters in a story reveals not only the intention of authors/directors of the story but also the reaction of users to the story. Based on this assumption, we finally obtain distributed representations of stories using the Doc2Vec model. The story representations enable the comparison of combinations of social roles in each story. We call this distributed representation ‘story vector.’ This is defined as follows:
Definition 4 (Story vector). Let $\Phi(N(\mathcal{C}_\alpha))$ be a distributed representation of the story in $\mathcal{C}_\alpha$. Closer locations of $\Phi(N(\mathcal{C}_\alpha))$ and $\Phi(N(\mathcal{C}_\beta))$ in the embedding space indicate that $N(\mathcal{C}_\alpha)$ and $N(\mathcal{C}_\beta)$ have more similar structures. We suppose that when $\mathcal{C}_\alpha$ and $\mathcal{C}_\beta$ contain similar stories, $N(\mathcal{C}_\alpha)$ and $N(\mathcal{C}_\beta)$ are structurally similar. This is formulated as:

$$k\big(\mathcal{C}_\alpha, \mathcal{C}_\beta\big) \approx k\big(N(\mathcal{C}_\alpha), N(\mathcal{C}_\beta)\big) = \big\langle \Phi\big(N(\mathcal{C}_\alpha)\big), \Phi\big(N(\mathcal{C}_\beta)\big) \big\rangle, \tag{4}$$
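The kernel k(·,·) in Eqs. (3) and (4) is left abstract. One common instantiation — an assumption for illustration, not the paper's prescribed choice — is cosine similarity, i.e., the inner product of L2-normalized embeddings. The story titles and vectors below are made up:

```python
import math

def cosine_kernel(u, v):
    """One possible k(.,.): cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar_story(query, story_vectors):
    """Return the title whose story vector maximizes the kernel value."""
    return max(story_vectors, key=lambda t: cosine_kernel(query, story_vectors[t]))
```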
Although, strictly speaking, a story vector represents the structure of a character network, we anticipate that structural similarity between character networks can approximate similarity between stories. Additionally, Table 1 presents a list of notations that are frequently used in this article. To examine the expectation, we sampled 10 real movies from our dataset (which consists of the 142 real movies). In Sect. 3, we present how differently these movies are embedded by the proposed models. By discussing the correlations
Table 1
A list of notations that are frequently used.

Notation                  Description
Cα                        the α-th narrative work in the corpus
N(Cα)                     the character network of Cα
sα,l                      the l-th scene of Cα
N(sα,l)                   the character network of sα,l
c_i                       the i-th character
c_{i,l}^{(d)}             the social role of c_i at s_{α,l} and on degree d
N(c_{i,l}^{(d)})          the set of neighboring social roles of c_{i,l}^{(d)}
S_a                       the a-th social role in the corpus
Φ(·)                      a projection function for vector embedding
P(S_a | c_{i,l}^{(d)})    the co-occurrence probability of S_a and c_{i,l}^{(d)}
Table 2
A list of the movies for comparing the vector representations. We selected the movies that were most frequently mentioned in the questionnaire survey (Sect. 4).

Title           Release Year   Genres
Se7en           1995           Crime, Drama, Mystery
Misery          1990           Crime, Drama, Thriller
The Godfather   1972           Crime, Drama
Psycho          1960           Horror, Mystery, Thriller
Her             2013           Drama, Romance, Sci-Fi
Juno            2007           Comedy, Drama
Charade         1963           Comedy, Mystery, Romance
Marty           1955           Drama, Romance
Ghost           1990           Drama, Fantasy, Romance
Watchmen        2009           Action, Drama, Mystery
between the vector representations and the social relationships of characters, we can infer the kinds of narrative characteristics reflected by each proposed embedding model. Table 2 presents a list of the selected movies.

3. Distributed representations of stories

To represent the story in each narrative work with a vector, we suppose that the structure of the character network in the narrative work reflects the characteristics of the story. From the beginning of our studies on the character network, we have held the following proposition to be true: a story consists of characters, events, and background. Since events are made by the interactions between the characters in the background, we can represent the story by describing these interactions. Based on this underlying assumption, we describe each story by (i) the social relationships among characters, (ii) the degrees of proximity in the social relationships, and (iii) the temporal changes in the social relationships. To discover and represent the social relationships of characters, we discover substructures of the character network, rooted in each character, by using the WL relabeling process [25]. WL relabeling enables us to obtain discrete and multi-scaled labels that represent social relationships. We call these labels the 'social roles' of characters. Subsequently, we represent each social role with a vector, based on Word2Vec [21] and Subgraph2Vec [24]. Through the vector representations, we can compare the social roles of characters not only among characters but also across temporal changes. Finally, we propose models for representing each story with a vector by applying the Doc2Vec [22] and Graph2Vec [20] models. Both the model for representing the social role (Char2Vec) and that for the story (Story2Vec) train the representations by using the adjacency and temporal changes of social roles with respect to the characters in which the social roles are rooted.

3.1. Learning distributed representations of characters

To represent a story with a vector, we refer to Word2Vec [36,21,37], Doc2Vec [22], Subgraph2Vec [24], and Graph2Vec [20]. To apply these models to the story, we model the relationship between the character and the story as we would that between a word and a document. However, there is a significant difference between these two relationships: the story is time-sequential, and characters keep changing according to the flow of the story. Therefore, to preserve the time-sequential characteristic of the story, we model the story using the dynamic character network [10]. With this, a character appears in a stream of character networks, and the substructures rooted in the character also change time-sequentially. To apply the word-document relationship in the narrative domain, we suppose that a substructure of a character at a particular scene comprehensively represents the social relationships of the character in that scene. Thus, we call the substructure a 'social role' of the character. Thereby, the character network of a narrative work N(Cα) corresponds to a document, the character network of a scene N(sα,l) to a paragraph, and a substructure in N(sα,l) to a word. Based on this assumption, in this section, we propose the Char2Vec model, which is an extension of Subgraph2Vec for narrative work. This model consists of three main parts: (i) assigning nominal labels to nodes and edges (characters and social relationships, respectively) in the character network, (ii) discovering substructures (social roles) rooted in each character at every scene, and (iii) representing each social role with a vector.
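The word-document analogy can be made concrete: collecting every social-role label over all scenes turns a character network into a multi-set ("document") of roles ("words"). A small sketch with placeholder role labels:

```python
from collections import Counter

def story_as_multiset(roles_per_scene):
    """Collapse per-scene social-role labels into one multi-set.

    roles_per_scene: list over scenes, each a list of role labels.
    The resulting Counter plays the part of a bag-of-words document.
    """
    bag = Counter()
    for roles in roles_per_scene:
        bag.update(roles)
    return bag
```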
Fig. 2. Steps of the proximity-aware WL relabeling process from d = 0 to 1, where c i , c j , ck , and cl appear in a common scene of a narrative work. Darkness of nodes and kinds of lines indicate the degree of centrality and proximity of characters, respectively. Step (a) and (b) indicate the discretization of the character network. Names of characters and weights on their relationships are replaced by several categories defined in Sect. 3.1.1. Step (c) denotes assigning initial labels (at d = 0) on characters, as shown in Line 2 and 3 in Algorithm 1. Step (d) refers to describing labels on a subsequent degree (d = 1). These new labels are described by Eq. (7) and compressed by Eq. (6). Finally, in Step (e), we assign the new labels on d = 1 for all the characters. We iterate Step (d) and (e) until the maximum degree D.
For the first task, we use a method based on significant gaps in the centrality and proximity of characters, which has been used in our previous studies (Weng et al. [8]). Subsequently, we propose a proximity-aware WL relabeling process based on the WL test of graph isomorphism [25]. Finally, we introduce the Char2Vec model and the temporal-radial skip-gram method for representing each social role rooted in characters with a vector, by modifying the radial skip-gram [24].

3.1.1. Discretizing nodes and edges of the character network

The character network has a few characteristics distinct from those of other social networks in the real world, which require specialized analysis methods. One of the most striking characteristics is the gaps between characters in their centrality and proximity (i.e., the intensity of social relationships among characters), which are significant enough to be visible. These gaps are useful for categorizing characters and their relationships. For example, existing studies [8,9,16] have used the first and second most significant gaps in centrality as boundaries between main, minor, and extra characters. Also, the protagonist of a story can be discovered by searching for the character with the highest centrality. Most of these studies measured centrality using an average of three well-known centrality measurements: degree, betweenness, and closeness centrality [16]. This approach is not only efficient and intuitive; it also demonstrated definitive accuracy in previous studies [8,9,16,10,11,13-15]. Thus, before applying the WL relabeling to the character network, we assign nominal labels to each character and each relationship between characters, using these gaps. We categorize characters into four groups: protagonist (P), main (M), minor (m), and extra (e) characters. First, we measure the proximity degrees between all characters in a narrative work based on the frequency of dialogues between the characters.
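The gap-based categorization cited above can be sketched as follows: order characters by centrality, take the two largest gaps between adjacent scores as the main/minor and minor/extra boundaries, and label the top character as the protagonist. The character names and scores are illustrative, and this is a simplification of the cited procedure, not its exact implementation:

```python
def categorize_characters(centrality):
    """Map {character: centrality score} to 'P', 'M', 'm', or 'e' labels.

    Assumes at least four characters so that two boundary gaps exist.
    """
    ordered = sorted(centrality, key=centrality.get, reverse=True)
    gaps = [centrality[ordered[k]] - centrality[ordered[k + 1]]
            for k in range(len(ordered) - 1)]
    # positions of the two largest gaps, in ranking order
    b1, b2 = sorted(sorted(range(len(gaps)), key=gaps.__getitem__,
                           reverse=True)[:2])
    labels = {}
    for idx, ch in enumerate(ordered):
        if idx == 0:
            labels[ch] = 'P'   # protagonist: highest centrality
        elif idx <= b1:
            labels[ch] = 'M'   # main characters
        elif idx <= b2:
            labels[ch] = 'm'   # minor characters
        else:
            labels[ch] = 'e'   # extra characters
    return labels
```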
Moreover, the centrality degree of each character is estimated by the average of the three centrality measurements, as in the previous studies [16,3], and all the characters in each narrative work are ordered according to their centrality degrees. Then, we calculate the gaps in centrality degrees between adjacently ordered characters (i.e., the i-th and (i+1)-th characters). As with the existing methods, we use the two most significant gaps in centrality degrees as the boundaries between main, minor, and extra characters. The protagonist is detected as the character with the highest centrality. As with the characters, we classify the social relationships of characters according to their intensity into high (H), medium (M), and low (L) proximity, based on the two most significant gaps in proximity degrees among characters. Because the categorization of characters and their relationships has been introduced and tested in many previous studies, we do not repeat the detailed procedures and validations here. Fig. 2 (a) and (b) show before-and-after examples of discretizing the character network. In previous studies, we used the names of characters to identify them. Characters are unique entities in each narrative work, and through discretization, we can consider them in terms of their social roles, which compose the societies described by narrative works. This enables us to compare the stories delivered by narrative works in terms of the societies in those stories.

3.1.2. Discovering social roles of characters

Existing methods for distributed representations of graphs, excluding the studies of Narayanan et al. [24,20], mostly use weights between nodes directly for distance-based embedding [30,27–29,26]. Nevertheless, this approach is inappropriate for the character network. Even if two arbitrary characters have high proximity to each other, this does not mean that these two
O.-J. Lee, J.J. Jung / Artificial Intelligence 281 (2020) 103235
characters are similar to each other; it only means that these two characters have a set of shared friends. This results from the societies expressed in stories being extremely small compared to real-world society. While narrative works reflect the real world, their stories focus heavily on the protagonists, and societies in the stories are composed of the conflicts and events building around the protagonists. Therefore, we describe characters with (i) their centrality, (ii) the centrality of their neighboring characters, and (iii) the degree of proximity to the neighboring characters through the WL relabeling process, which lays the foundation for the WL kernel and the WL test of graph isomorphism [25]. For c_i, the relabeling method assigns a unique label that reflects relationships between c_i and its neighboring characters. Since the labels of all characters are reassigned in every iteration, the labels acquire more information about the structure of the whole character network as the iterations proceed. Therefore, a label assigned in the n-th iteration is called a substructure found on degree n. The maximum degree of substructures (i.e., the number of iterations) is set to the length of the longest non-cyclic path in the collected character networks. Before starting the WL relabeling process, we label characters according to the categories defined in the previous section: the protagonist, main, minor, and extra characters. While the labels are unique identifiers for kinds of substructures, we use natural numbers for convenience. This step is illustrated by Fig. 2 (c) and Lines 2 and 3 of Algorithm 1. To apply the WL relabeling process to narrative works, we modify it to consider both the centrality of characters and the proximity between characters. The conventional WL relabeling process assigns a label to a character c_i on degree d using a sequence of the labels of c_i and its neighborhoods on degree d − 1.
When c_j and c_k are neighborhoods of c_i and c_i^{(d)} is a label of c_i on degree d, c_i^{(d)} can be described as:

c_i^{(d)} = \big( c_i^{(d-1)}; c_j^{(d-1)}, c_k^{(d-1)} \big).   (5)

Subsequently, c_i^{(d)} is compressed based on a substructure dictionary. The dictionary consists of pairs of substructures and their identifiers, which are natural numbers. If we do not have c_i^{(d)} in our dictionary for substructures, we allocate a new identifier to c_i^{(d)}. Otherwise, we assign the identifier found in the dictionary. This can be formulated as:

c_i^{(d)} := \begin{cases} 1 + \max_{\forall S_a \in S} I(S_a), & \text{if } \big( c_i^{(d-1)}; c_j^{(d-1)}, c_k^{(d-1)} \big) \notin S, \\ I\big( c_i^{(d-1)}; c_j^{(d-1)}, c_k^{(d-1)} \big), & \text{otherwise}, \end{cases}   (6)
where I(·) indicates a function to obtain identifiers of substructures, and S denotes a universal set of substructures that have appeared in our corpus of character networks. Nevertheless, this relabeling process does not consider the proximity between characters, although the weights between characters (i.e., the degrees of proximity) diverge widely in the character network. Therefore, we attempt to alleviate this issue by extending the definition of the substructure from adjacent substructures to adjacent substructures in each intensity of proximity. To categorize adjacent substructures according to proximity, we use the categories (H, M, and L) defined in Sect. 3.1.1. We redefine a label of a substructure (Eq. (5)) as follows:

c_i^{(d)} = \big( c_i^{(d-1)}; H(c_i)^{(d)}, M(c_i)^{(d)}, L(c_i)^{(d)} \big),   (7)

where H(c_i)^{(d)}, M(c_i)^{(d)}, and L(c_i)^{(d)} indicate the sets of social roles neighboring c_i at degree d − 1, which are connected by high, medium, and low proximity, respectively. We call this modification the 'proximity-aware WL relabeling process.' Fig. 2 (d) demonstrates describing and compressing labels using the proximity-aware WL relabeling. Also, Algorithm 1 describes the entire procedure of the relabeling process from initialization to compression.
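The dictionary-based compression of Eq. (6), which also serves the redefined labels of Eq. (7), can be illustrated as follows. The tuple-based keys and the function name `compress` are illustrative assumptions; identifiers 1–4 are reserved for the initial P/M/m/e labels, matching Line 3 of Algorithm 1.

```python
# Sketch of the substructure dictionary behind Eqs. (6)-(7): each
# substructure is compressed to a natural-number identifier. Ids 1-4
# are assumed reserved for the initial character categories.

substructure_ids = {}   # the universal set S, as substructure -> I(.)

def compress(substructure):
    if substructure not in substructure_ids:
        # unseen substructure: allocate the next natural-number identifier
        substructure_ids[substructure] = 1 + max(substructure_ids.values(),
                                                 default=4)
    return substructure_ids[substructure]
```

Re-compressing the same substructure always yields the same identifier, which is what makes labels comparable across characters and scenes.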
In Line 2 of Algorithm 1, C_{s_{α,l}} indicates the set of characters that appear in s_{α,l}; P_{s_{α,l}}, M_{s_{α,l}}, and m_{s_{α,l}} denote the subsets of C_{s_{α,l}} that include the protagonist, main, and minor characters, respectively. Similarly, in Lines 9 and 11,
H_{s_{α,l}} and M_{s_{α,l}} refer to the sets of social relationships between characters in s_{α,l} that have high and medium proximity, respectively.

3.1.3. Representing the social roles with a vector

Based on the results of the WL relabeling process, we can represent a story in C_α with a multi-set of social roles of characters, S_{C_α}. This transformation enables a vector that consists of frequencies of social roles in S to represent a story. Nevertheless, this frequency vector has several problems: (i) difficulty in reflecting similarity among social roles, (ii) high dimensionality, and (iii) difficulty in representing temporal transitions and adjacency between social roles. Therefore, we first build distributed representations of social roles based on Word2Vec [21]. Then, we compose distributed representations of stories by applying Doc2Vec [22]. Through the distributed representations, we can not only reduce the dimensionality of the vector representations but also represent structural similarity among social roles and character networks. To apply Word2Vec and Doc2Vec to the character network, which is graphical data, we extend the methods proposed in Subgraph2Vec [24] and Graph2Vec [20]. To train distributed representations based on occurrence probabilities of words, Word2Vec uses neighboring words within a fixed-size window in both the skip-gram and CBOW. However, in a graph, the
Algorithm 1 Proximity-aware WL Relabeling Process.
1: procedure WLrelabeling(S, N_{s_{α,l}}, C_{s_{α,l}})
2:   for c_i ∈ C_{s_{α,l}} do
3:     c_i^{(0)} ← 1 if c_i ∈ P_{s_{α,l}} ⊂ C_{s_{α,l}}; 2 else if c_i ∈ M_{s_{α,l}} ⊂ C_{s_{α,l}}; 3 else if c_i ∈ m_{s_{α,l}} ⊂ C_{s_{α,l}}; 4 otherwise
4:   for d : 1 → D do
5:     for c_i ∈ C_{s_{α,l}} do
6:       Set H(c_i)^{(d)} ← ∅, M(c_i)^{(d)} ← ∅, L(c_i)^{(d)} ← ∅
7:       for c_j ∈ C_{s_{α,l}}, c_i ≠ c_j do
8:         a_{i,j} ← a_{i,j} ∈ N_{s_{α,l}}
9:         if a_{i,j} ∈ H_{s_{α,l}} then
10:          H(c_i)^{(d)} ← H(c_i)^{(d)} ∪ { c_j^{(d−1)} }
11:        else if a_{i,j} ∈ M_{s_{α,l}} then
12:          M(c_i)^{(d)} ← M(c_i)^{(d)} ∪ { c_j^{(d−1)} }
13:        else
14:          L(c_i)^{(d)} ← L(c_i)^{(d)} ∪ { c_j^{(d−1)} }
15:      c_i^{(d)} ← ( c_i^{(d−1)}; H(c_i)^{(d)}, M(c_i)^{(d)}, L(c_i)^{(d)} )
16:      c_i^{(d)} ← 1 + max_{∀S_a ∈ S} I(S_a) if c_i^{(d)} ∉ S, else I(c_i^{(d)})
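One degree of the proximity-aware relabeling (Lines 5–16 of Algorithm 1) can be sketched as below. The data layout (`labels` per character, `edges` mapping pairs to a proximity category) and the simplified identifier dictionary are illustrative assumptions, not the exact implementation.

```python
# Sketch of one degree of the proximity-aware WL relabeling (Eq. (7)).
# `labels`: character -> current identifier; `edges`: (ci, cj) -> 'H'|'M'|'L'.
# Ids 1-4 are assumed reserved for the initial P/M/m/e labels.

ids = {}

def lookup(key):
    # compression of Eq. (6): fresh identifier for an unseen substructure
    if key not in ids:
        ids[key] = 1 + max(ids.values(), default=4)
    return ids[key]

def relabel_once(labels, edges):
    new_labels = {}
    for ci in labels:
        buckets = {'H': [], 'M': [], 'L': []}     # H(c_i), M(c_i), L(c_i)
        for (a, b), proximity in edges.items():
            if ci == a:
                buckets[proximity].append(labels[b])
            elif ci == b:
                buckets[proximity].append(labels[a])
        # Eq. (7): own previous label plus the three proximity buckets
        key = (labels[ci], tuple(sorted(buckets['H'])),
               tuple(sorted(buckets['M'])), tuple(sorted(buckets['L'])))
        new_labels[ci] = lookup(key)
    return new_labels
```

For a protagonist (label 1) tied to a main character (label 2) by high proximity, `relabel_once({'p': 1, 'q': 2}, {('p', 'q'): 'H'})` assigns two fresh identifiers, and repeating the call from the same labels reuses them.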
number of neighboring substructures is not constant. Thus, Narayanan et al. [24] proposed the radial skip-gram to consider flexible-sized neighborhoods. Different from Subgraph2Vec and Graph2Vec, which deal with a set of independent graphs, a character network is a sequence of graphs that share nodes (characters). Therefore, we extend the radial skip-gram to consider the temporal continuity of the story. We call this extension TR-SkipGram. The radial skip-gram considers neighborhoods of a character c_i at s_{α,l} from degree d − 1 to d + 1. In contrast, TR-SkipGram considers neighborhoods of c_i from s_{α,l−1} to s_{α,l+1} and from degree d − 1 to d + 1. The ranges of neighborhoods in these modifications of the skip-gram model are compared in Fig. 3 (a). The method for predicting the occurrence probability of a social role is the same as in the conventional skip-gram and negative sampling method. For an arbitrary social role, its occurrence probability as a neighborhood of c_i on s_{α,l} at degree d can be formulated as follows:
P\big( S_a \mid c_{i,l}^{(d)} \big) \approx \sigma\big( \Phi(S_a) \cdot \Phi(c_{i,l}^{(d)}) \big),   (8)

where c_{i,l}^{(d)} denotes the social role of c_i on s_{α,l} at degree d, Φ(·) indicates a projection function for vector representation, and σ(·) refers to the sigmoid function. By modifying the objective function of the negative sampling [21], we define an objective function for TR-SkipGram. We maximize the occurrence probability in Eq. (8) for the neighborhoods (Lines 13 to 15 of Algorithm 2) and minimize it for social roles that are not included in the neighborhoods (Lines 16 to 19 of Algorithm 2). This is formulated as:
L_S\big(c_{i,l}^{(d)}\big) = \sum_{\forall S_a \in N_{TR}(c_{i,l}^{(d)})} \log P\big( S_a \mid c_{i,l}^{(d)} \big) - \sum_{\forall S_b \notin N_{TR}(c_{i,l}^{(d)})} \log P\big( S_b \mid c_{i,l}^{(d)} \big)
\approx \sum_{\forall S_a \in N_{TR}(c_{i,l}^{(d)})} \Big[ \log \sigma\big( \Phi(S_a) \cdot \Phi(c_{i,l}^{(d)}) \big) + \sum_{j=1}^{k} \mathbb{E}_{S_b \sim P_n(S)} \log \sigma\big( -\Phi(S_b) \cdot \Phi(c_{i,l}^{(d)}) \big) \Big],   (9)
where P_n(S) ∝ U(S)^{3/4} denotes a noise distribution of social roles, U(S) refers to the unigram distribution of social roles, and N_{TR}(·) indicates a set of social roles that are in the neighborhoods. The number of negative samples (k) is determined by the average size of neighborhood sets in our corpus. This objective function makes Φ(S_a) and Φ(S_b) closer to each other when S_a and S_b are neighboring; otherwise, it makes them more distant. Additionally, we have followed the noise distribution of the Word2Vec model [21], since this study aims to verify whether we can represent a story with a vector rather than to enhance its accuracy. We compose the neighborhoods of a social role with consideration of (i) adjacency, (ii) degree, and (iii) the flow of scenes. By considering multiple degrees, a character vector can reflect the structures of character networks at multiple scales. By observing the temporal transitions of social roles, we can represent how a story develops according to the flow of scenes. We call the neighborhoods of Subgraph2Vec, which only consider adjacency and degree, 'radial neighborhoods.' Further, we name the neighborhoods of Char2Vec, which include temporal changes in social roles, 'temporal-radial neighborhoods.'
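The negative-sampling update implied by Eq. (9) can be sketched with NumPy as follows. The array sizes, learning rate, in-place update scheme, and random unigram counts are illustrative assumptions, not the exact training configuration of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_roles, dim, k, lr = 20, 8, 3, 0.05
Phi = rng.normal(scale=0.1, size=(n_roles, dim))   # Φ(.) per social role

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# noise distribution P_n(S) ∝ U(S)^{3/4}, from illustrative unigram counts
counts = rng.integers(1, 100, size=n_roles).astype(float)
noise = counts ** 0.75
noise /= noise.sum()

def tr_skipgram_step(center, neighborhood):
    """One update in the spirit of Eq. (9): raise P(S_a | c) for roles in
    the temporal-radial neighborhood, lower it for k negative samples."""
    for pos in neighborhood:
        # gradient of log σ(Φ(S_a)·Φ(c)) w.r.t. Φ(c)
        grad = 1.0 - sigmoid(Phi[pos] @ Phi[center])
        Phi[center] += lr * grad * Phi[pos]
        for neg in rng.choice(n_roles, size=k, p=noise):
            if neg in neighborhood:        # S_b must not be a neighbor
                continue
            # gradient of log σ(-Φ(S_b)·Φ(c)) w.r.t. Φ(c)
            grad = sigmoid(Phi[neg] @ Phi[center])
            Phi[center] -= lr * grad * Phi[neg]
```

Each call moves only the center role's vector, toward its neighbors and away from sampled non-neighbors; in full training, the role vectors of an entire corpus are updated this way over several epochs.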
Fig. 3. The TR-SkipGram of the Char2Vec Model. In (a), rectangular nodes indicate character networks of each scene at each degree; circular nodes refer to characters included in each character network, arrow lines denote the existence and direction of social relationships between characters, and dashed rectangular boxes indicate the ranges of neighborhoods of each skip-gram method. N_R(·) is extended to N_{TR}(·). In (b), a rectangular node denotes a target of the prediction, and circular nodes indicate neighborhoods of the target social role.
These neighborhoods are denoted by N_R(·) and N_{TR}(·), respectively. However, we cannot assure that c_i, who appeared in s_{α,l}, will also appear in s_{α,l−1} and s_{α,l+1}. Thus, we consider s_{α,l+l′} only when it contains c_i. Conclusively, the architecture of the TR-SkipGram method is illustrated by Fig. 3 (b). Additionally, the whole procedure of Char2Vec is described by Algorithm 2 and Algorithm 3. In these algorithms, the hyper-parameters (δ, ρ, and ε) are empirically tuned, and the detailed tuning methods are presented in Sect. 4. SAMPLE(S, P_n(S)) in Line 17 of Algorithm 2 indicates a procedure to obtain negative samples. Further, Line 9 of Algorithm 3 denotes shuffling the training dataset. Finally, Fig. 4 presents a visualization of the vector embeddings of the protagonists of the 142 movies in our dataset. To represent each protagonist with a single vector, we used the average of all the social roles rooted in the protagonist at every scene. To discover the narrative characteristics reflected by the Char2Vec model, we asked six faculty members of the Dept. of Film Studies at Chung-Ang University to qualitatively analyze correlations between movie stories and distances among movies in the projected space. The following is an aggregation of the analysis results. The conjectures we have drawn from this analysis are intended to provide insights and intuitions for further research. To verify these conjectures, we will develop adequate evaluation procedures in our future work that can persuade researchers from both engineering and the humanities. 'Her (2013)' and 'The Godfather (1972)' were closely located in the Char2Vec model. These movies share the point that their stories mainly focus on a single protagonist ('Theodore' and 'Michael Corleone'). The adjacency among 'Misery (1983),' 'Ghost (1990),' and 'Marty (1955)' is also significant.
Although these movies have different genres, their stories were commonly led by the protagonists and the protagonists' partners ('Annie Wilkes' and 'Paul Sheldon' in 'Misery (1983),' 'Sam Wheat' and 'Molly Jensen' in 'Ghost (1990),' and 'Marty Piletti' and 'Clara Snyder' in 'Marty (1955)'). Also, the protagonists had romantic relationships with their partners. Some movies were distant from each other. 'Charade' (1963) and 'The Godfather' (1972) had not only different genres but also different numbers of characters leading their stories (we call these characters 'leading characters'). While 'Charade' (1963) focused on two characters in a romantic relationship, 'The Godfather' (1972) was protagonist-centric.
Algorithm 2 Temporal-Radial Skip-Gram.
1: procedure GetNeighbors(c_{i,l}^{(d)}, N_{C_α})
2:   Set N_{TR}(c_{i,l}^{(d)}) ← ∅
3:   for l′ : −1 → 1, 0 ≤ l + l′ ≤ L do
4:     for d′ : −1 → 1, 0 ≤ d + d′ ≤ D do
5:       for c_j ∈ C_{s_{α,l+l′}}, c_i ≠ c_j do
6:         if N_{s_{α,l+l′}}[a_{i,j}] ≠ 0 then
7:           N_{TR}(c_{i,l}^{(d)}) ← N_{TR}(c_{i,l}^{(d)}) ∪ { c_{j,l+l′}^{(d+d′)} }
8:       if N_{s_{α,l+l′}}[a_{i,i}] ≠ 0 then
9:         N_{TR}(c_{i,l}^{(d)}) ← N_{TR}(c_{i,l}^{(d)}) ∪ { c_{i,l+l′}^{(d+d′)} }
10: procedure TRSkipGram(c_{i,l}^{(d)}, Φ(c_{i,l}^{(d)}), S, N_{C_α})
11:   ρ: the learning rate
12:   N_{TR}(c_{i,l}^{(d)}) ← GetNeighbors(c_{i,l}^{(d)}, N_{C_α})
13:   for S_a ∈ N_{TR}(c_{i,l}^{(d)}) do
14:     L(c_{i,l}^{(d)}) ← − log P(S_a | c_{i,l}^{(d)})
15:     Φ(c_{i,l}^{(d)}) ← Φ(c_{i,l}^{(d)}) − ρ ∂L(c_{i,l}^{(d)}) / ∂Φ(c_{i,l}^{(d)})
16:   for 1 → k do
17:     S_b ← SAMPLE(S, P_n(S)), S_b ∉ N_{TR}(c_{i,l}^{(d)})
18:     L(c_{i,l}^{(d)}) ← log P(S_b | c_{i,l}^{(d)})
19:     Φ(c_{i,l}^{(d)}) ← Φ(c_{i,l}^{(d)}) − ρ ∂L(c_{i,l}^{(d)}) / ∂Φ(c_{i,l}^{(d)})
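The neighborhood collection of Lines 1–9 in Algorithm 2 can be sketched as follows. The `scenes` data layout (one dict per scene, mapping each appearing character to its labels indexed by degree) and the simplification of adjacency to shared scene membership are illustrative assumptions.

```python
def get_neighbors(ci, l, d, scenes, D):
    """Collect the temporal-radial neighborhood of c_i's social role at
    scene l and degree d: the roles of co-appearing characters (and of
    c_i itself) over scenes l-1..l+1 and degrees d-1..d+1."""
    hood = set()
    for dl in (-1, 0, 1):
        if not 0 <= l + dl < len(scenes):
            continue
        scene = scenes[l + dl]
        if ci not in scene:      # consider s_{α,l+l'} only if it has c_i
            continue
        for dd in (-1, 0, 1):
            if not 0 <= d + dd <= D:
                continue
            for labels in scene.values():
                hood.add(labels[d + dd])
    return hood
```

With two scenes and two degrees, the neighborhood of a character at scene 0 and degree 0 spans both scenes and both degrees, while scenes that do not contain the character contribute nothing.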
Algorithm 3 Learning Distributed Representations of Characters.
1: procedure Char2Vec(C, S)
2:   C: a universal set of narrative works
3:   D: the maximum degree of the WL relabeling process
4:   L: the number of scenes in each narrative work
5:   δ: the number of dimensions for story vectors and character vectors
6:   ε: the number of epochs
7:   Initialization: Sample Φ(S) from R^{|S|×δ}, Set S ← {1, 2, 3, 4}
8:   for e : 1 → ε do
9:     Ĉ ← SHUFFLE(C)
10:    for C_α ∈ Ĉ do
11:      for l : 1 → L do
12:        S_{α,l} ← WLrelabeling(S, N_{s_{α,l}}, C_{s_{α,l}})
13:        S ← S ∪ S_{α,l}
14:        for c_i ∈ C_{s_{α,l}} do
15:          for d : 1 → D do
16:            Φ(c_{i,l}^{(d)}) ← TRSkipGram(c_{i,l}^{(d)}, Φ(c_{i,l}^{(d)}), S, N_{C_α})
The relationship between 'Juno (2007)' and 'Watchmen (2009)' was similar to this. In contrast to the single protagonist of 'Juno (2007)' ('Juno'), 'Watchmen (2009)' was an action movie that contained multiple heroes. The same holds for the relationship between 'The Godfather (1972)' and 'Watchmen (2009).' These examples exhibit that a character vector of a protagonist generated by the Char2Vec model reflects the number of characters highlighted in a story. In other words, the Char2Vec model could detect relationships between the protagonist and the other leading characters around the protagonist, which were as important as the protagonist. If we can detect the leading characters in a story (not only the protagonist), we can build a better representation of the story based on the Char2Vec model.

3.2. Learning distributed representations of the story

In this section, we propose a method for representing a story based on the social roles of the characters that appear in the story. To represent the social roles of characters, we applied two features: (i) the social roles of adjacent characters and (ii) the temporal changes in their social roles. For the vector representation of the story, we still use these features, with the PV-DM proposed as a mode of Doc2Vec [22]. By using the PV-DM, the story vectors can reflect the two features mentioned above, which consider the dynamics of the story, through predicting each social role that appears in a story based on its neighborhoods and the story vector. However, this approach cannot sufficiently consider the final states of social relationships between characters in the 'denouement.' In the case of movies, a user's impression of a movie is mostly determined by the movie's end scenes, which
Fig. 4. Vector representations of protagonists in narrative works, composed by the Char2Vec model. Black dots denote narrative works selected for the qualitative discussion, and gray dots refer to the other narrative works in our dataset. Additionally, since we projected the vector representations (which are 32-dimensional) into 2 dimensions using t-SNE [38], the axes of this plot are not significant.
include the final climax and the denouement [34,35,39]. Thus, we also apply the PV-DBOW, another mode of Doc2Vec [22], for learning the composition of social roles in the ending part. Conclusively, the PV-DM-based model learns all the scenes from s_{α,1} to s_{α,L}, concerning the temporal changes in social roles and their composition, whereas the PV-DBOW-based model focuses only on N_{s_{α,L}} = N_{C_α}. We call these two approaches the flow-oriented story vector (SV-F) and the denouement-oriented story vector (SV-D). Furthermore, we propose a unified story vector (SV-U), which integrates the two approaches.

3.2.1. Story vectors based on flows of stories

In Char2Vec, the inputs are the social roles of a character at a particular scene and degree, and the outputs are occurrence probabilities of social roles as neighborhoods of the character. However, the SV-F operates in the opposite direction of Char2Vec, since it is an extension of the PV-DM, which is similar to the CBOW method [21]. The SV-F model predicts the occurrence probabilities of each social role of a character in a particular context (scene and degree) based on the social roles of the neighborhoods of the character and the story vector of the corresponding narrative work. Graph2Vec, which has similar purposes to Story2Vec, applied the PV-DBOW model. However, since the PV-DBOW model considers a graph as just a set of substructures, it is inappropriate for the character network, which is a sequence of graphs. Temporal transitions of social roles are significant not only for a character but also for the composition of a character network. Therefore, we primarily apply the PV-DM model, which can use the temporal-radial neighborhoods of social roles. Thereby, the SV-F model has a structure similar to the CBOW method [21], as displayed in Fig. 5. We call this learning method TR-CBOW. Therefore, the occurrence probability of a social role is estimated based on its neighborhoods and the story that contains the social role. The method for calculating this probability is similar to Eq. (8). Also, the methods for composing a set of neighborhoods are the same as in the TR-SkipGram.
Nevertheless, since this occurrence probability has multiple conditions (the neighborhoods and the story), we must build a representative vector for the conditions. This is formulated as:

P\big( c_{i,l}^{(d)} \mid N_{TR}(c_{i,l}^{(d)}), \Phi(N(C_α)) \big) \approx \sigma\big( \Phi(c_{i,l}^{(d)}) \cdot \Psi_N(c_{i,l}^{(d)}) \big),   (10)

\Psi_N\big(c_{i,l}^{(d)}\big) = \frac{1}{\big| N_{TR}(c_{i,l}^{(d)}) \big| + 1} \times \Big( \Phi(N(C_α)) + \sum_{S_a \in N_{TR}(c_{i,l}^{(d)})} \Phi(S_a) \Big),   (11)

where Ψ_N(·) denotes a representative vector for the surrounding context, including the neighboring social roles and the story vector. Ψ_N(·) is calculated as the average of the representations of all the surrounding context. This prediction process and the architecture of the TR-CBOW method are illustrated by Fig. 5. Conclusively, the objective function of the TR-CBOW method is a combination of the negative sampling and the PV-DM. For a social role appearing in a target story, we maximize its occurrence probability predicted through Eq. (10) (Lines 7 to 9 of
Fig. 5. The TR-CBOW of the Story2Vec-F model. A rectangular node indicates a target of the prediction, circular nodes denote neighborhoods of the target social role, and an elliptical node on the left side refers to a story that contains these social roles.
Algorithm 4 Temporal-Radial Continuous BOW.
1: procedure TR-CBOW(S, Φ(S), N_{C_α}, Φ(N(C_α)))
2:   L: the number of scenes in each narrative work
3:   ρ: the learning rate
4:   for l : 1 → L do
5:     for d : 1 → D do
6:       for c_i ∈ C_{s_{α,l}} do
7:         N_{TR}(c_{i,l}^{(d)}) ← GetNeighbors(c_{i,l}^{(d)}, N_{C_α})
8:         L_{TR}(N(C_α)) ← − log P(c_{i,l}^{(d)} | N_{TR}(c_{i,l}^{(d)}), N(C_α))
9:         Φ(N(C_α)) ← Φ(N(C_α)) − ρ ∂L_{TR}(N(C_α)) / ∂Φ(N(C_α))
10:        for 1 → k do
11:          S_b ← SAMPLE(S, P_n(S)), S_b ≠ c_{i,l}^{(d)}
12:          L_{TR}(N(C_α)) ← log P(S_b | N_{TR}(c_{i,l}^{(d)}), N(C_α))
13:          Φ(N(C_α)) ← Φ(N(C_α)) − ρ ∂L_{TR}(N(C_α)) / ∂Φ(N(C_α))
Algorithm 4) and minimize it for other social roles (Lines 10 to 13 of Algorithm 4). In the focus of the story, we predict the social roles of each character appearing in the story at every degree and scene. This can be formulated as:

L_{TR}\big(c_{i,l}^{(d)}\big) = \log P\big( c_{i,l}^{(d)} \mid N_{TR}(c_{i,l}^{(d)}), \Phi(N(C_α)) \big) - \sum_{\forall S_b \neq c_{i,l}^{(d)}} \log P\big( S_b \mid N_{TR}(c_{i,l}^{(d)}), \Phi(N(C_α)) \big)
\approx \log \sigma\big( \Phi(c_{i,l}^{(d)}) \cdot \Psi_N(c_{i,l}^{(d)}) \big) + \sum_{j=1}^{k} \mathbb{E}_{S_b \sim P_n(S)} \log \sigma\big( -\Phi(S_b) \cdot \Psi_N(c_{i,l}^{(d)}) \big),   (12)

L_{TR}(C_α) = \sum_{\substack{0 \le d \le D,\; c_i \in C_α,\; 1 \le l \le L \\ N_{s_{α,l}}[a_{i,i}] \neq 0}} L_{TR}\big(c_{i,l}^{(d)}\big),   (13)

S_b \notin S(C_α) = \big\{ c_{i,l}^{(d)} \mid c_i \in C_α,\; 0 \le d \le D,\; 1 \le l \le L,\; N_{s_{α,l}}[a_{i,i}] \neq 0 \big\}.   (14)
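The context vector of Eq. (11) and the prediction of Eq. (10), on which the objective above is built, can be sketched as follows; the function names and vector sizes are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def context_vector(story_vec, neighbor_vecs):
    """Ψ_N in Eq. (11): the average of the story vector Φ(N(C_α)) and the
    vectors of all |N_TR| neighboring social roles."""
    return (story_vec + np.sum(neighbor_vecs, axis=0)) / (len(neighbor_vecs) + 1)

def role_probability(role_vec, story_vec, neighbor_vecs):
    """Eq. (10): the predicted occurrence probability of a social role,
    given its temporal-radial neighborhood and the story vector."""
    return sigmoid(role_vec @ context_vector(story_vec, neighbor_vecs))
```

For instance, with a zero story vector and two all-ones neighbor vectors, Ψ_N is 2/3 in every dimension, and a role vector orthogonal to the context yields a probability of exactly 1/2.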
Algorithm 4 and Algorithm 5 describe the whole procedures of the TR-CBOW and the SV-F, respectively. Line 7 of Algorithm 4 is conducted by the first procedure of Algorithm 2. Contrary to Char2Vec, the input of Story2Vec-F is a story in each narrative work (a sequence of character networks), and the outputs are occurrence probabilities of social roles for the characters appearing in the story. Moreover, in the Story2Vec models, we use pre-trained vector representations of social roles (character vectors), as described in Line 7 of Algorithm 5. Finally, Fig. 6 presents the vector embeddings of the 142 movies composed by the SV-F model. We conducted a qualitative analysis of the story vectors embedded by the SV-F model in the same manner as for the Char2Vec model. 'Marty' (1955), 'Misery' (1983), 'Watchmen' (2009), and 'Psycho' (1960) were closely located. In 'Marty' (1955) and 'Misery' (1983), the proximity between the two leading characters grew according to the flow of their stories. The romantic relationship between 'Marty Piletti' and 'Clara Snyder' in 'Marty' (1955) contained both progress and conflicts.
Algorithm 5 Learning Distributed Representations of Stories.
1: procedure Story2Vec-F(C, D, δ, ε)
2:   C: a universal set of narrative works
3:   D: the maximum degree of the WL relabeling process
4:   δ: the number of dimensions for story vectors and character vectors
5:   ε: the number of epochs
6:   Initialization: Sample Φ(N(C)) from R^{|C|×δ}
7:   Φ(S) ← Char2Vec(C, S)
8:   for e : 1 → ε do
9:     Ĉ ← SHUFFLE(C)
10:    for C_α ∈ Ĉ do
11:      Φ(N(C_α)) ← TR-CBOW(S, Φ(S), N_{C_α}, Φ(N(C_α)))
Fig. 6. Vector representations of stories in narrative works that were composed by the SV-F model.
In 'Misery (1983),' the intensity of conflicts between 'Annie Wilkes' and 'Paul Sheldon' grew as 'Annie Wilkes' exposed her insanity. Likewise, in 'Watchmen' (2009) and 'Psycho' (1960), the proximity between the protagonist and the characters neighboring the protagonist grew. Nevertheless, in these two movies, social relationships evolved among multiple characters in a more complicated way than in the previous case. 'Watchmen' (2009) started with protagonist-centric ('Rorschach') storytelling. Then, social relationships became complicated owing to the involvement of multiple heroes. However, in this movie, the intensification of the confrontation between 'Rorschach' and its antagonist ('Adrian Veidt') was also distinct. This point was made visible by the fact that 'Watchmen (2009)' was located more closely to 'Marty (1955)' and 'Misery (1983)' than to 'Psycho (1960).' Lastly, although 'Psycho (1960)' showed an evolution of complicated relationships among many characters around its protagonist ('Norman Bates'), this movie was consistently protagonist-centric. Moreover, 'Ghost' (1990) and 'Se7en' (1990) were closely located. Both movies started with two leading characters ('Sam Wheat' and 'Molly Jensen' in 'Ghost' (1990) and 'David Mills' and 'William Somerset' in 'Se7en' (1990)). Then, with the appearance of new characters, the number of highlighted characters changed to three. In 'Ghost' (1990), a character ('Oda Mae Brown') who connected 'Sam Wheat' with 'Molly Jensen' became important. In the case of 'Se7en' (1990), 'John Doe' created a relationship between 'David Mills' and 'William Somerset.' Meanwhile, 'Her' (2013) and 'Juno' (2007) were distant. These two movies were both protagonist-centric. Nevertheless, there was a difference in the relationships between the protagonist and the minor characters around the protagonist. 'Her' (2013) focused on describing the emotions and inner conflicts of its protagonist ('Theodore').
In addition, 'Theodore' barely developed relationships with the surrounding characters. In contrast, 'Juno,' the protagonist of 'Juno' (2007), actively interacted with various minor characters. These examples show that the SV-F model is sensitive to dynamic changes in leading characters. This notion is similar to the Char2Vec model reflecting the number of leading characters. Nevertheless, the SV-F model seems closer to representing dynamic changes in the social relationships of leading characters with other characters. At this moment, the change in relationships of characters is an ambiguous concept, and we need a more formalized definition to verify this assumption in future studies.
Fig. 7. The denouement-oriented TR-CBOW of the unified Story2Vec model. The upper part describes the PV-DBOW model used in the SV-D, and the lower part illustrates the TR-CBOW model of the SV-F.
3.2.2. Story vectors weighted on the denouement

In the previous section, we presented a method for representing a story using temporal-radial adjacency between social roles. This approach is efficient for describing a story based on how characters and their social relationships develop according to the flow and context of the story. However, character networks in the denouement of the story represent where the social relationships finally lead. Since a user's impression of a story is mostly determined during a short period (i.e., from the final climax to the denouement) [34], for representing the story from the user's perspective, the structure and shape of N_{s_{α,L}} = N_{C_α} are significant. Therefore, in this study, we propose a model for learning the final state of the character network, called the SV-D. Then, we present a unified model for integrating the SV-F and SV-D, called the SV-U. Based on the PV-DBOW, the SV-D trains Φ(N(C_α)) through predicting all social roles discovered from N_{s_{α,L}}. This is similar to the skip-gram model. The objective function of the SV-D model can be defined as in Eq. (16) and Eq. (17), and the architecture of the SV-D model is illustrated in the upper part of Fig. 7. As displayed in Fig. 7, the temporal-radial approach gets more opportunities to train the story vector than the denouement-oriented approach, since the SV-F predicts the occurrence probability of every social role at every scene. We call this unified learning method 'denouement-oriented TR-CBOW (DTR-CBOW).' In the SV-D model, the occurrence probability of a social role is predicted by an inner product of a character vector and a story vector, similarly to Eq. (8). This can be formulated as:
P\big( c_{i,L}^{(d)} \mid \Phi(N(C_α)) \big) \approx \sigma\big( \Phi(c_{i,L}^{(d)}) \cdot \Phi(N(C_α)) \big).   (15)
A difference between Char2Vec (TR-SkipGram) and the SV-D is that the latter uses all the social roles that appear in a story as the surrounding context of the story vector. The objective function of the SV-D model is defined by combining the negative sampling and the PV-DBOW. For a story, we maximize the occurrence probability of every social role in N_{s_{α,L}} at every degree (Lines 8 and 9 of Algorithm 6) and minimize the probability for social roles that did not appear in N_{s_{α,L}} (Lines 10 to 13 of Algorithm 6). This is formulated as:
L_D\big(c_{i,L}^{(d)}\big) = \log P\big( c_{i,L}^{(d)} \mid \Phi(N(C_α)) \big) - \sum_{\forall S_b \notin S_{s_{α,L}}} \log P\big( S_b \mid \Phi(N(C_α)) \big)
\approx \log \sigma\big( \Phi(c_{i,L}^{(d)}) \cdot \Phi(N(C_α)) \big) + \sum_{j=1}^{k} \mathbb{E}_{S_b \sim P_n(S)} \log \sigma\big( -\Phi(S_b) \cdot \Phi(N(C_α)) \big),   (16)
Algorithm 6 Denouement-oriented Temporal-Radial Continuous BOW.
1: procedure DTR-CBOW(S, Φ(S), N_{C_α}, Φ(N(C_α)))
2:   L: the number of scenes in each narrative work
3:   ρ: the learning rate
4:   for l : 1 → L do
5:     for d : 1 → D do
6:       for c_i ∈ C_{s_{α,l}} do
7:         if l = L then
8:           L_D(N(C_α)) ← − log P(c_{i,l}^{(d)} | N(C_α))
9:           Φ(N(C_α)) ← Φ(N(C_α)) − ρ ∂L_D(N(C_α)) / ∂Φ(N(C_α))
10:          for 1 → k do
11:            S_b ← SAMPLE(S, P_n(S)), S_b ∉ S_{s_{α,L}}
12:            L_D(N(C_α)) ← log P(S_b | N(C_α))
13:            Φ(N(C_α)) ← Φ(N(C_α)) − ρ ∂L_D(N(C_α)) / ∂Φ(N(C_α))
14:        N_{TR}(c_{i,l}^{(d)}) ← GetNeighbors(c_{i,l}^{(d)}, N_{C_α})
15:        L_{TR}(N(C_α)) ← − log P(c_{i,l}^{(d)} | N_{TR}(c_{i,l}^{(d)}), N(C_α))
16:        Φ(N(C_α)) ← Φ(N(C_α)) − ρ ∂L_{TR}(N(C_α)) / ∂Φ(N(C_α))
17:        for 1 → k do
18:          S_b ← SAMPLE(S, P_n(S)), S_b ≠ c_{i,l}^{(d)}
19:          L_{TR}(N(C_α)) ← log P(S_b | N_{TR}(c_{i,l}^{(d)}), N(C_α))
20:          Φ(N(C_α)) ← Φ(N(C_α)) − ρ ∂L_{TR}(N(C_α)) / ∂Φ(N(C_α))
Fig. 8. Vector representations of stories in narrative works that were composed by the SV-D model.
L_D(C_α) = \sum_{\substack{0 \le d \le D \\ c_i \in C_α}} L_D\big(c_{i,L}^{(d)}\big),   (17)
where S_{s_{α,L}} indicates the set of social roles discovered from N_{s_{α,L}}. To integrate the SV-F and SV-D models, we define the objective function of the DTR-CBOW method as the summation of the objective functions of the SV-F and SV-D. This is formulated as follows:
L_U(C_α) = L_{TR}(C_α) + L_D(C_α).   (18)
Algorithm 6 describes the overall procedure of the DTR-CBOW. Lines 7 to 13 of Algorithm 6 present the learning method of the SV-D, which learns only on the denouement. Also, Lines 14 to 20 describe the SV-F part of the SV-U, which is the same as in Algorithm 4. The SV-D and SV-U follow the same procedure as the SV-F in Algorithm 5, apart from the learning method. Finally, Fig. 8 and Fig. 9 present the vector embeddings of the 142 movies composed by the SV-D and SV-U models, respectively.
Fig. 9. Vector representations of stories in narrative works that were composed by the SV-U model.
From Fig. 8, we can see a relationship between the static structure of a character network and the story of a movie. Among adjacent movies in the SV-D model, two pairs were significant. 'Psycho' (1960) and 'Misery' (1990) are both thriller movies, and the protagonists of both were left alone in the conclusion: 'Norman Bates' in 'Psycho' (1960) was arrested and imprisoned in the end, and 'Misery' (1990) ended with 'Paul Sheldon' escaping after he killed 'Annie Wilkes.' In the other case, 'Se7en' (1995) and 'Marty' (1955) have denouements focused on two characters. In the last part of 'Se7en' (1995), 'David Mills' murdered the criminal 'John Doe' and was arrested; however, in a scene where 'William Somerset' stared at 'David Mills,' the movie implied that 'William Somerset' would not abandon him. 'Marty' (1955) finished with a happy ending for the romantic relationship between 'Marty Piletti' and 'Clara Snyder.' Moreover, these two pairs were located on opposite sides, and romance movies that maintained two leading characters across their stories ('Charade' (1963), 'Juno' (2007), and 'Ghost' (1990)) were located close to 'Se7en' (1995) and 'Marty' (1955). These cases demonstrate that the SV-D model represents the composition of leading characters, as the Char2Vec and SV-F models do. However, the SV-D model appeared more sensitive to the genres of movies and the conclusions of their stories than the Char2Vec and SV-F models. We conjecture that this is because the SV-D model considers every social role in the final state of a character network.

As displayed in Figs. 6 and 9, the results of the SV-U model were mostly similar to those of the SV-F model. Nevertheless, the following examples of distant movies were more identifiable in the SV-U model than in the SV-F model. The distance between 'The Godfather' (1972) and 'Watchmen' (2009) was apparent.
Beyond belonging to different genres (crime and action, respectively), 'The Godfather' (1972) was protagonist-centric, whereas 'Watchmen' (2009) concentrated on more characters. In 'Watchmen' (2009), the antagonist ('Adrian Veidt') was also more important and more definite than in 'The Godfather' (1972). Moreover, the distance between 'Charade' (1963) and 'Marty' (1955) was visible: although both have features of romance movies, 'Charade' (1963) also contains characteristics of thriller and mystery movies. Nevertheless, the only meaningful conclusion we could derive from this analysis is that the SV-U model behaves excessively close to the SV-F model. We will verify two simple conjectures in Sect. 4.3: (i) the SV-D model can reflect the genres of narrative works, and (ii) the SV-U model is affected much more by the SV-F model than by the SV-D model. The subjects of our qualitative analysis are too few to verify any conjectures it yields. Thus, this study applies the qualitative analysis only to explain our approaches and the way they work in real movies. In further research, we will attempt to bring procedural rationality to the qualitative analysis of stories and narrative works.

4. Evaluation

In this section, we evaluate the efficacy of the proposed methods and models in three steps: (i) discovering social roles, (ii) composing neighborhoods, and (iii) representing characters and stories with vectors. In Sect. 4.1, we verify whether the proposed concept of the social role reflects the roles of characters, regarding RQ 1-1. Based on RQ 1-2, the efficiency of the proximity-aware WL relabeling process is evaluated in Sect. 4.1, and the effectiveness of temporal-radial neighborhoods is verified in Sect. 4.2. Finally, we verify whether the structural features of character networks in a story can represent the story, according to RQ 2-1, in Sect. 4.3. Subsequently, we compare the proposed embedding models to validate RQ 2-2.
To evaluate the efficacy of the proposed methods and models, we mainly applied vector representations composed by the proposed models to the measurement of similarity degrees between stories in real movies. Then, we compared the story similarity of movies, which is automatically measured, with similarity estimated by human cognition. To measure the similarity, we used a combination of cosine similarity and Euclidean distance between vector representations. This can be formulated as follows:
k̂(C_α, C_β) = S_cos( N(C_α), N(C_β) ) × ( D( N(C_α), N(C_β) ) + 1 )^{-1},   (19)
where S_cos(·, ·) indicates the cosine similarity and D(·, ·) denotes the Euclidean distance. In the case of character vectors, we aggregated the vector representations of the social roles rooted in each character by averaging the vectors of all the social roles rooted in the character at every scene. Further, because the movies contain too many characters to verify one by one, we only used the protagonists of the movies to measure the similarity between the movies.

For the experiment, we first acquired scripts and metadata of 142 movies from IMSDb² and IMDb,³ respectively. We extracted character networks from the movie scripts by using CharNet-Extractor.⁴ Since not all the scripts in IMSDb are intact, we collected movies whose scripts are not defective and are evenly distributed across various genres. To implement the proposed models, we referred to open-source implementations of Graph2Vec⁵ and Subgraph2Vec.⁶ A list of the collected movies and an implementation of the proposed models are also accessible on GitHub.⁷

Subsequently, to compose the ground truth, we collected human cognition of the story similarity. As we discussed in our previous study [15], since the story is a subjective and intuitive area, it is challenging to build an invariable ground truth. The stories and narrative works that we deal with are creative and artistic products. In the literature and narratology areas, there are various methods for analyzing and comparing narrative works regarding their stories. However, these methods are difficult to systematize explicitly enough for a nonprofessional to follow. Although there have been various studies on narrative formalism [40], they are still informal and equivocal from an engineering perspective. Furthermore, this study aims to emulate the cognition of regular users for stories rather than the criticism of experts; most regular users 'feel' the similarity of stories without seriously considering logical evidence.
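Returning to Eq. (19), the combined cosine–Euclidean kernel is a straightforward computation; a minimal stdlib sketch (the function names are ours):

```python
import math

def cosine(u, v):
    # S_cos(u, v): cosine similarity between two vectors
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def euclidean(u, v):
    # D(u, v): Euclidean distance between two vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def story_similarity(n_alpha, n_beta):
    """k_hat(C_a, C_b) = S_cos(N(C_a), N(C_b)) * (D(N(C_a), N(C_b)) + 1)^-1, Eq. (19)."""
    return cosine(n_alpha, n_beta) / (euclidean(n_alpha, n_beta) + 1.0)
```

Identical vectors yield a similarity of 1; increasing either the angle or the distance between the vectors reduces the score.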
Also, for such users, it is an abstruse task to quantize the similarity of stories with consistent criteria. Even if we had asked regular users to quantize the similarity of movies regarding their stories, we could define neither the similarity between stories clearly nor the procedures for measuring it. We resolved this issue by collecting from the regular users inequality relationships between the cognitive similarity of stories in real movies. Directly comparing degrees of similarity is much easier for the users than quantifying the similarity. The data that we collected from the users are in the form of k(C_α, C_β) = max(k(C_α, C_β), k(C_β, C_γ), k(C_γ, C_α)).

We composed a group of human evaluators consisting of 50 people who are faculty members and students of Chung-Ang University. To collect similarities between movies from the evaluators, we developed a web application.⁸ Fig. 10 displays the user interfaces of the web application. As a question, this application provided three movies to an evaluator with the request: "Select two movies you think are similar." The three movies were randomly chosen from the 142 movies in our movie corpus. Then, the evaluator selected the two of the three movies that are more similar to each other than to the remaining one. The evaluators could discard movies provided by the application if they could not answer (e.g., unwatched movies). Simply speaking, when we gave three narrative works (C_α, C_β, and C_γ) to an evaluator, the evaluator had three choices: (C_α, C_β), (C_β, C_γ), and (C_γ, C_α). From the 50 human evaluators, we collected answers for 1,471 questions.

To compare the similarity from our embedding models with these collected inequality relationships, we had to integrate and quantify the inequality relationships. As displayed in Fig. 11, each evaluator answer includes two inequality relationships between similarities. We can concatenate inequality relationships by using terms shared between them; e.g., {k(C_α, C_β) > k(C_α, C_γ)} ∧ {k(C_α, C_γ) > k(C_α, C_δ)} ⇒ k(C_α, C_β) > k(C_α, C_γ) > k(C_α, C_δ). By concatenating all possible relationships, we can determine which similarity values should be closer to 1 than others. For the concatenation, we transformed the collected data into a graph, where similarities correspond to nodes and the inequality relationships are directed edges between them. We composed this 'cognitive similarity graph' for each evaluator, as displayed in Fig. 11. Then, we quantified each similarity based on its depth in the cognitive similarity graph. When D indicates the depth of (C_α, C_β) in the cognitive similarity graph of an evaluator e_m, the similarity degree of e_m for C_α and C_β (k_m(C_α, C_β)) is quantized as 1/(D + 1). Finally, when K_{α,β} indicates the set of similarity degrees for C_α and C_β estimated by the evaluators, the cognitive similarity data from all the evaluators are integrated by averaging all the collected similarity degrees as: k(C_α, C_β) = |K_{α,β}|⁻¹ × Σ_{∀k_m(C_α,C_β) ∈ K_{α,β}} k_m(C_α, C_β).
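The concatenation-and-quantization procedure above can be sketched with a simple digraph: nodes are movie pairs, an edge p → q encodes k(p) > k(q), and each pair is scored 1/(depth + 1), where the depth is the longest chain of '>' relations leading into the pair. The data layout and function names below are our own illustration, not the paper's implementation:

```python
from collections import defaultdict

def build_similarity_graph(answers):
    """answers: list of (selected_pair, other_pair_1, other_pair_2) per question;
    the selected pair is judged more similar than the other two."""
    edges = defaultdict(set)
    for selected, *others in answers:
        for other in others:
            edges[selected].add(other)  # k(selected) > k(other)
    return edges

def depth(edges, pair, seen=frozenset()):
    """Longest chain of '>' relations dominating `pair` (0 = never dominated)."""
    preds = [p for p, succs in edges.items() if pair in succs and p not in seen]
    if not preds:
        return 0
    return 1 + max(depth(edges, p, seen | {pair}) for p in preds)

def quantize(edges):
    # score every pair appearing anywhere in the graph as 1 / (depth + 1)
    nodes = set(edges) | {q for succs in edges.values() for q in succs}
    return {pair: 1.0 / (depth(edges, pair) + 1) for pair in nodes}
```

A pair that is never dominated gets score 1, a pair one step down the chain gets 1/2, and so on; averaging these per-evaluator scores over K_{α,β} yields the integrated cognitive similarity.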
2 https://www.imsdb.com/.
3 https://www.imdb.com/.
4 https://github.com/O-JounLee/CharNet-Extractor.
5 https://github.com/MLDroid/graph2vec_tf.
6 https://github.com/MLDroid/subgraph2vec_tf.
7 https://github.com/O-JounLee/Story2Vec.
8 http://recsys.cau.ac.kr:8084/movies/.
Fig. 10. User interfaces of the web application for collecting the cognitive similarity from the evaluators. This application provides three movies that are randomly chosen from our movie corpus, which currently consists of 142 movies. If there are movies that the evaluators have not watched, they can 'refresh' the movies. Otherwise, the evaluators 'select' the two movies that are more similar to each other than to the third. After the selection, the evaluators can 'submit' their answers. Additionally, the evaluators can insert reasons for their selections; however, we could not use the reasons in our experiments, since the evaluators rarely provided them.
Fig. 11. An example for concatenating inequality relationships among similarity degrees collected from an evaluator, when the evaluator selected (Cα , Cβ ) from (Cα , Cβ , and Cγ ) and (Cβ , Cγ ) from (Cβ , Cγ , and Cδ ). The dotted rectangular box refers to the question and answer for (Cα , Cβ , and Cγ ).
To evaluate the performance of our embedding models, we calculated mean absolute errors (MAEs) between automatically measured similarity degrees and the cognitive similarity from the evaluators. This can be formulated as:
MAE = (1 / |K|) × Σ_{∀k(C_α,C_β) ∈ K} | k̂(C_α, C_β) − k(C_α, C_β) |,   (20)
where K denotes the set of similarity degrees estimated by the evaluators.

Furthermore, the proposed models require various hyper-parameters. Although this study focuses on validating the feasibility of representing stories with vectors rather than enhancing the accuracy of the vectors, we present our approaches to determining the hyper-parameters for the reproducibility of this study. We performed a hyper-parameter search on the number of epochs (20 to 150 with a step of +10), the learning rate ρ (0.0025 to 0.25 with a step of ×10), and the number of dimensions δ (10 to 50 with a step of +2). By evaluating each case based on the average accuracy of the proposed models (i.e., the MAE over the SV-F, SV-D, SV-U, and Char2Vec), we determined the number of epochs, ρ, and δ as 80, 0.025, and 32, respectively. Also, we proposed methods for determining two further hyper-parameters: the maximum degree of social roles D (in Sect. 3.1.2) and the number of negative samples k (in Sect. 3.1.3). We decided D = 4 and k = 31 based on the length of the longest non-cyclic path in character networks and the average size of neighborhoods (31.45) in our dataset, respectively.

Fig. 12. Experimental results for MAE of the proposed embedding models (SV-F, SV-D, SV-U, and Char2Vec) according to hyper-parameters D and k, which indicate the maximum degree of social roles and the number of negative samples, respectively.

To verify these methods, we conducted a simple experiment evaluating possible D (1 to 4 with a step of +1) and k (11 to 41 with a step of +2). As displayed in Fig. 12, although the proposed models exhibited the lowest MAE at D = 4 and k = 37, there was no significant difference in MAE between the decisions of the proposed methods (at D = 4 and k = 31, MAE: 0.194) and the local optimum from the experiment (at D = 4 and k = 37, MAE: 0.191). Additionally, we merely followed the noise distribution (P_n(S)) of the Word2Vec model [21]. We will look for an optimal noise distribution for embedding characters and stories in our future work.

4.1. Validity of the social roles of characters

Based on RQ 1-1 and RQ 1-2, we evaluated the concept of the 'social role' that we newly proposed in this study. RQ 1-1 concerns whether the proposed definition (Definition 2) of the social role is valid. We verified RQ 1-1 by evaluating the relevancy between substructures in character networks and the social roles of characters. Subsequently, RQ 1-2 concerns whether proximity among characters is valuable for discovering social roles. To validate RQ 1-2, we compared the accuracy of the proposed models in two cases, when they apply the proximity-aware WL relabeling and the conventional WL relabeling.

4.1.1. Relevancy between substructures of character networks and social roles of characters

RQ 1-1 corresponds to the fundamental assumption of this study that a substructure rooted in a character signifies a social role of the character. To verify this assumption, we collected characters that had the same substructures from the movies in our dataset. Then, we asked the evaluators whether the characters with the same substructures had the same roles. We gathered the characters based only on static character networks at the denouement.
Due to the diversity of characters and the substructures rooted in them, we could not find characters with the same substructures when we considered their substructures at every scene. Also, it is difficult for the evaluators to remember all the characters, including extras. Thus, we only used protagonists and main characters. Eventually, we obtained 12 groups of characters (G1 to G12) that have the same substructures at every degree (from 0 to 4); these groups consist of 3.08 characters on average. To verify the assumption, we compared the characters with identical substructures against uncontrolled pairs drawn from all the characters. We asked the evaluators to answer six questions about whether two characters have the same role; for example, "Do 'Don Vito Corleone' in 'The Godfather' (1972) and 'Shifu' in 'Kung Fu Panda' (2008) have the same role in their respective movies?" Four of the six questions presented randomly selected characters, which are protagonists or main characters in movies that the evaluator has watched. For the other two questions, we asked about characters that have identical substructures. We did not ask the evaluators to answer according to any specific criteria. As discussed for collecting the cognitive similarity, it is challenging for regular users to analyze stories with consistent criteria. Therefore, we could not employ the existing taxonomies of roles [40], which generally consist of heroes, mentors, negotiators, spies, etc. Based on the answers to the questions, we measured the accuracy of the two cases as the ratio of the number of positive answers to the number of questions in each case. The accuracy for random characters and for characters with the same substructures was 0.21 and 0.84, respectively. Table 3 presents the accuracy of each character group with identical substructures. Then, we conducted a one-sample t-test to verify whether applying the equivalence of substructures improved the accuracy significantly.
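Collecting characters with identical substructures amounts to grouping characters by the tuple of substructure labels rooted in them at the denouement, over all degrees 0 to D. The label data and function name below are hypothetical, shown only to illustrate the grouping:

```python
from collections import defaultdict

def group_by_substructure(char_labels):
    """char_labels: {character: (label_deg0, ..., label_degD)} at the denouement.
    Returns groups of characters sharing identical labels at every degree."""
    groups = defaultdict(list)
    for character, labels in char_labels.items():
        groups[tuple(labels)].append(character)
    # keep only groups with at least two characters, as in G1 to G12
    return [sorted(chars) for chars in groups.values() if len(chars) > 1]
```

Characters fall into the same group only when their labels agree at every degree, which is the criterion used to form G1 to G12.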
As a result of the t-test, we obtained a p-value of 1.170 × 10⁻⁸ < 0.5 × 10⁻⁷. This result means that identical substructures indicate the same roles of characters with significantly high accuracy at a significance level of 0.5 × 10⁻⁷. This experimental result supports the relevance of the connection between the substructures rooted in characters and the roles of those characters. Although the substructures enable us to discriminate the roles of characters more semantically than merely using the centrality of characters [8,9,16], the scale of this experiment should be extended for a concrete validation. Also, we will attempt to reveal correlations between the social roles and the existing role taxonomies in narratology.
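As a sanity check, the one-sample t statistic for the per-group accuracies in Table 3 against the random baseline (0.21) follows the standard formula t = (x̄ − μ₀)/(s/√n). The stdlib sketch below reproduces only the test statistic (we do not claim it reproduces the exact p-value reported above, whose computation details are not given in the text):

```python
import math
from statistics import mean, stdev

def one_sample_t(samples, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with df = n - 1."""
    n = len(samples)
    return (mean(samples) - mu0) / (stdev(samples) / math.sqrt(n))

# per-group accuracies from Table 3, tested against the random baseline 0.21
accuracies = [0.86, 0.86, 1.00, 0.80, 0.75, 0.86,
              0.86, 0.71, 1.00, 0.75, 0.50, 1.00]
t_stat = one_sample_t(accuracies, 0.21)
```

With these twelve values, t is roughly 14.9 at 11 degrees of freedom, which is consistent with a p-value on the order of 10⁻⁸.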
O.-J. Lee, J.J. Jung / Artificial Intelligence 281 (2020) 103235
21
Table 3
Experimental results for verifying the relevance between substructures and social roles of characters. This table presents the experimental results for each group. # of Q and # of Char denote the number of questions for each group and the number of characters in each group, respectively. Lastly, 'Random' indicates randomly selected pairs from all the protagonists and main characters. Our corpus includes 607 protagonists and main characters, and the evaluators answered for 113 of them.

Group     Accuracy   # of Q   # of Char
G1        0.86       21       6
G2        0.86       14       5
G3        1.00       10       5
G4        0.80       10       3
G5        0.75       8        3
G6        0.86       7        3
G7        0.86       7        2
G8        0.71       7        2
G9        1.00       5        2
G10       0.75       4        2
G11       0.50       4        2
G12       1.00       3        2
Overall   0.84       100      37
Random    0.21       200      113 (607)
Table 4
Experimental results to evaluate the efficiency of the proximity-aware WL relabeling process and temporal-radial neighborhoods. This table presents the effects of the two proposed methods for the proposed embedding models: SV-F, SV-D, SV-U, and Char2Vec (CV). Each cell indicates an MAE of each model (on the first row) according to cases depicted in the first and second columns. Also, */RCV indicates a case when we applied character vectors that were trained by the radial skip-gram on the Story2Vec models. The SV-D and SV-D/RCV have blanks on rows for R-N and TR-N, respectively, since they do not use the neighborhoods for embedding stories.

                SV-F   SV-F/RCV   SV-D   SV-D/RCV   SV-U   SV-U/RCV   CV     Average
TR-N   P-WL     0.21   0.26       0.14   –          0.18   0.31       0.24   0.22
       WL       0.45   0.48       0.19   –          0.39   0.41       0.36   0.38
R-N    P-WL     0.49   0.47       –      0.23       0.44   0.37       0.44   0.41
       WL       0.36   0.37       –      0.20       0.28   0.35       0.28   0.31
Average         0.38   0.39       0.17   0.22       0.32   0.36       0.33   0.33
4.1.2. Efficiency of the proximity-aware WL relabeling process

In the previous section, we validated RQ 1-1, i.e., that substructures of character networks reflect the social roles of characters. Subsequently, we compared the efficiency of the proximity-aware WL relabeling process with the conventional method, to verify the first half of the RQ 1-2 assumption: that proximity among characters is significant for discovering the substructures. To evaluate the efficiency, we measured the accuracy of the proposed models for measuring the similarity of stories in movies in two cases, where the proximity-aware WL relabeling (P-WL) and the conventional WL relabeling (WL) were respectively used. We also conducted experiments for cases where we applied the temporal-radial neighborhoods (TR-N) or the radial neighborhoods (R-N). These cases demonstrate not only the effects of the two methods proposed for the Char2Vec and Story2Vec models but also their effects on each other.

Table 4 displays the accuracy of the proposed embedding models according to these cases. By comparing the even-numbered and odd-numbered rows of Table 4, we can verify whether considering the proximity improves the quality of character and story vectors; this verifies RQ 1-2 indirectly. Although all the embedding models exhibited the best performance in the P-WL/TR-N case, P-WL did not outperform WL in all cases. Comparing the four cases in order from worst to best, we obtained P-WL/R-N, WL/TR-N, WL/R-N, and P-WL/TR-N, on average. For each model, although the order of P-WL/R-N and WL/TR-N varied, P-WL/TR-N and WL/R-N exhibited the first and second highest accuracy in all the models. We speculate that these phenomena occur because the P-WL increases the total number of social roles and the TR-N expands the range of neighborhoods. Simply speaking, P-WL/R-N causes under-fitting, and WL/TR-N causes over-fitting.
To verify this conjecture, we compared the numbers of social roles discovered by the P-WL and the WL. With the P-WL, we found 37,631 social roles in the 142 movies, i.e., 350.8 social roles per movie on average; the ratio of social roles duplicated among the movies to all social roles was only 21.7%. In the case of the conventional WL, however, we discovered 2,687 social roles under the same conditions (14.0 times fewer than with the P-WL). In addition, there were 57.4 social roles per movie, and 67.0% of the social roles appeared in multiple movies. This result indicates that, compared to the conventional method, the P-WL distinguished substructures of character networks too finely. Since the proposed models learn distributed representations of characters and stories based on the adjacency of social roles, overly fine substructures can hinder the quality of the character and story vectors.

Comparing the proposed models, the SV-D model exhibited the highest accuracy in all cases. Among the Story2Vec models, the SV-U and SV-F models were consistently second and third. Detailed comparisons between the proposed models will be presented in Sect. 4.3. Nevertheless, an interesting point is that the above-mentioned issues (in P-WL/R-N and WL/TR-N) caused significantly smaller performance decrements in the SV-D and Char2Vec models than in the SV-F and SV-U models. The SV-D model does not use neighborhoods to learn story vectors; thus, even though it can be affected by the accuracy of character vectors, it is relatively free from the under/over-fitting issues. Second, excluding the P-WL/TR-N case, the Char2Vec model outperformed the SV-F and SV-U models. We speculate that this is because movie protagonists are more typical than other characters. With the P-WL, we discovered 4,370 substructures rooted in protagonists in total, i.e., 43.0 substructures on average (duplication ratio: 28.4%).
On the other hand, based on the conventional WL relabeling, we found 418 such substructures in total and 11.0 on average (duplication ratio: 75.4%). Although there was still a significant gap between the two relabeling methods, we can see that the duplication ratios were higher for the protagonists than for all the characters.

These experimental results exhibit both the strong and weak points of the proximity-aware WL relabeling process. By considering the proximity, the proposed embedding models could minutely distinguish the social roles of characters. This sensitivity helped to embed characters and stories when we used the TR-N together. However, the proximity-aware method discovered too many social roles with too low a duplication ratio. Although increasing the scale of the dataset can improve the duplication ratio, we will look for fundamental solutions in further studies. First, we should develop a method for controlling the granularity of discovering social roles. The proposed embedding models also have to be improved to provide enough learning opportunities regardless of the sparsity of social roles.

4.2. Efficiency of temporal-radial neighborhoods

In this section, we verify that the temporal-radial neighborhood is more appropriate for embedding characters and stories than the radial neighborhood, in order to validate the remaining half of RQ 1-2: that temporal changes in social roles are significant for representing characters and stories. In the previous section, we supposed that the proximity-aware WL relabeling and the temporal-radial neighborhood affect each other's performance, and the experimental results (in Table 4) accorded with this assumption. To evaluate the efficiency of applying temporal changes in social roles for embedding characters and stories, we discuss the experimental results in terms of whether the TR-SkipGram and TR-CBOW outperformed the radial skip-gram (R-SkipGram) and the radial CBOW (R-CBOW), i.e., the PV-DM with radial neighborhoods.
Additionally, as our story embedding models operate on pre-trained character vectors, we also exhibit cases where they operated on character vectors composed with radial neighborhoods. The comparison between the TR-SkipGram and R-SkipGram is presented in the ninth column (CV) of Table 4; the comparison between the TR-CBOW and R-CBOW appears mainly in the third column (SV-F). As the tendencies of their performance were the same as what we discussed in Sect. 4.1, and we have already discussed the under/over-fitting issues in P-WL/R-N and WL/TR-N, there are few further points to raise. Similar to the numbers of social roles under the P-WL and WL, we found that the TR-N (31.45) included 2.8 times more social roles than the R-N (11.18), on average. This supports our assumptions about the issues: for learning more social roles, more neighborhoods are required.

There was an interesting point regarding the effects of character vectors with R-N on the Story2Vec models. In P-WL/R-N and WL/TR-N, all the proposed models (excluding the SV-D) exhibited severely low performance. In our experiments, similarity degrees and their errors were measured in the range [0, 1]; thus, MAEs near 0.4 are high enough to judge two similar narrative works as dissimilar and vice versa. Nevertheless, in WL/R-N, the performance decrement of the Char2Vec model was relatively smaller than that of the others (between P-WL/TR-N and WL/R-N, SV-F: 0.15, SV-F/RCV: 0.11, SV-U: 0.10, SV-U/RCV: 0.05, Char2Vec: 0.04). Further, these decrements were worse in the SV-F and SV-U than when they operated with the RCV. This is contrary to the fact that the SV-F and SV-U outperformed the SV-F/RCV and SV-U/RCV in both P-WL/TR-N and WL/R-N. The tolerable accuracy of the TR-SkipGram in WL/R-N might come from the high duplication ratio of social roles rooted in protagonists, as discussed in the previous section.
On the other hand, we could not find the exact reasons why the TR-CBOW could not operate properly in cases other than P-WL/TR-N, and the two preceding points even seem to conflict. We speculate that there might be an additional reason, as follows. While the SV-D model learns the occurrences of social roles in stories, the SV-F and SV-U models are trained on dynamic changes in social roles and their adjacencies. To embrace the dynamic changes, detailed discrimination of social roles is as necessary as the temporal adjacency between them. However, the RCV cannot provide the temporal adjacency information, and the conventional WL relabeling hinders distinguishing social roles through oversimplification. Suppose a minor character c_j is adjacent to a protagonist c_i and an antagonist c_k. If c_j has high proximity to both c_i and c_k, c_j can be a negotiator or a spy; if c_j has high proximity to c_i and low proximity to c_k, c_j will be one of c_i's colleagues. We assume that this case is magnified because the two problems overlap.

These experimental results verified that dynamic changes in social roles are significant for representing stories, especially the flows of stories. The temporal-radial neighborhood outperformed the conventional method only when it was used with the P-WL. Also, although the Char2Vec and SV-D exhibited tolerable accuracy in WL/R-N, the SV-F and SV-U models could not operate properly in cases other than P-WL/TR-N. These results also indicate that the TR-N was only operable with the P-WL: considering the dynamic changes overextends the ranges of the neighborhoods. As we discussed, this limitation is closely connected to the relabeling methods. Similarly, we can expect that a suitable amount of data provides a diversity of social roles and disperses the excessive learning opportunities. Nevertheless, if we can adjust both the scales of neighborhoods and the granularity levels of social roles, we will be able to optimize them.
In further research, we will attempt to apply multi-scaled neighborhoods to learning representations of social roles and stories. This research issue is significant given the variety of stories and narrative multimedia.
Fig. 13. Comparison of the accuracy of measuring the similarity of stories in the movies between the proposed embedding models and the existing methods. The three horizontal lines of each box indicate the first quartile, median, and third quartile of the absolute errors, respectively. The tops and bottoms of the whiskers refer to the maxima and minima of the absolute errors, respectively. Additionally, the circular dots denote the MAEs (i.e., the averages of the absolute errors).
4.3. Efficiency of the Char2Vec and Story2Vec

To compare the proposed embedding models and verify RQ 2-1 and RQ 2-2, we discuss the performance of the models and the differences between them based on (i) the accuracy of the proposed models, (ii) the correlations between the vector embeddings composed by each model, and (iii) the qualitative analysis results (in Sect. 3).

4.3.1. Accuracy of character vectors and story vectors

To verify whether the proposed vector representations reflect stories in narrative works, we compare how accurately the proposed models estimate the similarity of real movies against two existing methods [13,15]. The first method [13] directly compares character networks using dimensionality reduction. The other [15] discovers major events in the story using dynamic changes in character networks; the story similarity is then estimated by comparing the temporal locations and participants of the major events. The detailed approaches of these methods are described in Sect. 5. We also add the Jaccard index between the genres of the movies as a baseline method. When G_α indicates the set of genres annotated on a movie C_α, the Jaccard index between C_α and C_β can be measured as J(C_α, C_β) = |G_α ∩ G_β| × |G_α ∪ G_β|⁻¹. There might be various features that affect the movie similarity perceived by users; nevertheless, these features are relevant to the story and to physical expressions of the story (e.g., the genre is a concept that even covers styles of story development) [14,34,35]. Thus, we verify the relevancy between the structural similarity of character networks and the similarity of the corresponding stories (RQ 2-1). The relevancy is evaluated in terms of how accurately the proposed models emulate the cognitive similarity of users for the narrative works. Fig. 13 presents box-and-whisker plots of the absolute errors of the similarity between movies estimated by the proposed models and the existing methods.
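The genre baseline above is a plain Jaccard index over genre sets; a minimal sketch (the genre sets in the usage note are illustrative, not taken from our corpus):

```python
def genre_jaccard(genres_a, genres_b):
    """J(C_a, C_b) = |G_a intersect G_b| / |G_a union G_b|."""
    genres_a, genres_b = set(genres_a), set(genres_b)
    if not (genres_a | genres_b):
        return 0.0  # guard against two empty genre sets
    return len(genres_a & genres_b) / len(genres_a | genres_b)
```

For example, genre sets {crime, drama} and {drama, thriller} share one of three distinct genres, giving J = 1/3.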
The experimental results demonstrate that the proposed models outperformed the existing methods in terms of both the average and the variance of accuracy. The variance is not only visible in Fig. 13 but also ascertainable from the standard deviations (SV-F: 0.15, SV-D: 0.09, SV-U: 0.10, Char2Vec: 0.11, [13]: 0.17, [15]: 0.16, and Genre: 0.18). It is difficult to label this improvement in accuracy an explicit validation of RQ 2-1. Nevertheless, at the least, it signifies that the proposed models can replicate human cognition of narrative works more accurately than the existing methods for measuring story similarity. This study has given more attention to dynamic changes in character networks than to their static structures. Contrary to our expectation, the SV-D model had the highest accuracy among the proposed embedding models, had the lowest computational complexity, and required the fewest samples for the same number of narrative works. Also, the SV-F exhibited the highest standard deviation among the proposed models, though it was lower than those of the existing methods. In the qualitative analysis of the embedding results in Sect. 3, the SV-F and SV-U models demonstrated a similar tendency. Also, although the SV-U model outperformed the SV-F model in MAE (0.18 and 0.21, respectively), the difference between them was not significant. However, in terms of the variance, the SV-U model demonstrated more stable accuracy than the SV-F model (0.10 and 0.15, respectively). Also, despite combining the SV-F and SV-D models, the SV-U model could not outperform the SV-D model.
O.-J. Lee, J.J. Jung / Artificial Intelligence 281 (2020) 103235
Table 5
Pearson correlation coefficients (PCC) and mean absolute deviations (MAD) between the movie similarities that were estimated by the proposed models (SV-F, SV-D, SV-U, and Char2Vec), the genres of the movies, and the human evaluators.

Model   Metric   SV-F   SV-D   SV-U   CV     Genre   Human
SV-F    PCC      –      0.85   0.92   0.68   0.55    0.80
        MAD      –      0.12   0.08   0.16   0.21    0.21
SV-D    PCC      0.85   –      0.87   0.76   0.63    0.88
        MAD      0.12   –      0.10   0.15   0.23    0.14
SV-U    PCC      0.92   0.87   –      0.77   0.64    0.89
        MAD      0.08   0.10   –      0.13   0.20    0.18
CV      PCC      0.68   0.76   0.77   –      0.65    0.88
        MAD      0.16   0.15   0.13   –      0.18    0.24
Genre   PCC      0.55   0.63   0.64   0.65   –       0.73
        MAD      0.21   0.23   0.20   0.18   –       0.33
Human   PCC      0.80   0.88   0.89   0.88   0.73    –
        MAD      0.21   0.14   0.18   0.24   0.33    –
These experimental results cannot by themselves underpin the necessity of analyzing dynamic changes in character networks. Nevertheless, it is common sense that stories have temporal flows. To analyze stories in narrative works more semantically, understanding the context and flow of the stories is critical (e.g., the logical connections between events described in each scene). The existing method [13] that directly compared character networks shares its approach for measuring story similarity with the SV-D model. Although both applied the static character network at the denouement, the SV-D model showed the best performance while [13] exhibited the worst. Thus, a simple comparison between character networks is ineffective, and the proposed methods and models are capable of extracting narrative characteristics from the character networks. Also, the other method [15] and the SV-F model take similar approaches. Based on numerous heuristic and complicated procedures, [15] first discovers major events; it then describes the relationships between the major events and estimates story similarity based on those relationships. Compared to [15], the SV-F model is simpler to implement and has a more concrete research background (in the computer science area). In terms of performance, although the SV-F model exhibited a much lower MAE than [15] (0.21 and 0.32, respectively), their standard deviations were similar (0.15 and 0.16, respectively). These experimental results show the effectiveness of the dynamicity of the story. However, at the same time, we conjecture that the dynamicity is not as significant as the static overview and the protagonist in human cognition of the story. Conclusively, we verified that structural similarity between character networks is effective for estimating users' cognitive similarity for real narrative works.
Also, the static structures of character networks (SV-D) and the social roles of protagonists (Char2Vec) demonstrated more stable accuracy than the dynamic changes in the structures (SV-F). This signifies that the two former features are generally more useful than the dynamic changes. Our future studies will focus on improving methods for representing the dynamicity of the story. Additionally, this study has not evaluated the efficacy of story vectors in practical applications. This efficacy can be validated by applying the story similarity estimated by the proposed models to recommendation or retrieval services for narrative multimedia [15,41].

4.3.2. Significance of dynamic and static structures of character networks

As discussed in the qualitative analysis, each proposed model represents stories from its own perspective. We conjectured that the Char2Vec, SV-F, and SV-D models reflect (i) the number of leading characters, (ii) dynamic changes in the social relationships of the leading characters, and (iii) the social relationships of the leading characters at the denouement, respectively. Also, the results of the SV-U model were highly similar to those of the SV-F model. Nevertheless, the qualitative analysis results were not enough to verify the above statements, due to a lack of scale and procedural rationality. In this section, we attempt to partially validate whether the proposed models represent stories from their respective perspectives. The independence of the perspectives is evaluated based on correlation coefficients and deviations between the similarity degrees measured by the proposed models. Story similarity between narrative works might be related to various narrative characteristics (e.g., the deployment of major events, the composition of characters, and so on). If the results of two models are highly correlated, this implicitly shows that the two models reflect common narrative characteristics, and vice versa.
To reveal the correlations, we measured the Pearson correlation coefficient (PCC) and mean absolute deviation (MAD) between the similarity degrees obtained from the proposed models, the genres, and the evaluators, as displayed in Table 5. A larger absolute value of the PCC indicates a higher correlation between two models, and a larger MAD denotes a greater average difference between them.
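The two metrics can be sketched as follows; we read MAD here as the mean absolute difference between paired similarity scores, and the helper names are our own, not from the paper.

```python
import math

def pcc(xs, ys):
    """Pearson correlation coefficient between two lists of similarity scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mad(xs, ys):
    """Mean absolute difference between paired similarity scores."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)
```

For identical score rankings the PCC approaches 1 even when the absolute values differ, which is why the MAD is reported alongside it.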
Based on the experimental results, we verify RQ 2-2 and two minor assumptions from the qualitative analysis. RQ 2-2 asks whether the dynamic changes (SV-F) and static structures (SV-D) of character networks are both significant for representing stories. In the previous section, both models outperformed the existing models, and the SV-D model demonstrated the best performance. This result signifies that the dynamic and static features are both significant. Thus, in this section, we have to evaluate the necessity of the SV-F model, even though the SV-D model has better performance and lower computational complexity. Nevertheless, contrary to our expectation, the difference between the SV-F and SV-D models was not as distinct as the differences between the Story2Vec models, the Char2Vec model, and the genres. Char2Vec and the genres exhibited their highest PCCs for the human evaluators. Although the same held for the SV-D model, its PCCs for the evaluators (0.88) and for the SV-F (0.85) did not present any apparent difference. The SV-F model even had a higher PCC for the SV-D (0.85) than for the evaluators (0.80). These results confirm that the structural similarity of character networks reflects distinctive perspectives on narrative works. However, to verify whether the dynamic changes in character networks must necessarily be analyzed, we have to improve the methods for learning the dynamicity and extend the range of our experiments. In the qualitative analysis, we made two simple assumptions about the SV-D and SV-U models. We expected the SV-D model to demonstrate the highest correlations with Char2Vec and the genres. However, the SV-D model exhibited the lowest PCC and highest MAD for the genres (as did all the Story2Vec models) and the second-lowest PCC and highest MAD for Char2Vec. Moreover, the genres had the second-lowest PCC and second-highest MAD for the SV-D. These results signify that this assumption came from a sampling bias in our qualitative analysis.
Subsequently, we validate whether the results of the SV-U model are much closer to those of the SV-F model than to those of the SV-D model. As displayed in the first and third rows of Table 5, the SV-F and SV-U models have the highest PCCs and lowest MADs for each other. We designed the SV-U model to train a vector representation of a story by using the SV-F and SV-D models alternately. What we overlooked was that the SV-D model uses far fewer samples than the SV-F model for a movie (on average, 224.8 and 39,017.0 samples, respectively). In other words, the SV-D model received far fewer training opportunities than the SV-F model. Therefore, in further studies, we will develop a method for securing sufficient training opportunities for the SV-D part of the SV-U model. However, contrary to our conjecture that the SV-U model is excessively biased toward the SV-F, the SV-D model exhibited its lowest MAD for the SV-U. In terms of performance, as displayed in Fig. 13, the SV-U model outperformed the SV-F model, particularly in the stability of its accuracy. Furthermore, the evaluators exhibited their highest PCC for the SV-U (0.89). This result signifies that the SV-U has a tendency similar to human cognition. Thus, although the SV-U model could not outperform the SV-D, the SV-U model is meaningful in that it reflects both the dynamic and static characteristics of stories in the same model. These experimental results could not validate the necessity of applying the dynamic changes in character networks for representing stories. Nevertheless, the SV-U model had a tendency more similar to human cognition than the SV-D. This point partially supports the effectiveness of the dynamicity of stories. Additionally, Table 5 shows that story vectors, character vectors, and genres reflect different features of narrative works.
In further studies, we will conduct qualitative discussions of the correlations between story vectors, character vectors, and the content of movies, to reveal which features the proposed models reflect. We ascertained that we are still at the beginning of realizing 'computational narrative analysis.'

5. Related work

There have been few studies on embedding the stories of narrative works as vector representations. Thus, in this section, we present existing representation models for stories and methods for comparing stories based on these models. We roughly classify these models into two groups: (i) character network-based and (ii) event-centric models.

5.1. Story representation with the character network

In Sect. 1 and 2, we presented fundamental concepts and methods for representing a story with a social network of the characters that appear in the story. We collectively call all the social network-based story representations the 'character network.' Since Weng et al. [8] first proposed the concept of the character network, various methods have been proposed to (i) improve the reliability of the character network [42,9,16,10] and (ii) analyze the story by applying SNA techniques to the character network [8,11,14]. To improve the character network, some studies applied other methods to measure the proximity between characters [43,9] by using dialogues exchanged among the characters. A few studies attempted to annotate the emotional states of characters in the character network [12,15]. Others added temporal dynamicity to the character network [10]. Despite these various attempts, these story representations could not deviate from the format of a dynamic social network. A few of our previous studies [14,13] attempted to compare stories by directly using the character network. First, we applied the composition of characters in their communities [14]. In most narrative works, characters compose two main communities, around the protagonist and the antagonist.
Thus, we compared stories based on the two significant communities with three measurements: the composition matrix, intra-compactness, and inter-adjacency. The composition matrix presents the role distributions of the communities, which represent the scale and significance of the communities in the story. The other two measurements reflect the distinctness of the communities. Based on these measurements, we attempted to explain the
similarity of stories in narrative works. Furthermore, in another study [13], we compared the structures of two character networks without any heuristic-based measurements. To compare arbitrary-sized graphs, we applied dimensionality reduction methods. Although these two studies demonstrated that the structural similarity of character networks could be useful for comparing and explaining stories, the community-based method is too heuristic to build task-independent representations of the story, and the other is an overly naive approach that does not consider the characteristics of the story. Elsner [44,45] attempted to represent the plot structures of novels. He modeled the story with descriptive and emotional words associated with characters, social relationships between the characters, and the occurrence frequencies of the characters. Based on these features, he defined kernel functions to compare characters from multiple novels. Based on the kernels for characters, he finally built a kernel function that compares two novels as a summation of the similarities between pairs of characters drawn from the two novels. Although this kernel exhibited remarkable performance, it is difficult to directly apply these methods to narrative works in other media and formats. In the case of movies, descriptions of characters and their emotional states are mostly portrayed by the performance of actors/actresses. Thus, applying this kernel requires another task, such as facial expression analysis. To avoid this media-dependency, we focused on analyzing the social roles of characters and the structure of the society among the characters. Similarly, Grayson et al. [1,46] attempted to extract the character network from novels based on co-occurrences between characters. To detect co-occurrence, they proposed three methods that can be used only for novels: collinear co-occurrence, coplanar co-occurrence, and a combination of the two. Subsequently, they adopted SNA techniques to detect communities among the characters.
However, these studies made no contributions beyond building co-occurrence-based character networks for novels. Bost et al. [47,48,2] attempted to identify subplots in TV series by analyzing the dynamic character network. They proposed novel methods for composing the character network and measuring proximity between characters. In their character network, they estimated social relationships between characters more precisely by using both dialogues and co-occurrences of the characters. Subsequently, the authors proposed two measurements for estimating the persistence and anticipation of the relationships between characters. Based on these measurements, they presented transitions between subplots using changes in the outgoing strength of main characters and the proximity among the main characters. Sack [49] suggested a distinctive approach to the plot model, which is slightly related to character network-based models. He proposed a plot structure model based on changes in the frequency with which each character's name is mentioned, which he named 'narrative attention.' He supposed that the plot structure is a combination of multiple plots. This assumption is similar to what Robert McKee [34] explained regarding the main plot and subplots of a movie. These plots have their own main, minor, and extra characters, and they alternately interweave to form the plot structure. Therefore, if we assume multiple plots based on the narrative attention and appearance of characters, we can cumulatively simulate the narrative attention and interactions of characters and discover an optimal plot structure for a narrative work. Furthermore, Sack [50] also attempted to computationally generate stories based on the character network. This is interesting, since most narrative generation studies have been based on event-centric models, as discussed in Sect. 5.2.
To generate events from character networks, Sack applied the social balance theory (SBT) [51], which deals with the stability of social networks. The SBT defines several equilibrium states of social networks, and the relationships of characters usually become stable at the denouements of stories. Thus, Sack switched the edges of character networks until the networks reached equilibrium states, and each switch corresponded to an event. Since the SBT can measure the stability of character networks, it seems applicable to analyzing stories, e.g., measuring the rapidity of story development or detecting major events. Chaturvedi et al. [52] modeled relationships between characters in novels without the character network. They represented the relationship between two characters as the sequence of sentences in which they co-occurred. Each sentence was then represented by a feature vector based on the averaged embedding of the words in the sentence. Based on this representation, which is a sequence of feature vectors, the authors attempted to discover latent states of the relationships between characters by using HMMs (Hidden Markov Models). Although this method for composing feature vectors is not applicable to various narrative multimedia, learning relationships with sequence processing models will help analyze the dynamicity of stories.

5.2. Story representation with relationships of events

The character network-based models mostly applied measurements from the SNA area or compared character networks directly. These models are familiar to researchers in the computer science area and highly accessible to existing methods for analyzing graphs and networks. Nevertheless, they also have an obvious limitation in that it is difficult to represent high-level semantics of the story, such as plot structures. Therefore, various studies have attempted to represent the story based on flows of events [53].
These studies describe relationships among the events and other narrative elements (e.g., characters, backgrounds, and subplots) around those flows. As a relatively simple approach, Jung et al. [4] proposed a model for representing the stories of transmedia ecosystems. Simply speaking, they focused on stories told across multiple narrative works. They represented a set of narrative works in a transmedia ecosystem with a lattice graph. This model locates narrative works according to two axes: the temporal and spatial backgrounds of the major events described in the narrative works. Then, they annotated shared characters and events between the narrative works with arrows. Based on this model, they also proposed a set of heuristic rules for distinguishing the methods used to extend stories in a transmedia ecosystem.
By applying this model to a single narrative work, our previous studies [12,15] represented and compared the plot structures of individual narrative works. First, we annotated emotional relationships on the character network based on the facial expressions of characters and the emotional words in their dialogues [12]. Based on changes in the emotional relationships, we proposed a measurement for estimating the degree of conflict between characters [15]. Subsequently, we discovered the major events in a story by using gradients of the degree of conflict. The major events were located on a lattice graph according to their temporal and spatial backgrounds. In addition, the relationships between the events were described by the involvement of the characters and their communities. From this model, we compared stories with two methods: (i) temporal patterns of the degree of conflict and (ii) major events and the involvement of communities in those events. Although this approach exhibited reliable performance for estimating the similarity between stories, it also has a few limitations. First, extracting and measuring emotional relationships between characters is media-dependent, and for some formats and media (e.g., webcomics), it is difficult to find any existing studies for this task. In addition, discriminating the temporal and spatial backgrounds of events is a difficult problem. In particular, when the temporal order of events is tangled to highlight particular events or for artistic presentation [53], there is, to the best of our knowledge, no clear solution. Additionally, events do not have consistent granularity levels. An event consists of smaller events and is itself part of a bigger event. However, we cannot overlook the small events. Even if an event is insignificant, it can have causal relationships with major events. This abstruseness of events also causes problems when processing events in the real world [54,55].
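The gradient-based discovery of major events from the degree of conflict, described above for [15], can be sketched minimally as follows; the per-scene conflict series and the threshold are hypothetical, and the actual measurement in [15] is more elaborate.

```python
def major_event_indices(conflict, threshold=0.5):
    """Flag scene indices where the degree of conflict changes sharply.

    `conflict` is a hypothetical per-scene degree-of-conflict series; a
    large absolute gradient between consecutive scenes is read as a
    candidate major event, following the intuition described for [15].
    """
    events = []
    for i in range(1, len(conflict)):
        gradient = conflict[i] - conflict[i - 1]
        if abs(gradient) >= threshold:
            events.append(i)
    return events
```

A rising gradient then suggests an escalating conflict, and a falling one a resolution; both are candidate event boundaries in this reading.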
To identify major events in evolving social networks, Lotker [56] brought the 'two clock theory' from psychology. This theory deals with the relativity of time perception. For example, depending on whether we are stressed or relaxed, we feel the flow of time differently. Based on this intuition, Lotker implemented two clock types (absolute and relative) with a simple method: counting (i) the number of dialogues and (ii) the number of words spoken by the characters. He supposed that these two clocks exhibit significant differences around major events and verified this assumption on the plays of William Shakespeare. This method is meaningful in terms of not only its performance but also its simplicity and extendibility. To discover major events in movies, Adams et al. [57] applied physical features extracted from the movies. Based on the major events, they attempted to extract plot structures by segmenting the movies into multiple acts. The authors supposed that climaxes are located at peaks of the aural/visual tempo [58] of movies. Then, the boundaries of acts were detected from the locations of the climaxes. Their methods were also able to distinguish other kinds of events, such as opening sequences. These events can be used to compare movie plot structures. There has also been a study describing relationships between events and applying the relationships to interpolate the events described in a story. Purdy and Riedl [59] proposed a method for computationally understanding stories based on the 'plot graph [60].' To address the limitation of existing story models, which require a large amount of manual annotation, they designed their plot model to learn from crowd-sourced data. This model consists of events and two kinds of edges connecting the events. First, a directed edge annotates dependency between events, from antecedent events to succeeding ones. The other kind indicates mutual exclusiveness among the events.
In the plot graph, each event consists of multiple sentences that describe the event. To compose the events, the authors clustered sentences that appeared in the crowd-sourced descriptions of the story. Further, to detect the two kinds of relationships between events, they applied a binomial distribution and mutual information, respectively. Based on the plot graph, their story understanding method aimed to bridge gaps between the events described in a story and the events that happened in the narrative world of the story. While a human can read between the lines, the authors attempted to computationally infer possible events between the described ones. They estimated the similarity between events based on the Word2Vec model. Subsequently, possible events were discovered via the dependency and mutual exclusiveness between events, and an event was finally chosen using the similarity of the possible events to their preceding and succeeding events. Similarly, Chambers and Jurafsky [61,62] proposed unsupervised learning methods for narrative event chains, narrative schemata, and their participants (characters). The term 'narrative' in these studies is not directly relevant to the story in narrative 'artworks.' However, since it is a much broader concept than the human-manufactured story, the methods proposed in these studies could be applied to analyzing narrative works. For the narrative event chains, they defined an event finely, as each action of the participants. Then, they modeled each event as an action, its subject, and its object. To learn the narrative event chains, they predicted succeeding events in the chains by maximizing the point-wise mutual information (PMI). Moreover, they proposed a model for jointly learning the narrative event chains and the roles of their participants. To learn which roles are adequate for which position (subject or object) of which action, they defined semantically elaborate roles, e.g., police, criminal, and judge.
For example, 'police' usually appears as the 'subject' of 'arrest,' not as its 'object.' The learning strategy of this model is similar to the one above. However, the authors added one more term to their objective function, which indicates the frequency of events in which a participant occupies a particular position. This enables building the narrative schema by considering the roles of participants. These studies could be useful for automatically generating narratives. However, for analyzing stories in narrative works, discovering events and their participants remains a challenging task. There have also been studies that annotate stories with more structured representations, such as XML. These studies have mostly been introduced through a series of workshops called the "International Workshop on Computational Models of Narrative (CMN)." Representatively, Mani [5,6] proposed a markup language called 'NarrativeML' to integrate narrative elements into a single annotation scheme. NarrativeML includes various features, and its annotation scheme was stably defined according
to the narratological theories. While the aforementioned models covered no more than characters, their relationships, major events, and the backgrounds of the events, NarrativeML also describes the goals of the characters and the plots, which are high-level semantics. Though this model is exceedingly sophisticated, the limitations of NarrativeML also stem from its elaborateness. Even in the transmedia model of Jung et al. [4], we encountered the problem that it is difficult to identify the temporal backgrounds of events. The more high-level semantics a model requires, the more difficult it is to gather the narrative elements for representing stories. Similarly, Lakoff and Narayanan [63] proposed an ontological schema for representing events. Their events contain inputs, outputs, preconditions, effects (e.g., triggering other events), spatio-temporal backgrounds, time duration, and so on. Additionally, various studies have proposed methods for automatically generating stories by using event-centric models. Gervás [64,65] has conducted various studies on computationally generating stories. To compose discourse (i.e., events that happened in a narrative world, which are not arranged for storytelling), he modeled a story as a list of events [64]. Here, an event is defined as a tuple containing characters, time points, predicates, and so on. Also, Gervás proposed a computational procedure for generating plot structures [65], based on Propp's formalism [40,66]. However, this plot structure model is too focused on folktales, which are not very popular in the modern media industry. Kapadia et al. [67] proposed methods for synthesizing narrative animations. They described events by using parameterized behavior trees [68]. These trees consist of possible options for characters and predicates according to the options. To generate fluent stories, they also proposed measurements for the complexity and inconsistency of stories, based on the behavior trees. Martin et al.
[69] proposed two encoder-decoder RNN models: the Event2Event model was designed to generate a sequence of events, and the Event2Sentence model translated the event sequences into natural language sentences. They employed a 4-tuple event representation that consists of subjects, verbs (behaviors), objects, and modifiers. Regarding the event representation, this approach shares common points with the studies of Chambers and Jurafsky [61,62]. Chourdakis and Reiss [70] also employed neural networks to generate stories. First, they applied the skip-gram to represent events based on their temporal adjacency. With these representations, they trained a deep reinforcement learning model. The policy function of the model predicts an event to be included in a story, and its value function evaluates the event based on the preceding events in the story. Events in this method also mostly correspond to a sentence, as in the above study.

5.3. Other approaches

Apart from studies focused on analyzing and representing the story, there have been various studies comparing narrative works or characters with each other. Some such studies focus on measuring the personalities of characters. Flekova and Gurevych [71] attempted to profile characters in a novel based on their personalities. They used the semantics of words associated with the characters to extract the personalities. While this is similar to some of our previous studies [12,15] and Elsner's studies [44,45], this approach did not consider social relationships between characters. They used words in three categories: speeches, actions, and predications of characters. Then, they classified the personalities of characters as introverted or extroverted based on the associated words. Although this method had reliable accuracy, a more detailed classification is required to compare characters. Gomes et al. [72] dealt with a more specific area of character personality than the study of Flekova and Gurevych [71].
They proposed measurements for the believability of characters. Based on a literature review, they determined various features that affect the believability of characters: behavior coherence, behavior understandability, emotional expressiveness, etc. However, they did not consider automatically measuring these features on the narrative work. Similar to the proposed embedding models, a few studies have attempted to represent a narrative work with a vector. Nevertheless, these studies were not only irrelevant to stories in narrative works but also highly dependent on a single medium. Grayson et al. [73] embedded characters in a novel using both methods of Word2Vec: the skip-gram and CBOW. They used the vector representation of each character's name to represent the character. Although this approach exhibited unexpectedly high performance, it is difficult to use for non-textual narrative works. Yu and Lin [74] proposed a model for learning a vector representation of a movie based on the Word2Vec model. Although they called their model 'Movie2Vec,' this model only used superficial metadata of movies (titles, taglines, keywords, genres, directors, actors/actresses, and overviews). They generated vector representations of every word that appeared in the movie metadata. Then, a movie vector was made by summing the vector representations of all the words in the movie's metadata. This model is far from representing stories in movies, or indeed any substantial content. Likewise, Danil et al. [75] proposed a similarity measurement between movies based on movie scripts. Similar to the above study, they made vector representations of the words in the scripts. Then, a weighted summation of the word vectors that appeared in a movie's script was assigned as the vector representation of the movie.
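The weighted-summation representation used by [74] and [75] can be sketched as below; all function names, vectors, and weights here are illustrative and not taken from either study.

```python
def movie_vector(tokens, word_vectors, weights=None):
    """Weighted sum of word vectors as a crude movie representation.

    `word_vectors` maps a token to a fixed-length list of floats; tokens
    missing from the vocabulary are skipped. Returns None when no token
    is in the vocabulary.
    """
    total = None
    for tok in tokens:
        if tok not in word_vectors:
            continue
        w = 1.0 if weights is None else weights.get(tok, 1.0)
        vec = [w * x for x in word_vectors[tok]]
        total = vec if total is None else [a + b for a, b in zip(total, vec)]
    return total
```

As the sketch makes plain, such a representation discards word order entirely, which is one reason it cannot capture the flow of a story.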
Although they applied various other features (e.g., lexico-morphological metrics and keywords), it is difficult to say that word usage in scripts reflects the substance of movies. According to Robert McKee [34,35], the script provides guidelines and milestones for the flow of the plot, and the directors and actors/actresses fill in the remaining parts. Video embedding techniques can also be used for representing visual narrative works. As shown in a questionnaire result from our previous study [14], users' preferences for narrative works are mainly affected by two features: (i) the story and (ii) how the story is physically described. We expect that video embedding techniques can represent the second feature, given their rapid development. For example, Hu et al. [76] proposed a method for representing videos with vectors based on a convolutional neural network and a gated recurrent unit.
6. Conclusion

In this study, we attempted to learn distributed representations of the stories in narrative works. To compose media-independent and task-agnostic representations of stories, we proposed models for embedding the structural features of the social networks among characters (character networks) that appear in narrative works. First, we proposed the Char2Vec model, which represents the substructures of character networks (social roles) rooted in each character, by extending the Subgraph2Vec model. Subsequently, we proposed the Story2Vec models, which represent the structures of character networks based on which social roles compose them, by modifying the Doc2Vec model. Furthermore, to incorporate narrative features into these embedding models, we considered two more features of character networks. First, we proposed the proximity-aware WL relabeling method to finely distinguish the social roles of characters. Second, we extended the range of neighborhoods in the radial skip-gram and PV-DM models to consider the temporal evolution of character networks. Finally, we evaluated the proposed embedding models and methods based on human cognition of real movies. They outperformed the existing methods in terms of accuracy in measuring the similarity of the movies. Through the research questions, we verified the effectiveness of the structural features of character networks for embedding characters and stories. Further, the proposed models and methods revealed a few limitations. The learning processes of the SV-U model were biased toward the SV-F model. In combining the SV-F and SV-D models, we must find a method for balancing the training opportunities of the two models. Additionally, protagonist-based representations using the Char2Vec model exhibited much higher performance than we expected. The same held for the SV-D model.
These results verify that protagonists and the conclusions of stories are essential to human cognition of stories, and we should apply these insights to our embedding models. Moreover, this study suggests the following directions for further research.
• Hierarchical Story2Vec: Stories have a multi-layered hierarchical structure [34]. For example, in a movie, a scene is the smallest unit that is significant to the focus of the narrative. Scenes compose sequences, sequences compose acts, and acts compose movies. In the hierarchical structure of a story, entities at every layer contain their own climaxes. Further, the lower entity that contains the climax with the highest conflict corresponds to the climax of its higher entity. This structure escalates the conflict until the climax of the last act. Thus, we anticipate that hierarchically modeling the story representation will be effective in reflecting these structural characteristics of stories. For the hierarchical representation of stories, we can utilize the HVD proposed in [77] and Doc2Sent2Vec proposed in [78]. These two studies proposed multi-layered, unified models for learning distributed representations of words, sentences, and documents.

• Diachronic Char2Vec: In this study, we proposed the Char2Vec model to represent characters in stories. However, this model represents social roles rooted in each character; to represent a character, we averaged the representations of all social roles detected from the character. As with the SV-F model, if a character vector can reflect temporal changes in the social roles of the character, we will be able to analyze and compare characters more semantically.

Declaration of competing interest

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (NRF-2017R1A2B4010774).

References

[1] S. Grayson, K. Wade, G. Meaney, J. Rothwell, M. Mulvany, D. Greene, Discovering structure in social networks of 19th century fiction, in: W. Nejdl, W. Hall, P. Parigi, S. Staab (Eds.), Proceedings of the 8th ACM Conference on Web Science (WebSci 2016), ACM, Hannover, Germany, 2016, pp. 325–326.
[2] X. Bost, V. Labatut, S. Gueye, G. Linarès, Extraction and analysis of dynamic conversational networks from TV series, in: M. Kaya, J. Kawash, S. Khoury, M. Day (Eds.), Social Network Based Big Data Analysis and Applications, in: Lecture Notes in Social Networks, Springer, 2018, pp. 55–84.
[3] O.-J. Lee, J.J. Jung, Integrating character networks for extracting narratives from multimodal data, Inf. Process. Manag. 56 (5) (2019) 1894–1923, https://doi.org/10.1016/j.ipm.2019.02.005.
[4] J.E. Jung, O.-J. Lee, E.-S. You, M.-H. Nam, A computational model of transmedia ecosystem for story-based contents, Multimed. Tools Appl. 76 (8) (2017) 10371–10388, https://doi.org/10.1007/s11042-016-3626-5.
[5] I. Mani, Computational Modeling of Narrative, Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, 2012.
[6] I. Mani, Animating motion in NarrativeML, in: B. Miller, A. Lieto, R. Ronfard, S.G. Ware, M.A. Finlayson (Eds.), Proceedings of the 7th Workshop on Computational Models of Narrative (CMN 2016), Kraków, Poland, in: OASICS, vol. 53, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016, pp. 3:1–3:16.
[7] C.B. Callaway, J.C. Lester, Narrative prose generation, Artif. Intell. 139 (2) (2002) 213–252, https://doi.org/10.1016/S0004-3702(02)00230-8.
[8] C. Weng, W. Chu, J. Wu, RoleNet: movie analysis from the perspective of social networks, IEEE Trans. Multimed. 11 (2) (2009) 256–271, https://doi.org/10.1109/tmm.2008.2009684.
[9] S. Park, K. Oh, G. Jo, Social network analysis in a movie using character-net, Multimed. Tools Appl. 59 (2) (2012) 601–627, https://doi.org/10.1007/s11042-011-0725-1.
[10] Q.D. Tran, D. Hwang, O.-J. Lee, J.J. Jung, A novel method for extracting dynamic character network from movie, in: J.J. Jung, P. Kim (Eds.), Big Data Technologies and Applications, in: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST), vol. 194, Springer International Publishing, 2017, pp. 48–53.
[11] Q.D. Tran, D. Hwang, O.-J. Lee, J.E. Jung, Exploiting character networks for movie summarization, Multimed. Tools Appl. 76 (8) (2017) 10357–10369, https://doi.org/10.1007/s11042-016-3633-6.
[12] O.-J. Lee, J.J. Jung, Affective character network for understanding plots of narrative multimedia contents, in: M.T.H. Ezquerro, G.J. Nalepa, J.T.P. Mendez (Eds.), Proceedings of the Workshop on Affective Computing and Context Awareness in Ambient Intelligence (AfCAI 2016), Murcia, Spain, in: CEUR Workshop Proceedings, vol. 1794, CEUR-WS.org, 2016.
[13] O.-J. Lee, N. Jo, J.J. Jung, Measuring character-based story similarity by analyzing movie scripts, in: A.M. Jorge, R. Campos, A. Jatowt, S. Nunes (Eds.), Proceedings of the 1st Workshop on Narrative Extraction From Text (Text2Story 2018) Co-Located with the 40th European Conference on Information Retrieval (ECIR 2018), Grenoble, France, in: CEUR Workshop Proceedings, vol. 2077, CEUR-WS.org, 2018, pp. 41–45.
[14] O.-J. Lee, J.J. Jung, Explainable movie recommendation systems by using story-based similarity, in: A. Said, T. Komatsu (Eds.), Joint Proceedings of the ACM IUI 2018 Workshops Co-Located with the 23rd ACM Conference on Intelligent User Interfaces (ACM IUI 2018), Tokyo, Japan, in: CEUR Workshop Proceedings, vol. 2068, CEUR-WS.org, 2018.
[15] O.-J. Lee, J.J. Jung, Modeling affective character network for story analytics, Future Gener. Comput. Syst. 92 (2019) 458–478, https://doi.org/10.1016/j.future.2018.01.030.
[16] Q.D. Tran, J.E. Jung, CoCharNet: extracting social networks using character co-occurrence in movies, J. Univers. Comput. Sci. 21 (6) (2015) 796–815, https://doi.org/10.3217/jucs-021-06-0796.
[17] J. Truby, The Anatomy of Story: 22 Steps to Becoming a Master Storyteller, Farrar, Straus and Giroux, 2008.
[18] T. Miller, Explanation in artificial intelligence: insights from the social sciences, Artif. Intell. 267 (2019) 1–38, https://doi.org/10.1016/j.artint.2018.07.007.
[19] O.-J. Lee, Learning distributed representations of character networks for computational narrative analytics, Ph.D. thesis, Chung-Ang University, Seoul, Republic of Korea, Aug. 2019.
[20] A. Narayanan, M. Chandramohan, R. Venkatesan, L. Chen, Y. Liu, S. Jaiswal, graph2vec: learning distributed representations of graphs, Computing Research Repository (CoRR), arXiv:abs/1707.05005, Jul. 2017.
[21] T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: C.J.C. Burges, L. Bottou, Z. Ghahramani, K.Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 26: Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS 2013), Curran Associates, Inc., Lake Tahoe, Nevada, US, 2013, pp. 3111–3119.
[22] Q.V. Le, T. Mikolov, Distributed representations of sentences and documents, in: E.P. Xing, T. Jebara (Eds.), Proceedings of the 31st International Conference on Machine Learning (ICML 2014), in: JMLR Workshop and Conference Proceedings, vol. 32, JMLR.org, Beijing, China, 2014, pp. 1188–1196.
[23] P. Yanardag, S.V.N. Vishwanathan, Deep graph kernels, in: L. Cao, C. Zhang, T. Joachims, G.I. Webb, D.D. Margineantu, G. Williams (Eds.), Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015), ACM, Sydney, NSW, Australia, 2015, pp. 1365–1374.
[24] A. Narayanan, M. Chandramohan, L. Chen, Y. Liu, S. Saminathan, subgraph2vec: learning distributed representations of rooted sub-graphs from large graphs, Computing Research Repository (CoRR), arXiv:abs/1606.08928, Jun. 2016.
[25] N. Shervashidze, P. Schweitzer, E.J. van Leeuwen, K. Mehlhorn, K.M. Borgwardt, Weisfeiler-Lehman graph kernels, J. Mach. Learn. Res. 12 (2011) 2539–2561.
[26] P. Goyal, E. Ferrara, Graph embedding techniques, applications, and performance: a survey, Knowl.-Based Syst. 151 (2018) 78–94, https://doi.org/10.1016/j.knosys.2018.03.022.
[27] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, Q. Mei, LINE: large-scale information network embedding, in: A. Gangemi, S. Leonardi, A. Panconesi (Eds.), Proceedings of the 24th International Conference on World Wide Web (WWW 2015), ACM, Florence, Italy, 2015, pp. 1067–1077.
[28] A. Grover, J. Leskovec, node2vec: scalable feature learning for networks, in: B. Krishnapuram, M. Shah, A.J. Smola, C.C. Aggarwal, D. Shen, R. Rastogi (Eds.), Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), ACM, San Francisco, CA, USA, 2016, pp. 855–864.
[29] P. Goyal, H. Hosseinmardi, E. Ferrara, A. Galstyan, Embedding networks with edge attributes, in: D. Lee, N. Sastry, I. Weber (Eds.), Proceedings of the 29th ACM Conference on Hypertext and Social Media (HT 2018), ACM, Baltimore, MD, USA, 2018, pp. 38–42.
[30] B. Perozzi, R. Al-Rfou, S. Skiena, DeepWalk: online learning of social representations, in: S.A. Macskassy, C. Perlich, J. Leskovec, W. Wang, R. Ghani (Eds.), Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2014), ACM, New York, NY, USA, 2014, pp. 701–710.
[31] S. Pan, R. Hu, G. Long, J. Jiang, L. Yao, C. Zhang, Adversarially regularized graph autoencoder for graph embedding, in: J. Lang (Ed.), Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018), Stockholm, Sweden, ijcai.org, 2018, pp. 2609–2615.
[32] O.-J. Lee, J.J. Jung, Character network embedding-based plot structure discovery in narrative multimedia, in: R. Akerkar, J.J. Jung (Eds.), Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics (WIMS 2019), ACM, Seoul, Republic of Korea, 2019, pp. 15:1–15:9.
[33] O.-J. Lee, J.J. Jung, Computational narrative representation and analytics, in: Proceedings of the 2nd International Conference on AI Humanities (ICAIH 2019), Seoul, Republic of Korea, 2019.
[34] R. McKee, Story: Substance, Structure, Style and the Principles of Screenwriting, HarperCollins, New York, NY, USA, 1997.
[35] R. McKee, Dialogue: The Art of Verbal Action for Page, Stage, and Screen, Twelve, 2016.
[36] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781, Sep. 2013, http://arxiv.org/abs/1301.3781.
[37] J. Garten, K. Sagae, V. Ustun, M. Dehghani, Combining distributed vector representations for words, in: P. Blunsom, S.B. Cohen, P.S. Dhillon, P. Liang (Eds.), Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing (VS@NAACL-HLT 2015), Denver, Colorado, USA, The Association for Computational Linguistics, 2015, pp. 95–101.
[38] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, J. Mach. Learn. Res. 9 (2008) 2579–2605.
[39] D. Marks, Inside Story: The Power of the Transformational Arc, Three Mountain Press, 2006.
[40] V. Propp, Morphology of the Folktale, University of Texas Press, 1968.
[41] M.-Y. Yi, O.-J. Lee, J.J. Jung, MBTI-based collaborative recommendation system: a case study of webtoon contents, in: P.C. Vinh, V.S. Alagar (Eds.), Context-Aware Systems and Applications - Proceedings of the 4th International Conference on Context-Aware Systems and Applications (ICCASA 2015), in: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST), vol. 165, Springer International Publishing, Vung Tau, Vietnam, 2016, pp. 101–110.
[42] D.K. Elson, N. Dames, K.R. McKeown, Extracting social networks from literary fiction, in: J. Hajic, S. Carberry, S. Clark (Eds.), Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), The Association for Computer Linguistics (ACL), Uppsala, Sweden, 2010, pp. 138–147.
[43] F. Moretti, Network theory, plot analysis, New Left Rev. 68 (2011) 80–102.
[44] M. Elsner, Character-based kernels for novelistic plot structure, in: W. Daelemans, M. Lapata, L. Màrquez (Eds.), Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), The Association for Computer Linguistics, Avignon, France, 2012, pp. 634–644.
[45] M. Elsner, Abstract representations of plot structure, Linguistic Issues in Language Technology (LiLT) 12 (5) (Oct. 2015).
[46] S. Grayson, K. Wade, G. Meaney, D. Greene, The sense and sensibility of different sliding windows in constructing co-occurrence networks from literature, in: B. Bozic, G. Mendel-Gleason, C. Debruyne, D. O’Sullivan (Eds.), Proceedings of the 2nd IFIP WG 12.7 International Workshop on Computational History and Data-Driven Humanities (CHDDH 2016), Dublin, Ireland, in: IFIP Advances in Information and Communication Technology, vol. 482, Springer, Cham, 2016, pp. 65–77.
[47] X. Bost, V. Labatut, S. Gueye, G. Linarès, Narrative smoothing: dynamic conversational network for the analysis of TV series plots, in: R. Kumar, J. Caverlee, H. Tong (Eds.), Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), IEEE Computer Society, San Francisco, CA, US, 2016, pp. 1111–1118.
[48] X. Bost, A storytelling machine?: automatic video summarization: the case of TV series, Ph.D. thesis, University of Avignon, France, Nov. 2016.
[49] G.A. Sack, Simulating plot: towards a generative model of narrative structure, in: Proceedings of the 2013 Annual International Conference of the Alliance of Digital Humanities Organizations (DH 2013), Lincoln, NE, US, Alliance of Digital Humanities Organizations (ADHO), 2013, pp. 371–372.
[50] G.A. Sack, Character networks for narrative generation: structural balance theory and the emergence of proto-narratives, in: M.A. Finlayson, B. Fisseni, B. Löwe, J.C. Meister (Eds.), Proceedings of the 2013 Workshop on Computational Models of Narrative (CMN 2013), in: OASICS, vol. 32, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Hamburg, Germany, 2013, pp. 183–197.
[51] D. Cartwright, F. Harary, Structural balance: a generalization of Heider’s theory, Psychol. Rev. 63 (5) (1956) 277–293, https://doi.org/10.1037/h0046049.
[52] S. Chaturvedi, M. Iyyer, H. Daumé III, Unsupervised learning of evolving relationships between literary characters, in: S.P. Singh, S. Markovitch (Eds.), Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), AAAI Press, San Francisco, California, USA, 2017, pp. 3159–3165.
[53] P. Gervás, B. Lönneker, J.C. Meister, F. Peinado, Narrative models: narratology meets artificial intelligence, in: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2006), 2006, pp. 44–51.
[54] O.-J. Lee, J.E. Jung, Sequence clustering-based automated rule generation for adaptive complex event processing, Future Gener. Comput. Syst. 66 (2017) 100–109, https://doi.org/10.1016/j.future.2016.02.011.
[55] O.-J. Lee, Y. Kim, L.N. Hoang, J.E. Jung, Multi-scaled spatial analytics on discovering latent social events for smart urban services, J. Univers. Comput. Sci. 24 (3) (2018) 322–337, https://doi.org/10.3217/jucs-024-03-0322.
[56] Z. Lotker, The tale of two clocks, in: R. Kumar, J. Caverlee, H. Tong (Eds.), Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), IEEE Computer Society, San Francisco, CA, US, 2016, pp. 768–776.
[57] B. Adams, S. Venkatesh, H.H. Bui, C. Dorai, A probabilistic framework for extracting narrative act boundaries and semantics in motion pictures, Multimed. Tools Appl. 27 (2) (2005) 195–213, https://doi.org/10.1007/s11042-005-2574-2.
[58] B. Adams, C. Dorai, S. Venkatesh, Formulating film tempo, in: Media Computing, in: The Springer International Series in Video Computing (VICO), vol. 4, Springer, Boston, MA, 2002, pp. 57–84.
[59] C. Purdy, M.O. Riedl, Reading between the lines: using plot graphs to draw inferences from stories, in: F. Nack, A.S. Gordon (Eds.), Proceedings of the 9th International Conference on Interactive Digital Storytelling (ICIDS 2016), in: Lecture Notes in Computer Science, vol. 10045, Springer, Los Angeles, CA, US, 2016, pp. 197–208.
[60] B. Li, S. Lee-Urban, G. Johnston, M.O. Riedl, Story generation with crowdsourced plot graphs, in: M. desJardins, M.L. Littman (Eds.), Proceedings of the 27th AAAI Conference on Artificial Intelligence, AAAI Press, Bellevue, WA, US, 2013, pp. 598–604.
[61] N. Chambers, D. Jurafsky, Unsupervised learning of narrative event chains, in: K.R. McKeown, J.D. Moore, S. Teufel, J. Allan, S. Furui (Eds.), Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL 2008), Columbus, Ohio, US, The Association for Computer Linguistics, 2008, pp. 789–797.
[62] N. Chambers, D. Jurafsky, Unsupervised learning of narrative schemas and their participants, in: K. Su, J. Su, J. Wiebe (Eds.), Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL 2009) and the 4th International Joint Conference on Natural Language Processing (AFNLP 2009), The Association for Computer Linguistics, Singapore, 2009, pp. 602–610.
[63] G. Lakoff, S. Narayanan, Toward a computational model of narrative, in: Proceedings of the 2010 AAAI Fall Symposium: Computational Models of Narrative, Vol. FS-10-04 of AAAI Technical Report, AAAI, Arlington, VA, US, 2010, pp. 21–28.
[64] P. Gervás, Composing narrative discourse for stories of many characters: a case study over a chess game, Literary and Linguistic Computing (LLC) 29 (4) (2014) 511–531, https://doi.org/10.1093/llc/fqu040.
[65] P. Gervás, Computational drafting of plot structures for Russian folk tales, Cogn. Comput. 8 (2) (2016) 187–203, https://doi.org/10.1007/s12559-015-9338-8.
[66] G.A. Sack, M.A. Finlayson, P. Gervás, Computational models of narrative: using artificial intelligence to operationalize Russian formalist and French structuralist theories, in: Proceedings of the 2014 Annual International Conference of the Alliance of Digital Humanities Organizations (DH 2014), Alliance of Digital Humanities Organizations (ADHO), Lausanne, Switzerland, 2014.
[67] M. Kapadia, S. Poulakos, M.H. Gross, R.W. Sumner, Computational narrative, in: Proceedings of the 44th SIGGRAPH Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 2017), ACM, Los Angeles, CA, US, 2017, pp. 4:1–4:118.
[68] M. Kapadia, J. Falk, F. Zünd, M. Marti, R.W. Sumner, M.H. Gross, Computer-assisted authoring of interactive narratives, in: J. Keyser, P.V. Sander, K. Subr, L. Wei (Eds.), Proceedings of the 19th Symposium on Interactive 3D Graphics and Games (i3D 2015), ACM, San Francisco, CA, US, 2015, pp. 85–92.
[69] L.J. Martin, P. Ammanabrolu, X. Wang, W. Hancock, S. Singh, B. Harrison, M.O. Riedl, Event representations for automated story generation with deep neural nets, in: S.A. McIlraith, K.Q. Weinberger (Eds.), Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), the 30th Innovative Applications of Artificial Intelligence (IAAI 2018), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI 2018), AAAI Press, New Orleans, Louisiana, USA, 2018, pp. 868–875.
[70] E.T. Chourdakis, J. Reiss, Constructing narrative using a generative model and continuous action policies, in: Proceedings of the Workshop on Computational Creativity in Natural Language Generation (CC-NLG@INLG 2017), Santiago de Compostela, Spain, Association for Computational Linguistics (ACL), 2017, pp. 38–43.
[71] L. Flekova, I. Gurevych, Personality profiling of fictional characters using sense-level links between lexical resources, in: L. Màrquez, C. Callison-Burch, J. Su, D. Pighin, Y. Marton (Eds.), Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), Lisbon, Portugal, The Association for Computational Linguistics, 2015, pp. 1805–1816.
[72] P.F. Gomes, A. Paiva, C. Martinho, A. Jhala, Metrics for character believability in interactive narrative, in: H. Koenitz, T.I. Sezen, G. Ferri, M. Haahr, D. Sezen, G. Catak (Eds.), Proceedings of the 6th International Conference on Interactive Storytelling (ICIDS 2013), Istanbul, Turkey, in: Lecture Notes in Computer Science, vol. 8230, Springer, 2013, pp. 223–228.
[73] S. Grayson, M. Mulvany, K. Wade, G. Meaney, D. Greene, Novel2vec: characterising 19th century fiction via word embeddings, in: D. Greene, B.M. Namee, R.J. Ross (Eds.), Proceedings of the 24th Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2016), Dublin, Ireland, in: CEUR Workshop Proceedings, vol. 1751, CEUR-WS.org, 2016, pp. 68–79.
[74] P. Yu, L. Lin, SEBPR: semantics enhanced Bayesian personalized ranking with comparable item pairs, in: C. Domeniconi, F. Gullo, F. Bonchi, J. Domingo-Ferrer, R.A. Baeza-Yates, Z. Zhou, X. Wu (Eds.), Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW 2016), Barcelona, Spain, IEEE Computer Society, 2016, pp. 1015–1022.
[75] B. Danil, Y. Elena, P. Ekaterina, Similarity measures and models for movie series recommender system, in: S.S. Bodrunova (Ed.), Proceedings of the 5th International Conference on Internet Science (INSCI 2018), St. Petersburg, Russia, in: Lecture Notes in Computer Science, vol. 11193, Springer, 2018, pp. 181–193.
[76] S. Hu, Y. Li, B. Li, Video2vec: learning semantic spatio-temporal embeddings for video representation, in: Proceedings of the 23rd International Conference on Pattern Recognition (ICPR 2016), Cancún, Mexico, IEEE, 2016, pp. 811–816.
[77] N. Djuric, H. Wu, V. Radosavljevic, M. Grbovic, N. Bhamidipati, Hierarchical neural language models for joint representation of streaming documents and their content, in: A. Gangemi, S. Leonardi, A. Panconesi (Eds.), Proceedings of the 24th International Conference on World Wide Web (WWW 2015), Florence, Italy, ACM, 2015, pp. 248–255.
[78] G. J, M. Gupta, V. Varma, Doc2Sent2Vec: a novel two-phase approach for learning document representation, in: R. Perego, F. Sebastiani, J.A. Aslam, I. Ruthven, J. Zobel (Eds.), Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2016), Pisa, Italy, ACM, 2016, pp. 809–812.