Journal of Computer Languages 51 (2019) 241–260
Research on context of implicit context-sensitive graph grammars✰
Zou Yang a,b,⁎, Lü Jian a, Tao Xianping a
a State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
b Computer and Information College, Hohai University, Nanjing 210098, China
ARTICLE INFO
Keywords: Visual languages; Implicit context-sensitive graph grammars; Contexts; Graph grammar comprehension; Parsing algorithms
ABSTRACT
Visual Programming Languages have been widely adopted in design and comprehension of sophisticated systems. Context-sensitive graph grammar formalisms are suitable tools for specifying these languages, since they are intuitive and possess sufficient expressive power and usability. Nevertheless, some of the formalisms whose contexts are implicitly or incompletely represented in productions, called implicit context-sensitive graph grammars, suffer inherent weakness in intuitiveness and limitations in parsing algorithms. To address these issues, this paper formally presents a notion of context on the underlying concepts of partial and total precedence relations, characterizes their fundamental properties, and establishes a connection between contexts and their instances (also called context graphs elsewhere), based on the Reserved Graph Grammar formalism, a representative of implicit graph grammars. Moreover, three typical applications of contexts are illustrated, which show that contexts can both facilitate the comprehension and design of implicit graph grammars so as to enhance their intuitiveness, and make the existent efficient parsing algorithms more widely applicable.
1. Introduction
Visual Programming Languages (VPLs), which usually handle objects that do not possess inherent visual representation [1], have been widely adopted in software engineering and many other fields of computer science. Like string languages, which are frequently equipped with a formal syntax definition and a parser, VPLs also need the support of such mechanisms. In this sense, graph grammars are regarded as a well-established theoretical foundation for VPLs [2]. As a natural extension of formal grammar theory, which provides mechanisms for the specification and parsing of string languages, graph grammars offer similar capabilities for VPLs.
It is notable that, of the variety of graphical VPLs frequently used in the design and comprehension of sophisticated systems, only a few are equipped with a proper formal syntax definition. This is largely due to two facts. First, the extension from one-dimensional string-based formal grammars to two-dimensional graph grammars brings about a few novel challenges, such as the embedding problem and the membership problem. In a derivation or deduction step of a graph grammar, a host graph is transformed by deleting a subgraph from it and then embedding another graph into it. The embedding problem is how to avoid creating dangling edges in the resulting host graph. Second, existing graph grammar formalisms still need to be improved in aspects including expressive power, usability and computing efficiency, even though they are acknowledged to be instrumental in practical applications such as software architecture [3] and its evolution [4], behavior model synthesis [5], safety analysis for system architecture [6], generation of intelligent diagram editors [7], pattern recognition [8–10], and many others [11,12].
Several graph grammar formalisms and their parsing algorithms have been proposed in the literature, most of which are context-free or context-sensitive [13]. The expressive power of a graph grammar depends on the type it belongs to as well as the embedding mechanism it chooses [14]. Among all the categories of embedding mechanisms, which vary in complexity and power, invariant embedding is the least complex one and the most commonly employed in graph grammar formalisms. Context-sensitive graph grammars tend to be more expressive than context-free ones when confined to identical, less complex embedding mechanisms, and invariant embedding in particular. As context-free graph grammars have difficulty in specifying a large portion of graphical VPLs [15,16], recent research in this subfield focuses more on context-sensitive graph grammars.
A graph grammar consists of a set of productions, each of which is a pair of graphs, called left graph and right graph respectively, together
✰ This article was originally submitted to the Journal of Visual Languages and Computing; however, during the process of review the journal underwent a name change and is now titled Journal of Computer Languages.
⁎ Corresponding author at: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China. E-mail address: [email protected] (Y. Zou).
https://doi.org/10.1016/j.cola.2019.01.002
Received 30 March 2016; Received in revised form 30 November 2018; Accepted 4 January 2019
1045-926X/ © 2019 Elsevier Ltd. All rights reserved.
with an embedding expression. In context-sensitive graph grammars, the contexts pertaining to a production generally refer to the neighboring subgraphs of the rewritten portion of its left graph in potential host graphs [2], which describe the situations under which the production can be applied. However, the context portion of a production, i.e., the remainder of the left graph minus the rewritten portion, is commonly not a direct copy of the contexts, for the sake of conciseness of productions and ease of embedding. Here we give a brief comparison of context-sensitive graph grammar formalisms from the perspective of context definition, which motivates our work in this paper.
The most representative context-sensitive graph grammar formalisms are Layered Graph Grammar (LGG) [15] and Reserved Graph Grammar (RGG) [16]. To solve the embedding problem, LGG includes the immediate context of a production identically in its left and right graphs and imposes a dangling edge condition on the redex definition, which guarantees that dangling edges never occur in rewritten host graphs. It also enforces a lexicographical order on both graphs of productions by decomposing the label alphabets of their elements into a number of layers, so as to ensure the decidability of the membership problem. LGG is equipped with a complex parsing algorithm that attempts all the possible sequences of production applications. RGG is viewed as an improvement over LGG with respect to succinctness of specification and efficiency of the parsing algorithm. Rather than directly including contexts in productions as LGG does, the RGG formalism introduces a particular two-level node structure coupled with a marking technique to indirectly specify the context of a production by identically distributing a set of marked vertices into the left and right graphs. The vertices establish a one-to-one correspondence between the two graphs in terms of their marks.
Thus, the embedding problem is solved through this mechanism together with a dedicated embedding rule. The RGG inherits LGG's solution to the membership problem, and provides a naive Selection-Free Parsing Algorithm (SFPA) with polynomial time complexity on condition that the graph grammars are locally confluent. The selection-free condition is also called local confluence in the literature.
Other context-sensitive formalisms include Edge-based Graph Grammar (EGG) [17], Context-Attributed Graph grammar (CAGG) [18], Contextual Layered Graph Grammar (CLGG) [19], Spatial Graph Grammars (SGG) [20,21], and Breeze Graph Grammar (BGG) [22]. To tackle the embedding problem, EGG identically adds a set of marked dangling edges to both the left and right graphs of a production, whereas CAGG introduces attributes of nodes to establish a correspondence between the two graphs of a production. CLGG and SGG are extensions of LGG and RGG, respectively. Based on LGG, CLGG supports three extra mechanisms, which can be employed to define more complex VPLs. SGG extends RGG by augmenting its productions with a spatial specification mechanism, with which it can explicitly describe both structural and spatial relationships for VPLs. With the support of spatial specification, SGG achieves more efficient parsing than the underlying parser SFPA. With reference to the framework of LGG and to the node structure of RGG, BGG is presented as an application-oriented formalism dedicated to architecture specification, and system reliability modeling and evaluation.
According to how the context portion of a production is dealt with, the above formalisms can be roughly classified into two categories: explicit and implicit.
The former are those that directly enclose the complete immediate contexts as the context portion of a production, whereas the latter are those in which the context portion is expressed as specifically tailored (i.e., incomplete) immediate contexts, and newly introduced notations or attributes attached to the rewritten portion. LGG and RGG are typical examples of the former and the latter, respectively. In this paper, we focus our attention on the latter.
Visual representation is one of the inherent capabilities of graph grammars. The implicit formalisms, equipped with naive but efficient parsing algorithms, have hitherto found a wide range of applications,
such as model management [23], multimedia layout adaptation [24,25], graphical user interface design and adaptation [21], design pattern evolution [26], program behavior discovery and verification [27], and web interface adaptation [28]. Despite these effective applications in many graph-related fields, the implicit formalisms suffer from two fundamental drawbacks.
First, implicit formalisms are inherently not intuitive, because the context portion of a production is not the complete immediate contexts. In RGG, the context portion is a set of marked vertices. Vertices are explained to be connecting points of edges, but their exact meaning is left undefined. Therefore, the selection, arrangement and marking of vertices within a node become a challenge in the design of productions. Similar situations arise in other implicit formalisms. Moreover, as actual immediate contexts are absent in productions, it is rather difficult for users to comprehend exactly the language of a given graph grammar. Thus, an approach to facilitating graph grammar comprehension is required for implicit formalisms.
Second, the SFPA that underlies the parsing algorithms of the implicit formalisms is theoretically confined to locally confluent graph grammars. This restriction considerably impedes the practical application of these formalisms for two reasons. On the one hand, the condition of local confluence is often not met in practical applications; as an example, the language of process flow diagrams may be intuitively specified as a non-confluent graph grammar (as depicted in the next section). On the other hand, the requirement of creating a locally confluent set of productions inevitably places an extra burden on graph grammar designers; in particular, a pair of productions with the same right graph, which is a frequently used design pattern of productions, leads to a non-confluent graph grammar.
Therefore, a widely applicable efficient parsing algorithm is still essential.
Context investigation is a conceivable way to address the above two issues. This idea stems from the observation that contexts are merely employed as a means of tackling the embedding problem in context-sensitive graph grammars while the information they carry is ignored, whereas the discovery of the contexts that are implicitly represented in a set of productions can facilitate the comprehension of the graph grammar, and supplementary context matching can benefit its parsing. In this paper, on the basis of RGG, which exemplifies the implicit graph grammar formalisms, we develop a theory of contexts and then illustrate three applications of them.
The technical contributions of the paper are twofold. First, it presents a formal definition of context and characterizes its fundamental properties theoretically. The notion of context is defined on the basis of the partial and total precedence relations. The properties of contexts unveil the relationship between the existence of contexts and total precedence relations, and establish a connection between contexts and their instances. Second, it shows that contexts can facilitate the comprehension of implicit graph grammars and improve the performance of graph parsing, two fundamental issues in graph grammars. Three typical applications of contexts in this regard are demonstrated. In particular, a feasible approach to the construction of a locally confluent set of productions with the aid of contexts is elaborated.
The remainder of the paper is organized as follows: Section 2 reviews the Reserved Graph Grammar formalism, a representative of implicit graph grammars. Section 3 introduces the notions of partial and total precedence relations, upon which Section 4 formally defines the notion of contexts and context instances, and characterizes their properties. Three typical applications of contexts are illustrated in Section 5.
Section 6 reviews related work, and Section 7 concludes the paper and proposes future research.
2. The reserved graph grammar formalism
A graph grammar consists of an initial graph and a collection of productions (graph rewriting rules). Each production has two graphs
called left graph and right graph respectively, and can be applied to another graph called host graph. Every node in a production is either a terminal or a non-terminal node. A graph grammar defines a graph language composed of those graphs that can be derived from the initial graph by repeated applications of the productions and whose nodes are all terminal ones. A redex is a subgraph in a host graph that is isomorphic to the left or right graph of a production. A production's L-application to a host graph is to find in the host graph a redex of its left graph and replace the redex with its right graph, and an R-application is the reverse replacement. L-applications can be used to generate the language of a graph grammar, whereas R-applications are applied to parse graphs according to a graph grammar.
The RGG is a context-sensitive graph grammar formalism [16]. It introduces a node-edge format to represent graphs in which each node is organized as a two-level structure, as illustrated in Fig. 2.1, where the large surrounding rectangle is the first level, called super vertex, and the other embedded small rectangles are the second level, called vertices. Either a vertex or a super vertex can be the connecting point of an edge.
Fig. 2.1. Node structure of the RGG formalism.
In addition to the two-level node structure, the RGG also introduces a marking technique that divides vertices into two categories: marked and unmarked ones. Each marked vertex of a production is identified by an integer that is unique in the left or right graph where the vertex lies. A production is properly marked if each marked vertex in the left graph has a counterpart marked by the same integer in the right graph, and vice versa. In the process of a production application, when a redex is matched in a host graph, each vertex that corresponds to a marked vertex in the left or right graph preserves its associated edges connected to nodes outside of the redex, which avoids the appearance of dangling edges during the subsequent subgraph replacement provided that an additional embedding rule is also enforced. The embedding rule states that if a vertex in the right (or left) graph of a production is unmarked and has an isomorphic vertex in the redex of a host graph, then all the edges connected to the vertex should be completely inside the redex. Together, the embedding rule and the marking technique dedicated to the two-level node structure properly handle the unmarked and marked vertices, respectively, in the process of production applications, which solves the embedding problem.
The RGG is equipped with a naive parsing algorithm called SFPA, which is confined to graph grammars that satisfy the selection-free condition. This condition, also known as local confluence elsewhere [2], ensures that different orders of applications of productions to a host graph lead to the same outcome. The SFPA is efficient, with polynomial time complexity, as it only tries one path in the parsing process.
As an example, an RGG specifying process flow diagrams, which is slightly adapted from [15], is depicted in Fig. 2.2. For simplicity, we use p1 to p9 to refer to its productions. This grammar does not satisfy the condition of local confluence. To explain why, a rather simple host graph and two deductions (parsing) are given in Fig. 2.3(a), (b) and (c), respectively. In each graph of Fig. 2.3, the subgraph surrounded by the dashed red rectangle (if it exists) is a redex of some production's right graph, and the subgraph surrounded by the dashed green rectangle (if it exists) is the output of the application of a production to the redex in the preceding graph in a deduction. It is noted in Fig. 2.3(a) that the subgraph surrounded by the dashed red rectangle in the host graph is a redex of both the right graph of p7 and that of p9. In the process of deduction, both p9 and p7 can be applied to the host graph, leading to two different deductions: one is aborted, and the other is successful, as shown in Fig. 2.3(b) and (c), respectively. If p9 is applied, the left graph of Fig. 2.3(b) is obtained, where the subgraph surrounded by the dashed green rectangle is the output of the application of p9 to the redex and another subgraph surrounded by the dashed red rectangle is a redex of p4; further application of p4 results in the right graph of Fig. 2.3(b), to which no more productions can be applied. Nevertheless, if p7 is applied, a sequence of applications of productions p2, p4, and p5 can then follow, resulting in the initial graph, i.e., the right graph of p1, as shown in Fig. 2.3(c). Therefore, the condition of local confluence is not met by the graph grammar in Fig. 2.2.
As a matter of fact, whether a graph grammar is locally confluent or not has nothing to do with host graphs. Instead, it depends completely on the grammar itself. One may observe that a graph grammar containing two productions whose right graphs, excluding marked integers in vertices, are isomorphic to each other is probably not locally confluent. The reason is that if a host graph contains a redex of the right graphs of both productions, their respective R-applications to it must generate two distinct new host graphs, since their left graphs must differ (otherwise the two productions would be identical).
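The marking condition described in this section can be illustrated with a minimal sketch. It assumes that each side of a production is reduced to the list of integers carried by its marked vertices; the names `MarkedGraph`, `marks_unique` and `is_properly_marked` are hypothetical and do not come from the RGG literature.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MarkedGraph:
    """Hypothetical stand-in for one side of a production (assumption: only marks are kept)."""
    marks: List[int]  # integers attached to the marked vertices of this graph


def marks_unique(graph: MarkedGraph) -> bool:
    # Each marked vertex must carry an integer that is unique within its own graph.
    return len(graph.marks) == len(set(graph.marks))


def is_properly_marked(left: MarkedGraph, right: MarkedGraph) -> bool:
    # Every mark in the left graph needs a counterpart in the right graph, and vice versa.
    return (marks_unique(left) and marks_unique(right)
            and set(left.marks) == set(right.marks))


if __name__ == "__main__":
    # Illustrative marks only; they are not taken from a specific production of Fig. 2.2.
    print(is_properly_marked(MarkedGraph([1, 2, 3, 4]), MarkedGraph([4, 3, 2, 1])))
```

The sketch deliberately ignores nodes, labels and edges; it only captures the one-to-one correspondence of marks that the embedding mechanism relies on.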
Fig. 2.2. A graph grammar for process flow diagrams.
3. Partial and total precedence
In this section, we take the RGG as the representative of implicit context-sensitive graph grammar formalisms to present partial and total precedence relations between graph productions. Some fundamental concepts and notations regarding graph grammars in this paper are similar to those in the RGG [16]. We list them below for the sake of clarity and simplicity. Note that graphs are in the node-edge format and only vertices in productions might be marked.
Ω: A finite set of node labels, which can be divided into two disjoint sets, called terminal label set ΩT and nonterminal label set ΩNT.
p := (L, R): A production with a pair of marked graphs: the left graph L and right graph R over the same label set Ω, endowed with a bijective mapping between their marked vertices.
The notations p.L and p.R represent the left and right graphs of a production p, respectively. For any graph G, G.N and G.E denote the set of nodes and edges, respectively; n.V and n.v denote the set of vertices and some vertex v of a node n, respectively; and $G.V = \bigcup_{n \in G.N} n.V$ is the union of the sets of vertices of nodes in G; for any edge e, s(e) and t(e) represent the source and target vertex of e, respectively, and l(e) is the label on e.
$\overline{G}$: Given a graph G, $\overline{G}$ denotes the unmarked version of G, i.e., the graph that results from deleting the marked integers in vertices from G. Readily, $\overline{G} = G$ if G is unmarked. If a marked graph G is used as a host graph in context, then it is treated as the unmarked version $\overline{G}$, which will not be specifically indicated below.
G1 ≈ G2: G1 is isomorphic to G2, i.e., $\overline{G_1}$ is isomorphic to $\overline{G_2}$, to be exact.
Redex: A subgraph X ⊆ H is a redex of graph G, denoted as X ∈ Rd(H, G), if X ≈ G under an isomorphic mapping f and any vertex in X that is isomorphic to an unmarked vertex in G keeps its edges completely inside X.
Rd(H, G): The set of redexes of marked graph G that are subgraphs of graph H.
Merger: A graph G is a merger of G1 and G2 if both G1 and G2 are subgraphs of G and each node or edge in G is either from G1 or from G2.
Mrg(G1, G2): The set of mergers of G1 and G2.
$H \rightarrow_{p} H'$: L-application of a production p := (L, R) to a host graph H, yielding H′. (A derivation step)
$H \mapsto_{p} H'$: R-application of a production p := (L, R) to a host graph H, yielding H′. (A deduction step)
$H \rightarrow^{*} H_n$: A series of L-applications to a host graph H: $H \rightarrow_{L_1} H_1, H_1 \rightarrow_{L_2} H_2, \ldots, H_{n-1} \rightarrow_{L_n} H_n$, yielding Hn, where n ≥ 0, and Hn = H when n = 0. (A derivation)
$H \mapsto^{*} H_n$: A series of R-applications to a host graph H: $H \mapsto_{R_1} H_1, H_1 \mapsto_{R_2} H_2, \ldots, H_{n-1} \mapsto_{R_n} H_n$, yielding Hn, where n ≥ 0, and Hn = H when n = 0. (A deduction)
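To make the merger notation concrete, the following is a minimal sketch under strong simplifying assumptions: graphs are plain sets of named nodes and directed edges, vertices, marks and labels are omitted, and the subgraph relation is taken by node identity rather than by isomorphism. The names `Graph`, `graph`, `is_subgraph` and `is_merger` are hypothetical.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Edge = Tuple[str, str]  # (source node, target node); vertices and labels are omitted


class Graph(NamedTuple):
    nodes: FrozenSet[str]
    edges: FrozenSet[Edge]


def graph(nodes: Iterable[str], edges: Iterable[Edge]) -> Graph:
    # Small helper so that callers can pass ordinary sets or lists.
    return Graph(frozenset(nodes), frozenset(edges))


def is_subgraph(g: Graph, h: Graph) -> bool:
    # g is a subgraph of h if all of its nodes and edges occur in h.
    return g.nodes <= h.nodes and g.edges <= h.edges


def is_merger(g: Graph, g1: Graph, g2: Graph) -> bool:
    # Both parts are subgraphs of g, and every node or edge of g comes from g1 or g2.
    return (is_subgraph(g1, g) and is_subgraph(g2, g)
            and g.nodes == g1.nodes | g2.nodes
            and g.edges == g1.edges | g2.edges)


if __name__ == "__main__":
    g1 = graph({"a", "b"}, {("a", "b")})
    g2 = graph({"b", "c"}, {("b", "c")})
    g = graph({"a", "b", "c"}, {("a", "b"), ("b", "c")})
    print(is_merger(g, g1, g2))
```

Because nodes are compared by name here, two given graphs have exactly one merger; in the paper's setting Mrg(G1, G2) is a set, since the two graphs may overlap in different ways.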
Prior to the introduction of new concepts, we review the formal definition of the RGG formalism. Originally, the RGG [16] inherits from the LGG its solution to the decidability problem, which requires that the label alphabet be decomposed into disjoint layers and the left graph of each production be lexicographically smaller than its right graph in terms of the layers of their labels. Subsequently, an enhanced version of the RGG was proposed in [29], which presents a simpler and more intuitive condition that merely imposes a weak constraint on the sizes of the left and right graphs of productions. Here we adopt a slightly revised version of the latter to make it more general.
Definition 3.1. A reserved graph grammar is a triple (A, P, Ω), where A is an initial graph, P a set of graph grammar productions, and Ω a finite label set consisting of two disjoint sets ΩT and ΩNT. For each production p := (L, R) ∈ P, the following conditions are satisfied:
• R is non-empty;
• L and R are both over Ω;
• the size of R is not less than that of L, i.e., $|L.N| < |R.N| \vee (|L.N| = |R.N| \wedge (|L.N^{T}| < |R.N^{T}| \vee (|L.N^{T}| = |R.N^{T}| \wedge |L.E| < |R.E|)))$.
The third condition states that the number of nodes in R is not less than that in L; if they are equal, the number of terminal nodes in R must be no less than that in L; and if those are also equal, the number of edges in R must be greater than that in L.
Fig. 2.2 illustrates an example of an RGG, where the initial graph A is treated as a special case of a production with empty left graph λ, i.e., production p1; P is the set of the remaining productions; and Ω is the set of all the labels occurring in P.
3.1. Partial precedence relation
We proceed to present a kind of partial precedence relation on the set of the productions of a graph grammar. Generally, the left and right graphs of productions are composed of one or more separate components.
Definition 3.2. Given a directed graph G = (N, E), a subgraph S = (N′, E′) ⊆ G is a connected component if for any two nodes n, m ∈ N′, the following two conditions hold:
• if (n, m) ∈ E′, then (n, m) ∈ E;
• there is a sequence of nodes $n = n_0, n_1, \ldots, n_k = m$ such that $(n_{i-1}, n_i) \in E' \vee (n_i, n_{i-1}) \in E'$, ni ∈ N′, and 1 ≤ i ≤ k.
A subgraph S is a maximal connected component of G if there exists no connected component S″ = (N″, E″) ⊆ G such that N′ ⊂ N″. The function Mcc is a mapping from graphs to their sets of maximal connected components.
The definition covers both marked and unmarked directed graphs. Whether a graph is marked or not will not be specifically mentioned when it is clear from context. Hereafter, maximal connected component is commonly abbreviated to component when it is clear from context. In addition, some notations are introduced to clarify the expressions in forthcoming definitions. For a set P of productions, PL = {p.L | p ∈ P} and PR = {p.R | p ∈ P}; Mcc(PL) and Mcc(PR) denote the unions of the sets of components of all the members of PL and PR, respectively, i.e., Mcc(PL) = {C | C ∈ Mcc(p.L) ∧ p ∈ P},
Fig. 2.3. A host graph and two deductions. (a) A host graph. (b) An aborted deduction. (c) A successful deduction.
Mcc(PR) = {C | C ∈ Mcc(p.R) ∧ p ∈ P}.
Definition 3.3. Let gg := (A, P, Ω) be an RGG, p1, p2 ∈ P be two productions, C1 ∈ Mcc(p1.L) and C2 ∈ Mcc(p2.R). If ∃X ⊆ C2 such that X ∈ Rd(C2, C1), then C1 is matched with X in C2, denoted as C1 ≈ X ⊆ C2; or concisely, C1 is included in C2, denoted by C1 ⊑ C2.
This definition introduces the notion of inclusion between the components of productions, or to be exact, it locates a redex of a component of the left graph of one production in some component of the right graph of another production. The notion is generalized below to adapt to the situations of sets of components of productions.
Definition 3.4. Let U be some set and S = (B, m) a multiset, where B is the underlying set of elements and $m : B \rightarrow \mathbb{N}^{+}$ is a mapping from B to the set of positive natural numbers. S ⊆ U if and only if B ⊆ U.
In order to refer unambiguously to an element of a multiset, we stipulate that each element in a multiset is uniquely identified. That is, any two elements in a multiset S have distinct identities even if they are the same element from the point of view of the underlying set B. However, the identities of elements are not explicitly represented in context for the sake of conciseness.
Definition 3.5. Let gg := (A, P, Ω) be an RGG, and PL and PR the sets of left and right graphs of productions in P, respectively. A set S1 ⊆ Mcc(PL) is included in another multiset S2 ⊆ Mcc(PR), denoted as S1 ⊑ S2, if there is a mapping f: S1 → S2 such that:
• $\forall C \in S_1\,(\exists X \subseteq f(C)\,(X \in Rd(f(C), C)))$, and
• $\forall C, C' \in S_1\,(C \neq C' \wedge f(C) = f(C') \rightarrow \exists X, X' \subseteq f(C)\,(X \in Rd(f(C), C) \wedge X' \in Rd(f(C'), C') \wedge X \cap X' = \emptyset))$.
The first condition states that for each component in S1, there is an image in S2 under the mapping f that contains a redex of it; and the second expresses that if two different components in S1 have the same image in S2, then the two corresponding redexes in it cannot overlap, which adheres to the redex definition in the RGG formalism.
The partial precedence relation is established based on the above definitions. In the following, angled parentheses will be used to specify ordered pairs with relations.
Definition 3.6. Let gg := (A, P, Ω) be an RGG, and p1, p2 ∈ P be two productions. p1 directly partially precedes p2, denoted as p1 ≼d p2, if ∃S ⊆ Mcc(p2.L) such that S ⊑ Mcc(p1.R). The direct partial precedence relation between them is denoted by the pair 〈p1, p2〉.
Definition 3.7. Let gg := (A, P, Ω) be an RGG. The direct partial precedence relation on the set P of productions is defined as $\preccurlyeq_{d}^{P} = \{\langle p_1, p_2\rangle \mid p_1, p_2 \in P \wedge p_1 \preccurlyeq_{d} p_2\}$.
Definition 3.8. Let gg := (A, P, Ω) be an RGG, and p1, p2 ∈ P be two productions. p1 partially precedes p2, denoted as p1 ≼ p2, if one of the following two conditions holds:
Fig. 3.1. The direct partial precedence relations on the production set of an RGG.
• $p_1 \preccurlyeq_{d} p_2$;
• $\exists p \in P\,(p_1 \preccurlyeq p \wedge p \preccurlyeq_{d} p_2)$.
The partial precedence relation between p1 and p2 is denoted by the pair 〈p1, p2〉. The partial precedence relation is the closure of the direct partial precedence relation on a set P. The last three definitions characterize the partial precedence relation on the set of productions of a graph grammar. The intuition behind the relation is: if p1 partially precedes p2 in derivation, then p2 probably takes precedence over p1 in some specific situations during the course of parsing.
Example 3.1. Direct partial precedence relation.
The direct partial precedence relation on the production set P of the RGG in Fig. 2.2 is illustrated in Fig. 3.1. The left and right graphs of the productions are surrounded by dashed gray rectangles and ellipses, respectively. Each solid red arrow connects the left graph of a production to its right graph. If a left graph is composed of several components, then they are enclosed in respective small dashed gray rectangles, all of which are in turn surrounded by an external rectangle, such as L9, i.e., the left graph of production p9. The set of dashed green arrows is the inclusion relations on the components of the productions in P, where each arrow connects a right graph to a left graph or one of its components. Moreover, we also acquire from this set the direct partial precedence relations on P, when taking the arrows pointed to the components of a left graph as pointing to the graph itself and viewing a pair of left and right graphs connected by a red solid arrow as a whole (a production). Specifically, if there is a dashed green arrow directed to p2 from p1, then p1 ≼d p2 and p1 ≼ p2 too. For example, the set of direct partial precedence relations regarding p9 is as follows: {〈p9, p9〉, 〈p7, p9〉, 〈p8, p9〉, 〈p2, p9〉, 〈p4, p9〉, 〈p5, p9〉, 〈p6, p9〉, 〈p1, p9〉, 〈p9, p8〉}, which coincides with the set of partial precedence relations.
3.2. Total precedence relation
Partial precedence is a kind of relation between a pair of components chosen from two distinct productions, whereas total precedence describes the same relation between two sets of components from a subset of productions and a single production respectively.
Definition 3.9. Let gg := (A, P, Ω) be an RGG, p ∈ P, and a multiset P′ ⊆ P. P′ directly totally precedes p, denoted as P′ ≺d p, if there is a surjective mapping f: Mcc(p.L) → St such that:
• $S_t \subseteq Mcc(P_R)$;
• $Mcc(p.L) \sqsubseteq Mcc(P'_R)$ with respect to f;
• ∀p′ ∈ P′ (∃C ∈ Mcc(p.L) (f(C) ∈ Mcc(p′.R))).
The corresponding direct total precedence relation is denoted by the pair 〈P′, p〉, and p and P′ are called the target production and preceding set, respectively.
A direct total precedence relation specifies that a certain graph composed of the right graphs of a subset of productions contains a redex of the left graph of another production. Note that the third constraint on f emphasizes that every production in P′ takes part in f with at least one of its components in the right graph. Direct total precedence relations are rather complicated, as opposed to partial precedence relations. If a subset P′ of P forms such a relation with a production p, then it means that all the right graphs of P′ must exactly comprise a redex of the left graph of p, with each one containing at least one redex of its components. For example, {〈{p9, p2}, p9〉, 〈{p7, p2}, p9〉, 〈{p9, p4}, p9〉} is a subset of the direct total precedence relations regarding p9.
Definition 3.10. Let gg := (A, P, Ω) be an RGG. The direct total precedence relation on the set P of productions is defined as $\prec_{d}^{P} = \{\langle P', p\rangle \mid P' \subseteq P \wedge p \in P \wedge P' \prec_{d} p\}$.
Precedence structures are formalized in the above definition, together with two preceding ones that underlie it. As the concept seems rather complicated, a prominent characterization of it is further elaborated as follows.
Theorem 3.1. Let gg := (A, P, Ω) be an RGG. A precedence structure ps := (E, R) on P forms a rooted tree.
Proof. It is obvious that each element e = 〈P′, p〉 ∈ E, i.e., a direct total precedence relation, corresponds to a rooted tree with p as the root node and the elements of P′ as its children (leaf nodes). Such a tree is simply formed by connecting the root to each of the leaves with directed edges. A rooted tree corresponding to ps can be inductively generated as follows:

Base case. According to the above method, E1 = {et} ⊆ E and R0 = {r ∈ R | et ∈ r.D} = ∅ form a rooted tree of height 1, called Tr1. By Ru, we denote the set of relations in R that have not been used yet. Currently, Ru = R \ R0 = R.

Inductive step. The inductive hypothesis states that a tree Trk of height k has been established such that there is a one-to-one correspondence between the elements of Ek = {e | Nl(et, e) = k − 1} and the outmost subtrees of height 1 whose roots are at level k − 1, and the relations in Ru = R \ ⋃_{0≤i<k} Ri remain unused, where k ≥ 1. (The height of a rooted tree is the maximum of the levels of nodes.) If Ru = ∅, the conclusion readily holds; otherwise Rk = {r ∈ R | Nl(et, r.e) = k − 1} ≠ ∅. We prove this assertion and then employ Rk and E_{k+1} to construct a new tree called Tr_{k+1}.

According to the definition of Ru, there must exist a relation r ∈ Ru such that Nl(et, r.e) = k′ ≥ k − 1, as Ru ≠ ∅. Then, there is a relation, say r1 ∈ Ru with Nl(et, r1.e) = k′ − 1, from Definition 3.13. If k′ − 1 = k − 1, then Rk ≠ ∅; otherwise, we proceed in this way, and finally reach a relation rn ∈ Ru with Nl(et, rn.e) = k′ − n = k − 1. Hence, Rk ≠ ∅ holds as claimed.

Then, we select a relation ra ∈ Rk, and utilize it to produce a new tree on the basis of Trk. According to the inductive hypothesis, there exists a subtree tra of Trk that corresponds to ra.e = 〈Pa, pa〉 ∈ Ek with the root pa located at level k − 1 and the leaves Pa at the outmost level k. Consider any e′ = 〈Pb, pb〉 ∈ ra.D; we have pb ∈ Pa from Definitions 3.11 and 3.12. Then we append the tree to which 〈Pb, pb〉 corresponds to Trk by gluing the root pb to the leaf in the subtree tra that shares the same label as pb. Note that e′ ∈ E_{k+1}, by the definition of Ek. The utilization of the relation ra is accomplished only when all the elements in ra.D have been dealt with in this way. Repeating the above procedure until all the relations in Rk are completely processed, we obtain the tree Tr_{k+1}. Note also that each element in E_{k+1} is used once and only once in the whole process, which is guaranteed by the last two conditions of Definition 3.14. Clearly, Tr_{k+1} is of height k + 1 with all the newly added subtrees rooted at level k. □
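The construction used in this proof can be illustrated with a small Python sketch. It is a recursive variant of the level-by-level procedure rather than a transcription of it, and the names `Node`, `build_tree`, and `height` are hypothetical. The sample data are assumed to encode the precedence structure ps1 of Example 3.2, with the linking relation r2 taken from Example 3.4; elements are addressed by name because E is a multiset.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # production name, e.g. "p9"
    children: list = field(default_factory=list)


# Direct total precedence relations: name -> (preceding set, target production).
E = {
    "e1": (["p4", "p9"], "p8"),
    "e2": (["p2", "p9"], "p9"),
    "e3": (["p5"], "p2"),
    "e4": (["p2", "p7"], "p9"),
    "e5": (["p5"], "p2"),
}

# Linking relations: head -> linking mapping f, which sends each production
# of the head's preceding set to a tail element or to None (null).
R = {
    "e1": {"p4": None, "p9": "e2"},
    "e2": {"p2": "e3", "p9": "e4"},
    "e4": {"p2": "e5", "p7": None},
}


def build_tree(e):
    """Build the rooted tree of element e, gluing tail trees onto matching leaves."""
    preceding, target = E[e]
    root = Node(target)
    f = R.get(e, {})
    for p in preceding:
        tail = f.get(p)
        # The root of the tail's tree carries the same label as the leaf p,
        # so gluing amounts to replacing the leaf by that tree.
        root.children.append(build_tree(tail) if tail else Node(p))
    return root


def height(node):
    """Maximum level of a node, the root being at level 0."""
    return 0 if not node.children else 1 + max(height(c) for c in node.children)


tree = build_tree("e1")
print(tree.label, height(tree))
```

Under these assumptions the root is labelled p8, and the height of the tree equals the maximum number of links from the root element plus 1.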
Definition 3.11. Let P be a set of productions, ≺P the direct total precedence relations on P, P′ ⊆ P, and D ⊆ ≺P. A mapping f: P′ → D∪{null} is called a linking mapping, if the following two conditions hold:

• there exists a nonempty subset P1 ⊆ P′ such that f can be split into two sub-mappings in terms of the two subdomains P1 and P′\P1: f′: P1 → D, which is bijective; and f″: P′\P1 → {null}, which exists only when P′ ≠ P1; if P1 = P′, f is called complete;
• ∀p′ ∈ P′(f(p′) = 〈P″, p″〉 → p′ = p″).
Definition 3.12. Let gg := (A, P, Ω) be an RGG, and ≺P the direct total precedence relations on P. A triple r = (e, D, f) is a linking relation, or equivalently, D is linked to e with respect to f, if e = 〈P′, p〉 ∈ ≺P, D ⊆ ≺P and f: P′ → D∪{null} is a linking mapping. e and the elements in D are called the head and tails of r, respectively. Customarily, r.e, r.D and r.f are used to denote the head, the set of tails, and the mapping of a relation r, respectively, when the triple is not explicitly presented.

Definition 3.13. Let ≺P be the direct total precedence relations on a set P of productions, R a set of linking relations on ≺P, and e, e′ ∈ ≺P. e′ is linked to e with respect to R, denoted as LR(e, e′), if one of the following two conditions holds:

• there is a linking relation r = (e, D, f) such that e′ ∈ D, and in this case, the number of links between e and e′, denoted by Nl(e, e′), equals 1;
• there is an element e″ ∈ ≺P and a linking relation r′ = (e″, D′, f′) ∈ R such that e′ ∈ D′ and e″ is linked to e with respect to R\{r′}, and in this case, Nl(e, e′) = Nl(e, e″) + 1.
As a special case, Nl(e, e) = 0 for each e ∈ ≺P. LR(e, e′) is often abbreviated to L(e, e′) when R is clear from context.

Definition 3.14. Let gg := (A, P, Ω) be an RGG, ≺P the direct total precedence relations on P, E ⊆ ≺P a finite multiset, and R a set of linking relations on E. The pair ps = (E, R) is a precedence structure on P, if the following conditions are satisfied:

• there is one and only one root element et ∈ E such that et ∉ ⋃_{r∈R} r.D;
• ∀e ∈ E(e ≠ et → ∃r ∈ R(e ∈ r.D));
• ∀e ∈ E(e ≠ et → L(et, e));
• ∀r, r′ ∈ R(r ≠ r′ → r.D ∩ r′.D = ∅).

The sole linking relation in R that has the root element et as the head is called the root linking relation, denoted as rt.

Definition 3.15. The depth of a precedence structure is the maximum number of links between the root element and others plus 1.

Proposition 3.1. The depth of a precedence structure is the height of the rooted tree it forms. The height of a rooted tree is the maximum of the levels of nodes, and the level of a node in a rooted tree is the length of the unique path from the root to this node.

Definition 3.16. A precedence structure is complete if it forms a complete rooted tree, i.e., all the leaves of it are at the same level.

Example 3.2. The pair ps1 := (E, R) is a precedence structure on the production set P of the RGG in Fig. 2.2, where:

• E = {e1, e2, e3, e4, e5}, in which e1 = 〈{p4, p9}, p8〉, e2 = 〈{p2, p9}, p9〉, e3 = 〈{p5}, p2〉, e4 = 〈{p2, p7}, p9〉, and e5 = 〈{p5}, p2〉;
• R = {r1, r2, r3}, in which r1 = (e1, {e2}, f1) with f1(p4) = null and
totally precedes p, denoted as M≺p, if T≺dp. The corresponding total precedence relation with respect to f is defined as 〈M, p〉 = (E ∪ {〈T, p〉}, R ∪ {r}), where r = (〈T, p〉, Et, f).
A total precedence relation 〈M, p〉 is composed of a set of direct total precedence relations and a set of linking relations on it. These two sets can be partitioned into two parts: one is the two sets E and R, which are previously included in M; and the other is newly established, i.e., a direct total precedence relation 〈T, p〉, along with an associated linking relation r. Note that we do not make a distinction between the form of a total precedence relation and that of a partial precedence relation, since it is effortless to distinguish them by deciding from context whether or not the first element is a compound precedence set.

Theorem 3.2. Let gg := (A, P, Ω) be an RGG, M a compound precedence set and p ∈ P. If M≺p, then the corresponding total precedence relation 〈M, p〉 is a precedence structure.
Proof. Suppose the compound precedence set M = (T, E, R), and let Et be the set of root elements of the involved precedence structures. According to Definition 3.18, 〈M, p〉 = (E ∪ {〈T, p〉}, R ∪ {rt}), where rt = (〈T, p〉, Et, f) is established in terms of the linking mapping f: T → Et∪{null}. From Definition 3.17, there is some m such that E = ⋃_{1≤i≤m} Ei, R = ⋃_{1≤i≤m} Ri, and psi = (Ei, Ri) are precedence structures with the root elements eti = (Pi, pi) where 1 ≤ i ≤ m. It suffices to show that 〈M, p〉 satisfies the four conditions listed in Definition 3.14.

As to the first condition, let R′ = R ∪ {rt}. We select et = 〈T, p〉 as the root element. Obviously, et ∉ ⋃_{r∈R′} r.D.

Second, let E′ = E ∪ {et} = ⋃_{1≤i≤m} Ei ∪ {et}. For each e ∈ Ei such that e ≠ eti, there exists r ∈ Ri(e ∈ r.D), by Definition 3.14. Besides, eti ∈ Et = rt.D. To sum up, we get ∀e ∈ E′(e ≠ et → ∃r ∈ R′(e ∈ r.D)).

Third, for each precedence structure psi, ∀e ∈ Ei(e ≠ eti → L(eti, e)) holds, from Definition 3.14. Since rt = (et, Et, f) and eti ∈ Et, we get L(et, eti) and ∀e ∈ Ei(e ≠ eti → L(et, e)) according to the first and second conditions of Definition 3.13, respectively. So, ∀e ∈ Ei(L(et, e)). In addition, L(et, et) does not hold, for et ∉ ⋃_{r∈R′} r.D. Hence, ∀e ∈ E′(e ≠ et → L(et, e)).

Fourth, for each precedence structure psi, ∀r, r′ ∈ Ri(r ≠ r′ → r.D ∩ r′.D = ∅) holds, by Definition 3.14. Since every element in the multiset E is uniquely identified, ∀r ∈ Ri ∀r′ ∈ Rj(i ≠ j → r.D ∩ r′.D = ∅), where 1 ≤ i, j ≤ m. Furthermore, ∀r ∈ Ri(r.D ∩ rt.D = ∅), which is due to the fact that rt.D = Et = ⋃_{1≤i≤m} {eti}. Therefore, ∀r, r′ ∈ R′(r ≠ r′ → r.D ∩ r′.D = ∅).

Consequently, we conclude that the pair 〈M, p〉 is a precedence structure. □
Fig. 3.2. The rooted tree that corresponds to a precedence structure.
f1(p9) = e2, r2 = (e2, {e3, e4}, f2) with f2(p2) = e3 and f2(p9) = e4, and r3 = (e4, {e5}, f3) with f3(p2) = e5 and f3(p7) = null;
• et = e1.
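The four conditions of Definition 3.14 and the depth of Definition 3.15 can be checked mechanically. The following Python sketch does so for ps1; the function names are hypothetical, the linking mappings f are omitted, and each linking relation is reduced to a (head, tails) pair, which is an assumption made for brevity.

```python
from collections import deque

# Elements of E are addressed by name because E is a multiset
# (e3 and e5 denote the same relation <{p5}, p2>).
E = {"e1", "e2", "e3", "e4", "e5"}
# Linking relations as (head, set of tails).
R = [("e1", {"e2"}), ("e2", {"e3", "e4"}), ("e4", {"e5"})]


def links_from(root, R):
    """Return {e: Nl(root, e)} for root and every element linked to it."""
    n = {root: 0}
    queue = deque([root])
    while queue:
        head = queue.popleft()
        for h, tails in R:
            if h == head:
                for t in tails:
                    if t not in n:
                        n[t] = n[head] + 1
                        queue.append(t)
    return n


def is_precedence_structure(E, R):
    tails = [t for _, ts in R for t in ts]
    roots = [e for e in E if e not in tails]
    if len(roots) != 1:                      # conditions 1 and 2: exactly one non-tail
        return False
    if len(tails) != len(set(tails)):        # condition 4: tail sets are pairwise disjoint
        return False
    return set(links_from(roots[0], R)) == set(E)   # condition 3: all linked to the root


def depth(root, R):
    """Maximum number of links from the root element, plus 1 (Definition 3.15)."""
    return max(links_from(root, R).values()) + 1


print(is_precedence_structure(E, R), depth("e1", R))
```

Adding a second relation that also has e5 as a tail, for instance `("e3", {"e5"})`, violates the last condition, and the check then fails.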
The tree that corresponds to ps1 is depicted in Fig. 3.2, where the three fragments enclosed by dashed red rectangles are the elements e1, e3, and e4, respectively.

Definition 3.17. Let gg := (A, P, Ω) be an RGG, psi := (Ei, Ri) precedence structures with the root elements eti = (Pi, pi) where 1 ≤ i ≤ m, E = ⋃_{1≤i≤m} Ei, R = ⋃_{1≤i≤m} Ri and T ⊆ P. The triple (T, E, R) is a compound precedence set if the multiset ⋃_{1≤i≤m} {pi} ⊆ T.

A compound precedence set consists of three parts: a multiset T of productions from set P, a multiset E of direct total precedence relations, and a set R of linking relations on E. The first part is the core of the triple, whereas the last two are its extension. That is, some elements in T are extended by corresponding precedence structures, from which the sets E and R are produced by uniting their elements and linking relations, respectively. Essentially, compound precedence sets are expected to be constituents for establishing total precedence relations.

Example 3.3. The triple M1 := (T, E, R) is a compound precedence set on the production set P of the RGG in Fig. 2.2, where:
• T = {p2, p9};
• E = E1 ∪ E2 = {e3, e4, e5}, E1 = {e3} and E2 = {e4, e5}, in which e3 = 〈{p5}, p2〉, e4 = 〈{p2, p7}, p9〉, and e5 = 〈{p5}, p2〉;
• R = R1 ∪ R2 = {r3}, in which R1 = ∅, R2 = {r3}, and r3 = (e4, {e5}, f3) with f3(p2) = e5 and f3(p7) = null;
• (E1, R1) and (E2, R2) are two precedence structures.
Corollary 3.1. A total precedence relation forms a rooted tree.

Proof. It is straightforward from Theorems 3.1 and 3.2. □

Proposition 3.2. Let gg := (A, P, Ω) be an RGG. The binary relation on the set of all the compound precedence sets of gg that relates a and b whenever a.T = b.T is an equivalence relation.
Corollary 3.2. If ps := (E, R) is a precedence structure with the root element et = 〈T, p〉 and the root linking relation rt, then the triple (T, E\{et}, R\{rt}) is a compound precedence set.
Theoretically, the set of all compound precedence sets can be divided into more than |2^P| equivalence classes in terms of the subsets of P, where |2^P| is the cardinality of the power set of P. Nevertheless, only a minority of these equivalence classes have practical significance. This is due to the fact that a compound precedence set is expected to be the constituent of a total precedence relation, provided that its core is equivalent to the preceding set of some direct total precedence relation in ≺P. A compound precedence set M, together with a production p, forms a total precedence relation, on condition that a direct total precedence relation is established between the first part of M and p, which is formalized as follows.

Definition 3.18. Let gg := (A, P, Ω) be an RGG, M := (T, E, R) a compound precedence set with Et the set of root elements of involved precedence structures, f: T → Et∪{null} a linking mapping and p ∈ P. M
Example 3.4. A total precedence relation. The compound precedence set M1 := (T, E, R) from Example 3.3 totally precedes the production p9, as T = {p2, p9} ≺d p9. There is a linking mapping f2: T → Et∪{null} = {e3, e4, null} with f2(p2) = e3 and f2(p9) = e4. The corresponding total precedence relation with respect to f2 is 〈M1, p9〉 = (E′, R′), where E′ = {e3, e4, e5} ∪ {e2}, R′ = {r3} ∪ {r2}, e2 = 〈{p2, p9}, p9〉 and r2 = (e2, {e3, e4}, f2). By Corollary 3.1, it forms a rooted tree, which is exactly the right subtree (enclosed by the dashed green rectangle) of the rooted tree depicted in Fig. 3.2.
4. Contexts of productions
isomorphic mappings on which Xi are defined. Then, p equipped with (U, Z), i.e., a context-1 p, denoted as p(U, Z), is defined as p augmented by (U, Z) through the following steps.
In this section, based on partial and total precedence relations described in the preceding section, we present a formal definition of contexts, bestowed with their essential characterizations, and then illustrate the instantiation of contexts in derivation of graphs.
1. Add U simultaneously to both p.L and p.R.
2. Create two sets of edges Zl and Zr for p in the light of Z. Initially, Zl = Zr = ∅; then for each e ∈ Z, if s(e) ∈ Xi.V, create two edges e1 and e2 such that l(e1) = l(e2) = l(e), t(e1) = t(e2) = t(e), s(e1) = gi⁻¹(s(e)), and s(e2) = h(s(e1)) (here gi⁻¹ is the inverse mapping of gi, and h is the bijective mapping from the set of marked vertices of the left graph to that of the right graph in p), and append e1 to Zl and e2 to Zr; otherwise, if t(e) ∈ Xi.V, create another two edges e1 and e2 by exchanging the roles of source and target in the above construction, and append e1 to Zl and e2 to Zr.
3. Connect the two separate parts of both graphs of p with the edges in Zl and Zr respectively, that is, (p.L).E = (p.L).E ∪ Zl, and (p.R).E = (p.R).E ∪ Zr.
4. Adjust the repeated marks of vertices in p to make it consistent with the marking principle of RGG productions.
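Steps 1 to 3 above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: graphs are plain dictionaries with a vertex set "V" and a set "E" of (label, source, target) triples, the node/vertex distinction of RGG is ignored, step 4 (adjusting marks) is omitted, and the name `extend_production` and all sample identifiers are hypothetical.

```python
def extend_production(L, R, U, Z, X_vertices, g_inv, h):
    """Return (L', R') for the production (L, R) augmented by the context (U, Z).

    X_vertices: vertices of the redexes X_i.
    g_inv:      redex vertex -> vertex of p.L (inverse of the mappings g_i).
    h:          marked vertex of p.L -> corresponding vertex of p.R.
    """
    Zl, Zr = set(), set()
    for (label, s, t) in Z:
        if s in X_vertices:          # edge leaves the redex
            v = g_inv[s]
            Zl.add((label, v, t))
            Zr.add((label, h[v], t))
        elif t in X_vertices:        # edge enters the redex
            v = g_inv[t]
            Zl.add((label, s, v))
            Zr.add((label, s, h[v]))
    new_L = {"V": L["V"] | U["V"], "E": L["E"] | U["E"] | Zl}
    new_R = {"V": R["V"] | U["V"], "E": R["E"] | U["E"] | Zr}
    return new_L, new_R


# Hypothetical toy production: left graph with vertex "a", right graph a2 -> b2.
L = {"V": {"a"}, "E": set()}
R = {"V": {"a2", "b2"}, "E": {("x", "a2", "b2")}}
# Context: one contextual vertex "u" connected to the redex vertex "x1".
U = {"V": {"u"}, "E": set()}
Z = {("l", "u", "x1")}

new_L, new_R = extend_production(L, R, U, Z, {"x1"}, {"x1": "a"}, {"a": "a2"})
print(sorted(new_L["E"]), sorted(new_R["E"]))
```

In this toy case the contextual edge is reattached to "a" on the left-hand side and to its counterpart "a2" on the right-hand side, so both graphs of the extended production carry the same context.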
4.1. Definition of contexts

The sets of partial or total precedence relations with respect to a graph grammar establish an order of production applications, which can be exploited to discover potential situations in which any of the productions is applicable for derivation. We refer to these situations as contexts. Given two productions p1 and p2, if p1 directly partially precedes p2, then p1.R contains a context of p2 or merely a portion of a context, depending on whether p2.L consists of only one or at least two components. In the former case, {p1}≺dp2 readily holds and a context of p2 immediately follows; whereas in the latter case, a subset of productions involving p1 that directly totally precedes p2 is pursued so as to form a complete context for p2. As a third case, a total precedence relation can be sought to build a deeper context.

Contexts of a production can be stratified in terms of the levels of corresponding total precedence relations from which they are generated. Roughly, a complete context that is built in the light of a total precedence relation that corresponds to a rooted tree of depth i is called a level-i context, i ≥ 1, and it degrades to a level-1 context when the relation is a direct one. A context can be employed to extend the respective production to which it pertains. This is done by adding the context simultaneously to both graphs and properly linking the two parts together in each of them. A production p equipped with a level-i context is called a context-i p.

Definition 4.1. Let p1, p2 be two productions of an RGG, C1 ∈ Mcc(p1.L) and C2 ∈ Mcc(p2.R). If ∃X⊆C2 such that X ∈ Rd(C2, C1), then (U, Z) is a level-1 context of C1 and also a partial level-1 context of p1 with respect to C1 and X, where U = C2 \ X and Z is the set of edges between X and U in C2.
If a component of the right graph of one production contains a redex of a component of the left graph of another, then the result of removing the redex from the former component is called a level-1 context of the latter component and a partial level-1 context of the whole production as well. Naturally, a complete level-1 context of a production is the collection of the partial contexts of its components, along with the union of the sets of edges that connect them to the respective redexes.
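The split of a component into a redex and its level-1 context can be sketched in Python. The sketch assumes a simplified graph model (a vertex set "V" and a set "E" of (label, source, target) triples) and assumes that the redex is given by its vertex set; the function name `level1_context` and the sample component are hypothetical.

```python
def level1_context(C2, X_vertices):
    """Return (U, Z): the rest of component C2 after removing the redex X,
    and the edges of C2 that connect X to U."""
    U_vertices = C2["V"] - X_vertices
    # Edges lying entirely inside the rest graph belong to U.
    U_edges = {(l, s, t) for (l, s, t) in C2["E"]
               if s in U_vertices and t in U_vertices}
    # Edges with exactly one endpoint in the redex form the connection Z.
    Z = {(l, s, t) for (l, s, t) in C2["E"]
         if (s in X_vertices) != (t in X_vertices)}
    return {"V": U_vertices, "E": U_edges}, Z


# Hypothetical component: a chain 1 -> 2 -> 3 -> 4 whose middle part {2, 3} is the redex.
C2 = {"V": {1, 2, 3, 4}, "E": {("a", 1, 2), ("b", 2, 3), ("c", 3, 4)}}
U, Z = level1_context(C2, {2, 3})
print(sorted(U["V"]), sorted(Z))
```

Edges that lie entirely inside the redex, such as the one labelled "b" here, belong neither to U nor to Z.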
In the above definition, the context-1 p, p(U, Z), is generated by adding U simultaneously to p.L and p.R and joining the two parts together in each graph by reconnecting the edges in Z to the respective marked vertices that have one-to-one correspondence with those in the redexes to which they are originally connected. The involved correspondence relations come from the definitions of redex and production in RGG. A context-1 p is called a context-equipped production, or simply an extended production, of p. Trivially, any context-0 p is p itself.

Definition 4.4. Let gg := (A, P, Ω) be an RGG, and p ∈ P. If (U, Z) is a level-1 context of p, then (U, Z) is complete, and p equipped with (U, Z), i.e., context-1 p, is complete. Any context-0 p is complete.

The notions in Definitions 4.2 and 4.3 can be generalized to “level-i context” and “context-i p”, respectively. To this end, we introduce some relevant notations beforehand. In a graph grammar (A, P, Ω), we denote by Ct_p^i the set of all the level-i contexts of p, and by Pd_p^i the set of all the context-i p, for some p ∈ P, where i ≥ 0. Trivially, Ct_p^0 = ∅, and Pd_p^0 = {p}. We denote by Ct^i the set of all the level-k contexts of all the productions in P, i.e., Ct^0 = ∅, Ct^i = Ct^{i−1} ∪ ⋃_{p∈P} Ct_p^i, and by P^i the union of P and the set of all the context-k p’s, for all p ∈ P, i.e., P^0 = P and P^i = P^{i−1} ∪ ⋃_{p∈P} Pd_p^i, where 1 ≤ k ≤ i.

Noticeably, the production p underlies any p′ ∈ Pd_p^i with i ≥ 0, and conversely, each p′ contains an isomorphic image of p. The underlying production of p′ is denoted as Ud(p′) = p, and the isomorphic image of p in p′ is denoted as Im(p′). Moreover, the underlying set of any multiset P′ ⊆ P^i is defined as the multiset Ud(P′) = {Ud(p′) | p′ ∈ P′}. Readily, there is a bijective mapping between P′ and its underlying set, and a bijective mapping between the two sets of components of their respective right graphs accordingly.
Under the latter bijective mapping, denoted as h: Mcc((Ud(P′))R) → Mcc(P′R), the isomorphic image of any C ∈ Mcc((Ud(P′))R) in h(C) is denoted as Im(h(C)). The two generalized notions are recursively defined as follows:

Definition 4.5. Let gg := (A, P, Ω) be an RGG, P′ ⊆ P^{i−1} with P′ ⊄ P^{i−2}, i ≥ 2, and p ∈ P, Mcc(p.L) = {C1, …, Cn}, h: Mcc((Ud(P′))R) → Mcc(P′R) a bijective mapping between the two sets of components of the right graphs of P′ and Ud(P′). If Ud(P′)≺dp with respect to some surjective mapping f: Mcc(p.L) → St = {D1, …, Dm} ⊆ Mcc((Ud(P′))R) and a set of redexes X = {Xi | Xi ∈ Rd(h(f(Ci)), Ci) ∧ Xi ⊆ Im(h(f(Ci))), 1 ≤ i ≤ n}, then the pair (U, Z) is called a level-i context of p with respect to P′, f, and X, denoted as ctp(P′, f, X), where U = ⋃_{1≤j≤m} D′j, D′j = h(Dj) \ ⋃_{kj∈Kj} Xkj, Kj = {l | f(Cl) = Dj ∧ 1 ≤ l ≤ n}, Z = ⋃_{1≤j≤m} Zj, and Zj = {e ∈ h(Dj).E | ((s(e) ∈ Xkj.V ∧ t(e) ∈ D′j.V) ∨ (s(e) ∈ D′j.V ∧ t(e) ∈ Xkj.V)) ∧ kj ∈ Kj}. If any production in P′ is context-(i−1) and complete, then (U, Z) is complete.
Definition 4.2. Let gg := (A, P, Ω) be an RGG, p ∈ P, Mcc(p.L) = {C1, …, Cn}, and P′ ⊆ P. If P′≺dp with respect to some surjective mapping f: Mcc(p.L) → St = {D1, …, Dm} and a set of redexes X = {Xi | Xi ∈ Rd(f(Ci), Ci), 1 ≤ i ≤ n}, then the pair (U, Z) is a level-1 context of p with respect to P′, f, and X, denoted as ctp(P′, f, X), where U = ⋃_{1≤j≤m} D′j, D′j = Dj \ ⋃_{kj∈Kj} Xkj, Kj = {l | f(Cl) = Dj ∧ 1 ≤ l ≤ n}, Z = ⋃_{1≤j≤m} Zj, and Zj = {e ∈ Dj.E | ((s(e) ∈ Xkj.V ∧ t(e) ∈ D′j.V) ∨ (s(e) ∈ D′j.V ∧ t(e) ∈ Xkj.V)) ∧ kj ∈ Kj}.

The sets U and Z are called the contextual graph and contextual connection, respectively. Each component D′j of the contextual graph is the rest graph of Dj, a component of the right graph of some production that contains one or more redexes Xkj of the components Ckj, minus Xkj, and each Zj is the collection of edges in Dj that connect D′j to all the redexes Xkj. Trivially, any level-0 context of p is null.

Definition 4.3. Let gg := (A, P, Ω) be an RGG, notations p, P′, f, X, U and Z the same as in Definition 4.2 such that P′≺dp and (U, Z) is a level-1 context of p with respect to P′, f and X, and gi: Ci → Xi, 1 ≤ i ≤ n, the
This definition is almost the same as Definition 4.2, except for the meanings of P′, Xi, and the newly introduced notations. Two constraints are imposed on Xi. The first constraint describes that Xi is a redex of Ci in some component of Mcc(P′R) that is probably a context-equipped version of f(Ci), a component of the right graph of a production in the underlying set of P′, and the second states that the redex Xi is a subgraph of the isomorphic image of f(Ci) in its image in Mcc(P′R) under the mapping h.

Definition 4.6. Let gg := (A, P, Ω) be an RGG, P′ ⊆ P^{i−1} with P′ ⊄ P^{i−2}, i ≥ 2, p, f, X, U and Z the same as in Definition 4.3 such that Ud(P′)≺dp and (U, Z) is a level-i context of p with respect to P′, f and X, and gk: Ck → Xk, 1 ≤ k ≤ n, the isomorphic mappings on which Xk are defined. Then, p equipped with (U, Z), i.e., a context-i p, denoted as p(U, Z), is defined as the production p augmented by (U, Z) through the same four steps as in Definition 4.3. If (U, Z) is complete, then p(U, Z) is complete.
Example 4.1. Two contexts of a production and their respective extended productions. Two contexts at distinct levels of a production and their respective extended productions are demonstrated in Fig. 4.1. A level-1 context of p9 originated from the direct total precedence relation {p2, p9}≺dp9 is shown in Fig. 4.1(a), where the left component of the graph is R2 (the right graph of production 2), the right one is R9, the subgraph enclosed by the dashed green ellipse is the redex of L9 (the left graph of p9), and the context consists of two parts U and Z: U is the rest of the whole graph minus the redex and Z the set of bold red edges that connect U to the redex. The respective context-1 p9, numbered as p10, is depicted in Fig. 4.1(c), in which the subgraph surrounded by dashed green ellipses is the isomorphic image of the underlying production of p10. Fig. 4.1(b) shows a level-2 context of p8, which is created on the basis of the direct total precedence relation Ud({p4, p10})≺dp8, or equivalently, the total precedence relation Ul({p4, p10})≺p8. In this graph, the left and right components are R10 and R4, respectively, the subgraph enclosed by the dashed purple rectangle is the isomorphic image of Ud(p10), and the one surrounded by the dashed green ellipse is the redex of the left graph of p8. Similarly, the corresponding context-2 p8, numbered as p11, is illustrated in Fig. 4.1(d).

4.2. Characterizations of contexts

In what follows, we proceed to present several characterizations of the above notions that naturally establish a relationship between level-i contexts and total precedence relations.

Definition 4.7. Let gg := (A, P, Ω) be an RGG, p ∈ P, P′ = {p′1, …, p′m} ⊆ P^{i−1}, i ≥ 1, and p′ ∈ Pd_p^i. If p′ is defined upon e = Ud(P′)≺dp, then the underlying structure of p′, denoted as Ul(p′), is a pair of sets (E, R) recursively defined as follows:

• if i = 1, then E = {e}, and R = ∅;
• else if i > 1, then E = {e} ∪ ⋃_{1≤k≤m} Ek, and R = {r} ∪ ⋃_{1≤k≤m} Rk, where Ul(p′k) = (Ek, Rk), r = (e, D0, f), D0 = {e′ ∈ ⋃_{1≤k≤m} Ek | ¬∃r′(r′ ∈ ⋃_{1≤k≤m} Rk ∧ e′ ∈ r′.D)}, and f: Ud(P′) → D0∪{null} is a linking mapping.

Trivially, for each production p ∈ P, Ul(p) = (∅, ∅). Noticeably, the linking mapping f always exists in the second case. This is due to the facts that p′ ∈ Pd_p^i is defined on Ud(P′)≺dp and i > 1, which guarantee that there must exist some q′ ∈ Pd_q^{i−1} defined on Ud(P″)≺dq for some q ∈ P and P″ ⊆ P^{i−2} such that q ∈ Ud(P′), by Definitions 4.3 and 4.6. Thus, D0 ≠ ∅, and the target production of any element in D0 belongs to Ud(P′).

Proposition 4.1. Let gg := (A, P, Ω) be an RGG, p ∈ P and k ≥ 1. If p′ ∈ Pd_p^k, Ul(p′) is a precedence structure.

Proof. We use mathematical induction to show that if p′ ∈ Pd_p^k, Ul(p′) is a precedence structure, for all k ≥ 1.

Base case. If k = 1, then p′ must be defined upon a direct total precedence relation e = P′≺dp with P′ ⊆ P, according to Definition 4.3. Then, we have e = Ud(P′)≺dp. Thus, E = {e} = {〈P′, p〉}, and R = ∅, by Definition 4.7. Hence, Ul(p′) is a precedence structure, according to Definition 3.14.

Inductive step. The inductive hypothesis is that Ul(p′) is a precedence structure, for all k ≤ n, where n ≥ 1. If k = n + 1, then p′ must be defined upon some e = Ud(P′)≺dp with P′ ⊆ P^n but P′ ⊄ P^{n−1}, from Definition 4.6. Let P′ = {p′1, …, p′l, p′l+1, …, p′m} with 1 ≤ l ≤ m such that p′i ∉ P and p′j ∈ P, where 1 ≤ i ≤ l and l + 1 ≤ j ≤ m. Then, for each p′i, Ul(p′i) is a precedence structure, according to the induction hypothesis. Let Ul(p′i) = (Ei, Ri) with eti the root element of Ei, E′ = ⋃_{1≤i≤l} Ei, Et = ⋃_{1≤i≤l} {eti}, R′ = ⋃_{1≤i≤l} Ri, and T = {Ud(p′1), …, Ud(p′m)}. By Definition 3.17, M = (T, E′, R′) is a compound precedence set. According to Definition 4.7, Ul(p′) = (E, R), where E = E′ ∪ {e}, R = R′ ∪ {r}, D0 is the subset of E′ such that each of its elements is not a tail of any linking relation in R′, i.e., D0 = Et, and f is the linking mapping from T to D0∪{null} such that f(Ud(p′i)) = eti and f(Ud(p′j)) = null. From Definition 3.18, M totally precedes p, i.e., M≺p, and the associated total precedence relation 〈M, p〉 = (E, R). Hence, (E, R) is a precedence structure, by Theorem 3.2. □

The following notion is established on the basis of Proposition 4.1.

Definition 4.8. Let gg := (A, P, Ω) be an RGG, p ∈ P, and P′ = {p′1, …, p′l, p′l+1, …, p′m} ⊆ P^i such that p′j ∉ P and p′k ∈ P, where 1 ≤ l ≤ m, i ≥ 1, 1 ≤ j ≤ l and l + 1 ≤ k ≤ m. The underlying structure of P′, denoted as Ul(P′), is a triple (T, E, R), where T = {Ud(p′1), …, Ud(p′m)}, E = ⋃_{1≤j≤l} Ej, R = ⋃_{1≤j≤l} Rj, and Ul(p′j) = (Ej, Rj).

Proposition 4.2. Let gg := (A, P, Ω) be an RGG. If P′ ⊆ P^i where i ≥ 1, then Ul(P′) is a compound precedence set.

Proposition 4.3. Let gg := (A, P, Ω) be an RGG, p ∈ P, i ≥ 1, and P′ ⊆ P^i. Ul(P′)≺p if and only if Ud(P′)≺dp.

Proof. It is straightforward from Proposition 4.2, and Definitions 3.18 and 4.8. □

A substantial conclusion can be drawn from Proposition 4.3 and Definition 4.5.

Theorem 4.1. Let gg := (A, P, Ω) be an RGG, p ∈ P, and P′ ⊆ P^{i−1} with P′ ⊄ P^{i−2}, i ≥ 2. If Ul(P′)≺p, then there is a level-i context of p with respect to P′.

A corollary directly follows from Theorems 3.2 and 4.1.

Corollary 4.1. Let gg := (A, P, Ω) be an RGG, and p ∈ P. If ps is a precedence structure of depth i with p as the target production of its root element, there is a level-i context of p.

Another corollary, which integrates the cases of direct total precedence and total precedence into a whole, directly follows from Definition 4.2 and Theorem 4.1.

Corollary 4.2. Let gg := (A, P, Ω) be an RGG, p ∈ P, and P′ ⊆ P^{i−1} with P′ ⊄ P^{i−2}, i ≥ 1. If Ul(P′)≺p, then there is a level-i context of p with respect to P′.

Theorem 4.2. Let gg := (A, P, Ω) be an RGG, p ∈ P, P′ ⊆ P^{i−1}, i ≥ 1, Ul(P′)≺p, and (U, Z) a level-i context of p with respect to P′. If each production in P′ is context-(i−1) and complete, then (U, Z) is complete.

Proof. It is straightforward from Definitions 4.2, 4.4 and 4.5, and Proposition 4.3. □

4.3. Instantiation of contexts

In a graph grammar, a context instance is an instantiation of a context of a production in a graph that can be derived from the initial graph. This subsection establishes a connection between contexts and context instances.
Fig. 4.1. The contexts of a production and their respective extended productions. (a) A level-1 context of p9. (b) A level-2 context of p8. (c) A context-1 p9. (d) A context-2 p8.
Assume that gg := (A, P, Ω) is an RGG, A = G0 →p1 G1 → ⋯ →pn Gn is a derivation with each pi ∈ P and 1 ≤ i ≤ n, S ⊆ Gj, and 0 ≤ j < l ≤ k ≤ n. If S ⊆ Gj ∩ G_{j+1}, i.e., the subgraph S has not been changed in the derivation step of G_{j+1} from Gj, then S is said to be preserved in G_{j+1} (the relation symbol carries the subscript prs, which is dropped when its meaning is clear from context); if S is preserved in the derivation of Gk from Gj, then S ⊆ Gl, for each l.

Definition 4.9. Let gg := (A, P, Ω) be an RGG, p ∈ P, and G a graph such that A →* G. If C ⊆ G ∧ C ∈ Rd(G, p.L), Z = {e ∈ G.E | (s(e) ∈ C.V ∧ t(e) ∈ G′.V) ∨ (s(e) ∈ G′.V ∧ t(e) ∈ C.V)}, and G′ = G \ C, then (G′, Z) is a context instance of p.

Definition 4.10. Let gg := (A, P, Ω) be an RGG, 1 ≤ i ≤ n, and G a graph derived from A by the production sequence q1, …, qn with each qi ∈ P. A successor relation ⋞ on the multiset Q = {q1, …, qn} with respect to the derivation of G is the set (Q, ⋞) such that any production pair (p, p′) ∈ ⋞, denoted as p⋞p′, where p = qi, p′ = qj, and 1 ≤ i, j ≤ n, if the following two constraints are satisfied:

• p ≼d p′, that is, there is a step in the derivation such that the direct total precedence relation to which it corresponds is P′≺dp′ and p ∈ P′, for some multiset P′ ⊆ P;
• i < j.

Readily, the successor relation is irreflexive, anti-symmetric, and transitive. Recall that any two elements in the multiset Q have different identities, even though they refer to the same production.

Definition 4.11. Let (Q, ⋞) be a successor relation on a multiset Q with respect to a derivation q1, …, qn of G as in Definition 4.10. The rooted tree Tr that (Q, ⋞) forms is recursively constructed in two steps:

• create a rooted tree Tr with qn as the root node; and iteratively
• for each leaf node q in Tr such that q ≠ q1, create a subtree in Tr with q as the root node as follows: for each q′ ∈ Q, if q′⋞q, create a node q′, and connect it to q with a directed edge from q to q′.

Definition 4.12. An RGG gg := (A, P, Ω) is well-formed if ∀p, p′ ∈ P∀C ∈ Mcc(p′. L)(|C.N| ≥ 2), for any merger G ∈ Mrg(p.R, C) such that the common subgraph B = p . R C C and B⊊p.L∩p.R, ¬ G such that G p G B ¬ G.

A well-formed graph grammar guarantees that any component of a redex of a production's left graph must be a subgraph of some component of a production's right graph. In the RGG formalism, the context is
not explicitly involved in a production as ordinary graph elements. Instead, the context elements are compactly represented as vertices that are embedded into ordinary nodes of a production. Thus, each node or edge in a production is an indispensable constituent for characterizing the graph transformation it is expected to express. Consequently, each component of a production can be viewed as an independent common substructure and the connection between it and the associated context is usually disregarded, from the perspective of grammar designers. Moreover, as for the language of a graph grammar, any common substructure of its sentences must frequently be some subgraph of a production's right graph, and each component of this production's left graph will in turn be a subgraph of another's right graph in most cases. Hence, we believe most of the graph grammars that belong to the RGG formalism must be well-formed. The RGG instances that we have so far encountered in the literature are well-formed, including the one depicted in Fig. 2.2. The instantiation of a context with respect to a production in a well-formed graph grammar is given below:

Definition 4.13. Let gg := (A, P, Ω) be an RGG, p ∈ P, A = G0 →q1 G1 → ⋯ →qn Gn such that qm ∈ P for 1 ≤ m ≤ n, and c a context of p. The graph Gn is an instantiation of c with respect to some graph Gl, called the base graph, and the derivation d = q1 ⋯ qn of Gn, where 0 ≤ l ≤ n, denoted by Inst(Gl, d, c) = Gn, if one of the following two cases is satisfied:
We denote the redex by S″. Therefore, S′\S ⊊ q.L∩q.R, and S′\S ⊆ S′∩S″. Noticeably, S′∪S″ is a merger of C and q.R. Conversely, the subgraph S′∪S″ in H′ can be reduced by applying an R-application of q to the subgraph S″, and the resulting subgraph in H does not contain the subgraph S′\S. This contradicts the constraint in Definition 4.12. Consequently, for any C ∈ Mcc(qi.L), there exists a subgraph S such that either S ⊆ G_{i−1} ∧ S ∈ Rd(G_{i−1}, f(C)), or there exists some j, 0 ≤ j < i − 1, and S ⊆ Gj ∧ S ∈ Rd(Gj, f(C)), satisfying ∃S′ ⊆ S(S′ ∈ Rd(S, C) ∧ S′ is preserved in G_{i−1}), and for any C, C′ ∈ Mcc(p.L)(C ≠ C′ ∧ f(C) = f(C′)), their redexes in G_{i−1} do not overlap, where P′≺dp is the corresponding direct total precedence relation of the redex with respect to the mapping f: Mcc(p.L) → Mcc(P′R). This assertion will be used below.

Then, we generate the rooted tree that (Q, ⋞) forms according to Definition 4.11, and denote it by Tr. Noticeably, a successor relation, say q⋞q′, will be used as many times as q′ appears as a leaf node in the construction process of Tr. That is, some successor relations will be repeatedly applied while others can never be chosen. Moreover, we denote the height of Tr as dt_{n+1}.

Next, we create a precedence structure ps_{n+1} = (E, R) according to Tr, where each e ∈ E is a subtree of height 1 in Tr, representing a direct total precedence relation; and each linking relation r ∈ R is the relation between one and a set of direct total precedence relations, in terms of the inverse process of the tree construction steps described in Theorem 3.1. Clearly, ps_{n+1} is of depth dt_{n+1} and q_{n+1} is the target production of the root element according to Proposition 3.1. Hence, it corresponds to a level-dt_{n+1} context of q_{n+1}, from Corollary 4.1. A natural generalization of the above process can be achieved in this way: for any derivation A = G0 →q1 G1 → ⋯ →qm Gm, 1 ≤ m ≤ n + 1, a corresponding precedence structure psm of depth dtm can be created accordingly.
Finally, we proceed by mathematical induction on the depth of psm to show that there is a context c of qm and a graph Gl, where 0 ≤ l ≤ m − 1 and d = q1⋯qm−1 is the derivation of Gm−1, such that Inst(Gl, d, c) = Gm−1 for all dtm ≥ 1.
Base case. If dtm = 1, then psm = ({et}, ∅) and c is a level-1 context with respect to the direct total precedence relation et. Obviously, it corresponds to the derivation step Gm−1 →qm Gm. From the assertion already made in the first step, there must exist some k, 0 ≤ k ≤ m − 1, such that for any C ∈ Mcc(p.L), the constraint ∃S⊆Gk(S ∈ Rd(Gk, f(C))∧∃S′⊆S(S′ ∈ Rd(S, C)∧S′⊆Gm−1)) holds. Otherwise, there must exist some C′ ∈ Mcc(p.L) such that ¬∃S⊆Gk(S ∈ Rd(Gk, f(C′))), and k < m − 1. As Gm−1 contains a redex of C′, say S0, there must exist some k′ such that k < k′ ≤ m − 1, and ∃S‴⊆Gk′(S‴ ∈ Rd(Gk′, f(C′))∧S0⊆S‴). That is, there is a derivation of positive length beginning in Gk and ending in Gk′ that generates the subgraph S‴ which contains S0. According to the second and third steps, the precedence structure psm so created must be of depth at least 2. A contradiction occurs. In addition, it naturally holds that for any C, C′ ∈ Mcc(p.L)(C ≠ C′∧f(C) = f(C′)), their redexes in Gm−1 do not overlap. Hence, from Definition 4.13, Inst(Gk, d, c) = Gm−1.
Inductive step. Let us assume that the conclusion holds for all k < dtm. If k = dtm, then c is a level-dtm context with respect to the root element et = P′≺dqm, for some P′ and f. Suppose psm = (E′, R′) with the root linking relation rt = (et, D, h). By Corollary 3.2, (P′, E′\{et}, R′\{rt}) is a compound precedence set, denoted as M. According to Definition 3.17, for each p′ ∈ P′, there are two cases to be discussed. In the first case, h(p′) = 〈P″, p′〉 ∈ D, then there is a precedence structure with 〈P″, p′〉 as the root element in M and we denote it by ps′. Let p′ be qm′, then m′ < m, and ps′ corresponds to the derivation A = G0 →q1 G1 → ⋯ → Gm′−1 →qm′ Gm′. According to the hypothesis, there is a context c′ of qm′ and a graph Gl′, where 0 ≤ l′ < m′ − 1 and d′ = q1⋯qm′−1 is the derivation of Gm′−1, such that Inst(Gl′, d′, c′) = Gm′−1. Accordingly, Gm′ contains a redex of qm′.R,
• the context c is a level-1 context with respect to P′, f, and P′≺dp as in Definition 4.2 such that: (i) there exists some k, 0 ≤ k ≤ n, satisfying that for any C ∈ Mcc(p.L), the constraint ∃S⊆Gk(S ∈ Rd(Gk, f(C))∧∃S′⊆S(S′ ∈ Rd(S, C)∧S′⊆Gn)) holds, and (ii) for any C, C′ ∈ Mcc(p.L)(C ≠ C′∧f(C) = f(C′)), their redexes in Gn do not overlap, where l = k;
• the context c is a level-i context with respect to P′, f, and Ud(P′)≺dp as in Definition 4.5 such that for any C ∈ Mcc(p.L) (let f(C) be some component of the right graph of a context-j production in P′ whose context is c′ and 0 ≤ j < i), there exists some k, 0 ≤ k ≤ n, satisfying the constraint in (i), where 0 ≤ l′ < k, Inst(Gl′, d′, c′) = Gk, and d′ = q1⋯qk is the derivation of Gk on condition that j > 0, and (ii) holds, where l is the minimum value of k.
This concept is established on the basis of the notions of level-i context for all i ≥ 1, introduced in Definitions 4.2 and 4.5. The first condition ensures that each component in Mcc(p.L) possesses a redex in Gn that is a subgraph of a component of some production's right graph appearing in Gk, which is recursively an instantiation of some level-j context c′ with j < i, when j > 0; and the second condition guarantees the existence of a redex of p.L, that is, any two components of the redex do not overlap with each other.
Theorem 4.3. Let gg := (A, P, Ω) be a well-formed RGG, A = G0 →q1 G1 → ⋯ →qn+1 Gn+1, qm ∈ P for 1 ≤ m ≤ n + 1, and d = q1⋯qn the derivation of Gn. If qn+1 = p, then there exists a context c of p and a graph Gl with 0 ≤ l ≤ n such that Inst(Gl, d, c) = Gn.
Proof. First, according to Definition 4.13, we can establish a successor relation ⋞ on the multiset Q = {q1, …, qn+1} with respect to the derivation of Gn+1, and denote it by (Q, ⋞). Note that in each derivation step Gi−1 →qi Gi, 1 ≤ i ≤ n + 1, there is a redex of qi.L in Gi−1. Therefore, a set of partial precedence relations that comprises a total one is added to the set ⋞ in each step. As the graph grammar is well-formed, each component of a redex must appear as a subgraph of a component of some production's right graph. We show this assertion by contradiction. Assume that ∃p′ ∈ P∃C ∈ Mcc(p′.L) with |C.N| ≥ 2 such that there exists a redex of C in a host graph but it is not a subgraph of any production's right graph. Then, there must exist a derivation A →* H →q H′, satisfying ∃S⊆H∃S′⊆H′(S⊂S′∧(S′\S)⊊︀H∧S′ ∈ Rd(H′, C)). So, there is a redex of q.R in H′, which accounts for the appearance of the subgraph S′\S in H′.
which is composed of the redexes of all the elements in Mcc(qm′.R). We denote it by G′. G′ is unique, as qm′ is indicated by the subscript m′, a number distinct from that of any others. It is known from Definition 3.9 that there exists some C0 ∈ Mcc(qm.L)(f(C0) ∈ Mcc(qm′.R)). Thus, for each C ∈ Mcc(qm.L) such that f(C) ∈ Mcc(qm′.R), ∃S⊆Gm′(S ∈ Rd(Gm′, f(C))). Moreover, if S′⊆Gm−1 is the redex of C with respect to et = P′≺dqm, where qm′ ∈ P′, such that f(C) ∈ Mcc(qm′.R), then S′ must be a subgraph of S in G′ that is preserved in the derivation step of Gm−1 from Gm′, as G′, the redex of qm′.R, is unique. Consequently, ∃S′⊆S(S′ ∈ Rd(S, C)∧S′⊆Gm−1) holds. Therefore, the constraint in (i) of Definition 4.13 is satisfied. The condition (ii) readily holds, as Gm−1 contains a redex of qm. On the other hand, let ps′ be of depth k′, then 1 ≤ k′ ≤ k − 1. By Corollary 4.1, it corresponds to a level-k′ context of p′, which can be uniquely determined by the derivation q1⋯qm′−1 of Gm′−1. We denote it by c′. According to Definition 4.6, we can create a context-k′ production from p′ with c′, and call it p′k′. Therefore, Ud(p′k′) = p′. In the second case, h(p′) = null, then for each C ∈ Mcc(qm.L), if S′⊆Gm−1 is the redex of C with respect to P′≺dqm, where p′ ∈ P′, such that f(C) ∈ Mcc(p′.R), there must exist a subgraph S⊆G0(S ∈ Rd(G0, f(C))∧S′⊆S), because the derivation d = q1⋯qm−1 has nothing to do with the generation of the redex of p′.R. Thus, S′ ∈ Rd(S, C). Hence, the constraint in (i) of Definition 4.13 is satisfied. Readily, the condition (ii) holds too. Then, we create a multiset Q as follows: initially, Q = ∅; and for each p′ ∈ P′, if h(p′) ∈ D, then Q = Q∪{p′k′}, otherwise Q = Q∪{p′}. So, Ud(Q) = P′. Furthermore, we establish a mapping f′: Mcc(qm.L) → Mcc((Ud(Q))R) such that for each C ∈ Mcc(qm.L), f′(C) = f(C). Consequently, c is a level-dm context with respect to Q, f′, and Ud(Q)≺dqm such that the two conditions are satisfied.
Therefore, it holds that there is a context c of qm and a graph Gl, where d = q1⋯qm−1 is the derivation of Gm−1 and 0 ≤ l < m − 1, such that Inst(Gl, d, c) = Gm−1 for all dm ≥ 1, according to Definition 4.13. Hence, we arrive at the conclusion by simply letting m = n + 1. □
Corollary 4.3. Let gg := (A, P, Ω) be a well-formed RGG, A = G0 →* Gl →* Gn →p Gn+1, p ∈ P, c a level-1 context of p, d the derivation from Gl to Gn, and Inst(Gl, d, c) = Gn, where 0 ≤ l ≤ n. If Gn is a minimal instantiation of c, then l = n.
Definition 4.16. Let gg := (A, P, Ω) be an RGG. A split version of P, denoted as Pst, is created as follows: initially, Pst = ∅; for any p ∈ P and any C ∈ Mcc(p.L), let D⊆p.R be the maximal subgraph including the same marked vertices as C such that each component in D involves at least one marked vertex, and add the production (C, D) to Pst if D exists.
Definition 4.17. Let gg := (A, P, Ω) be an RGG, the set Pst the split version of P, and G a graph isomorphic to the contextual graph U of a context c = (U, Z). If G →* G′ is a derivation of positive length using the productions in Pst∪P, then c′ = (G′, Z) is called a derivative context of c. A derivative context is a variation of a context with respect to a derivation starting at this context.
Proposition 4.4. Let gg := (A, P, Ω) be an RGG, p ∈ P and c a level-1 context of p. For any context instance (U, Z) of c, there is a derivative context (U′, Z) of c such that U′⊆U.
Proof. It is straightforward from Definitions 4.9, 4.16 and 4.17, and Corollary 4.3. □
The remainder of a context instance after subtracting its derivative context, i.e., U\U′, is less relevant to the redex of the involved production, as the redex is immediately connected to or even surrounded by this derivative context. Some remarks on contexts and context instances are necessary for the applications discussed afterwards.
Essentially, a context of a production is a hierarchical structure of iterations among the productions of the involved graph grammar, organized by a total precedence relation, that forms a situation in which the production can be applied. A context instance, in contrast, is an actual situation in a graph generated by a derivation from the initial graph in which the production can be applied for derivation, or conversely, for parsing. That is, the iteration hierarchy of the productions involved in a derivation, which is explicitly illustrated in a context, becomes invisible in any process of instantiation that produces a context instance. It is clear from Theorem 4.3 that a context usually corresponds to a large number of context instances. Therefore, it is inefficient to enumerate all the context instances that correspond to a certain context in applications. As a matter of fact, such enumeration is also unnecessary, for two reasons. On the one hand, a context is even more appropriate than the set of all its instances in some applications, such as comprehension of graph grammars. On the other hand, a context, together with its derivative contexts, can be substituted for all its instances in many application scenarios. A typical example in this regard is level-1 contexts. For any instance of a level-1 context, there is a derivative context that is contained as a subgraph in the instance, and it is more relevant to the redex than the rest of the instance. Therefore, it is contexts rather than context instances of a graph grammar that are of theoretical and practical significance.
This theorem not only reveals the relationship between contexts and context instances, which states that for each context instance of a production in a graph grammar there exists an associated context such that the former is merely an instantiation of the latter, but also offers a practical way to construct this context with the given context instance and its derivation process. Generally, there are a large number of graphs that are instantiations of a given context pertaining to a production. If a graph is an instantiation of a context of a production in terms of a derivation in which each step either directly or indirectly participates in the generation of the redex of this production's left graph, then the graph is called a minimal instantiation of the context.
Definition 4.14. Let gg := (A, P, Ω) be an RGG, and A →q1,…,qn G with qi ∈ P and 1 ≤ i ≤ n. The graph G is minimal with respect to a designated subgraph C⊆G if there does not exist a reordered derivation r1, …, rn such that A →r1,…,rn G, A →r1,…,rk G′, and C is preserved in the derivation of G from G′, i.e., C⊆G′, where 0 ≤ k < n.
Definition 4.15. Let gg := (A, P, Ω) be a well-formed RGG, A = G0 →q1 G1 → ⋯ →qn+1 Gn+1, qm ∈ P for 1 ≤ m ≤ n + 1, qn+1 = p, d = q1⋯qn the derivation of Gn, c a context of p, and Inst(Gl, d, c) = Gn, where 0 ≤ l < n. If Gn is minimal with respect to the subgraph S⊆Gn such that S ∈ Rd(Gn, p.L) is the redex used in the derivation step Gn →qn+1 Gn+1, then Gn is a minimal instantiation of the context c.
Apparently, if a graph G is an instantiation of a context c, then it is effortless to generate from G the corresponding context instance of c, according to Definition 4.9. A special case of instantiation directly follows from Theorem 4.3 and the above two definitions.
5. Applications for contexts In this section, we apply contexts to three issues in this field: comprehension of graph grammars, construction of locally confluent sets of productions, and improvement of parsing efficiency. Concrete solutions that utilize contexts are presented for the first two issues, whereas only abstract strategies for exploiting contexts are outlined for the third, as a more detailed description of the approach is beyond the scope of this paper.
5.1. Comprehension of graph grammars Graph grammar comprehension is an essential prerequisite both for designers to make sure that the desired language is exactly expressed by the designed graph grammar, and for users to understand precisely what the language of a legacy graph grammar is in applications. Generating the language is viewed as a regular way to comprehend graph grammars. However, it seems infeasible to describe the language of a graph grammar by enumerating its members, because the set is always infinite and most of its elements come from lengthy derivations. Contexts offer an alternative way for designers or users to grasp an implicit graph grammar by directly observing the productions instead of enumerating the members of the language. A context of a production characterizes a potential circumstance under which it can be applied for derivation, a means usually employed to generate members of the language. Conversely, the context can also be regarded as a circumstance under which the production can be applied for parsing. Any production is self-explanatory as to what it is for, whereas the contexts at distinct levels indicate in which situations it can be applied. These two aspects together clearly show the intension or meaning of a production, from the point of view of derivation. Consequently, contexts can facilitate the comprehension of a graph grammar, i.e., a set of productions, by synthesizing the meanings of its constituents so as to form the overall characteristics of the members of its language. In this sense, contexts bridge the gap between an implicit graph grammar and its language. Moreover, contexts are helpful to implicit graph grammar designers. It is known that the design of an appropriate implicit graph grammar is challenging work even for experts. For example, to invent a set of RGG productions, the designer first creates the main part of the left and right graphs, and then arranges vertices in each node and determines which vertices should be marked. As is known, vertices are the connecting points of edges, but their exact meanings are ambiguous. Thus, the designer is supposed to conceive the possible nodes and incident edges that might be connected to a marked vertex, which is undoubtedly a hard task. Contexts can facilitate this design process. Once a set of productions is preliminarily created, we can compute their contexts, check whether the nodes to which a marked vertex connects are the expected ones and whether the arrangement of vertices in each node is proper, and refine the design accordingly. This process can be repeated until a satisfactory design is achieved. In this way, a real-time feedback system with human-machine interaction can be implemented to help implicit graph grammar designers. It is noticeable that only complete contexts at various levels, together with their derivative contexts when necessary, need to be computed for the purpose of grammar comprehension, as any incomplete context must be contained in some complete one at the same level.
Example 5.1. Some selected contexts of the production p3.
Some of the contexts of the production p3 are illustrated in Fig. 5.1, where all the level-1 contexts are shown in (a)-(f), and 5 level-2 contexts are shown in (g)-(k). In each graph, the dashed node is the redex of the left graph of p3, the remaining subgraph is the contextual graph U, and the set of bold red edges is the contextual connection Z. In each level-2 context, the subgraph enclosed in the dashed blue ellipse, except for the redex of the left graph of p3 (i.e., the dashed node), is the first level of the context, and the subgraph outside the dashed blue ellipse is the second level of the context. Only a few of the total 35 level-2 contexts are depicted, as the others can be effortlessly derived in a similar way. With contexts at various levels, users can figure out the characteristics of potential situations for applying the production.
5.2. Construction of confluent set of productions The time complexity of the parsing algorithm is a crucial criterion for judging whether a graph grammar is applicable or not. Generally, locally confluent graph grammars possess highly efficient parsing algorithms with polynomial time complexity. However, any context-sensitive graph grammar that includes a pair of productions whose right graphs are the same up to isomorphism and whose left graphs are distinct cannot be confluent; a typical case is the graph grammar for specifying process flow diagrams depicted in Fig. 2.2. Fortunately, contexts offer, to some extent, a feasible approach to generating a locally confluent context-sensitive graph grammar from an originally non-confluent one such that the language of the original grammar is preserved in the new one. The basic idea of the approach is to replace the productions of a graph grammar that lead to its non-confluence with corresponding tailored context-equipped productions. A procedure is executed in advance to determine whether a graph grammar is confluent or not and, if it is not, to return the pairs of productions that cause its non-confluence. The algorithm framework of the approach is described in Algorithm 5.1. The involved notions are described as follows. A graph grammar is locally confluent if any pair of productions from its production set is compatible; and a pair of productions p1, p2 is compatible if any merger H of their right graphs is reducible with respect to them, i.e., if reducing H with p1 and then p2 yields H12, and reducing H with p2 and then p1 yields H21, then H12 ≈ H21. The critical part of the algorithm is the process of context tailoring. It is conducted in accordance with three criteria: (1) only level-1 contexts of productions will be dealt with; (2) wildcards can be used to facilitate context description when necessary; (3) context tailoring can only be conducted on the basis of the left graph of a production (this is due to the fact that the notion of context is defined based on the left graphs of productions). The process mainly consists of five steps. First, it preprocesses the level-1 contexts of the two productions by reducing them to depth-1 contexts. A level-1 context is reduced to a depth-1 one if only those nodes neighboring the nodes of the production, together with the incident edges, are preserved. Second, it finds the common subgraph C of the right graphs of p1 and p2, and deletes the portion of contexts that pertains to the counterparts of p1.R\C1 in p1.L and p2.R\C2 in p2.L, where C1 and C2 are the marked subgraphs isomorphic to C in p1.R and p2.R, respectively. A counterpart of a subgraph in the right graph of a production is the subgraph in the left graph to which it corresponds in terms of the marked vertices. Noticeably, the subgraph C cannot be empty, or else p1 and p2 must be compatible. Third, it determines what the relation between C1 and C2 is, and chooses a corresponding tailoring strategy. There are four kinds of relations between them: C1 <· C2, C1 >· C2, C1 ≶· C2, and C1 =· C2. The relation C1 <· C2 holds if any subgraph of any host graph that is a redex of C1 is also a redex of C2 and there is a subgraph of a host graph that is a redex of C2 but not a redex of C1; C1 >· C2 is the reverse of C1 <· C2; C1 ≶· C2 if there is a subgraph of a host graph that is a redex of C1 but not a redex of C2, and vice versa; and C1 =· C2 if any subgraph of any host graph that is a redex of C1 is also a redex of C2, and vice versa. In the first case, it drops the portion of contexts of the counterpart of C1, and preserves that of the counterpart of C2. The second case is tackled in the opposite way. In the latter two cases, it preserves the portion of contexts of the counterparts of both C1 and C2, that is, nothing needs to be done. Fourth, according to the marking technique of RGG, it figures out the two subgraphs in p1.L and p2.L (denoted as D1 and D2) that are the counterparts of C1 and C2, respectively. If D1∩D2 ≠ ∅ and there is a component D′⊆D1∩D2, then it drops the portion of contexts of D1′ and that of D2′, where D1′ and D2′ are the marked subgraphs isomorphic to D′ in p1.L and p2.L, respectively.
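The third tailoring step can be illustrated with a small sketch. It assumes, purely for illustration, that the redexes of C1 and C2 have been collected over a finite sample of host graphs and are represented as sets of hypothetical subgraph identifiers; the definition in the text quantifies over all host graphs, so this is only an approximation of the four relations. The names `Relation`, `classify`, and `contexts_to_keep` are not from the paper.

```python
from enum import Enum


class Relation(Enum):
    LESS = "C1 <. C2"          # every redex of C1 is a redex of C2, not vice versa
    GREATER = "C1 >. C2"       # the reverse of LESS
    INCOMPARABLE = "C1 <>. C2" # each side has a redex the other lacks
    EQUAL = "C1 =. C2"         # both have exactly the same redexes


def classify(redexes_c1, redexes_c2):
    """Classify the relation between C1 and C2 from sampled redex sets."""
    r1, r2 = frozenset(redexes_c1), frozenset(redexes_c2)
    if r1 == r2:
        return Relation.EQUAL
    if r1 < r2:   # proper subset
        return Relation.LESS
    if r1 > r2:   # proper superset
        return Relation.GREATER
    return Relation.INCOMPARABLE


def contexts_to_keep(relation):
    """Return (keep contexts of C1's counterpart, keep contexts of C2's counterpart)."""
    if relation is Relation.LESS:
        return (False, True)   # drop C1's portion, preserve C2's
    if relation is Relation.GREATER:
        return (True, False)   # the opposite of the first case
    return (True, True)        # latter two cases: nothing needs to be done
```

For instance, `classify({"s1"}, {"s1", "s2"})` yields `Relation.LESS`, for which only the contexts of the counterpart of C2 are preserved.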
Fig. 5.1. Some selected contexts of the production p3. (a)–(f) are level-1 contexts, and (g)–(k) are level-2 contexts.
and their derivatives as a whole can be represented as a wildcard. From the perspective of time complexity, the above algorithm is rather complicated¹. Nevertheless, it is feasible in practical applications because the parameters that characterize an ordinary graph grammar are usually small constants, which do not change during derivation or parsing, and the theoretically assumed worst cases rarely occur in practice.
Proposition 5.1. Let gg := (A, P, Ω) be an RGG. Suppose Algorithm 5.1 with P as the input returns P′, and let gg′ := (A, P′, Ω). Then the language of gg is the same as that of gg′, i.e., L(gg) = L(gg′).
Proof. Let G be any graph such that G ∈ L(gg), then there is a derivation A →* G in gg. Let the length of the derivation be l. We show by induction that there exists a derivation of G in gg′ with the same length l, whenever l ≥ 0. Basis case. If l = 0, A →* A is also a derivation in gg′. Inductive step. Assume that there is a derivation of G in gg′ when l = n. If l = n + 1, then A →* G′ →p G is a derivation in gg, where A →* G′ is of length n. According to the induction hypothesis, there is a derivation A →* G′ of length n in gg′. Let C⊆G′ be the redex of p.L used in the derivation step G′ →p G and q the production in P′ to which p corresponds. Then, there must exist a subgraph C′⊆G′ such that C⊆C′ and C′ is a redex of q.L, that is, C′\C matches one of the tailored contexts and their derivatives of p that are represented as a wildcard in q, because the wildcard enumerates all the possible depth-1 contexts and their derivatives of p in a tailored form. Thus, G′ →q G holds, for q fulfills the same transformation as p does. So, A →* G′ →q G is a derivation of length n + 1 in gg′. Therefore, we have G ∈ L(gg′). Hence, L(gg) ⊆ L(gg′). Conversely, it suffices to show that ∀G ∈ L(gg′), G ∈ L(gg) holds. This can be achieved in a similar way. □
Algorithm 5.1 Construction of a locally confluent graph grammar.
Input. A set P of productions and associated set NP of pairs of productions that cause its non-confluence.
Output. An equivalent confluent set P′ of productions, if applicable; and failure, otherwise.
{
  P′ = P;
  while (NP ≠ ∅) {
    Randomly select a pair, say (p1, p2), from NP;
    if (p1, p2 is compatible)
      NP = NP \ {(p1, p2)};
    else {
      Compute all the level-1 contexts of p1 and p2, respectively;
      Tailor the contexts of p1 and p2, according to the context tailoring process;
      Create two context-equipped productions p1′ and p2′;
      Generate all the derivative contexts of p1′ and p2′;
      if (p1′, p2′ is compatible) {
        NP = NP \ {(p1, p2)};
        Update p1 and p2 in P′ and NP with p1′ and p2′;
      }
      else return failure;
    }
  }
  if (P′ is confluent) return P′;
  else return failure;
}
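The control flow of Algorithm 5.1 can be sketched in Python as follows. This is only a skeleton under stated assumptions: productions are represented by plain string names, and the three callbacks `is_compatible`, `equip_with_contexts` (standing in for the whole "compute, tailor, equip, generate derivatives" sequence), and `is_confluent` are hypothetical placeholders that a real implementation would have to supply; none of them is defined in the paper in this form.

```python
import random


def make_locally_confluent(productions, bad_pairs, is_compatible,
                           equip_with_contexts, is_confluent, rng=None):
    """Skeleton of the while-loop of Algorithm 5.1; returns None on failure."""
    rng = rng or random.Random(0)
    current = set(productions)   # plays the role of P'
    pending = set(bad_pairs)     # plays the role of NP
    while pending:
        # "Randomly select a pair"; sorting only makes the choice reproducible.
        p1, p2 = rng.choice(sorted(pending))
        if is_compatible(p1, p2):
            pending.discard((p1, p2))
            continue
        # Hypothetical callback: yields the context-equipped versions p1', p2'.
        q1, q2 = equip_with_contexts(p1, p2)
        if not is_compatible(q1, q2):
            return None          # failure branch of the algorithm
        pending.discard((p1, p2))
        # Update p1 and p2 in P' and NP with p1' and p2'.
        rename = {p1: q1, p2: q2}
        current = {rename.get(p, p) for p in current}
        pending = {(rename.get(a, a), rename.get(b, b)) for a, b in pending}
    return current if is_confluent(current) else None
```

With toy callbacks in which only the pair `("p8", "p9")` is incompatible and equipping a production merely appends a prime to its name, the call returns `{"p7", "p8'", "p9'"}` for the input set `{"p7", "p8", "p9"}`.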
Finally, it reduces the number of existent contexts as much as possible by eliminating redundancy. A context is redundant if it is included in another context, i.e., any subgraph of a host graph that is a redex of the former is also a redex of the latter. A wildcard will be used when the number of contexts is more than one. Another part of the algorithm worth noting is the process for generating derivative contexts. Note that the maximal number of derivation steps required for generating the necessary derivative contexts is no more than the cardinality of the split version of P. If a context c′ in p1′ (or p2′) is tailored from a level-1 context c, then any derivative context dc of c can be tailored to dc′ in the same way as c′. The tailored contexts
1 The worst-case time complexity of Algorithm 5.1 is O (m2nnr r 2nr ((mn)n + nnr )) , where m is the number of the productions in P, n is the maximal number of components in the left or right graphs of the productions in P, and r is the maximal number of nodes in the components.
Fig. 5.2. Some selected level-1 contexts of p9. (a)–(c) are contexts with fixed left portion and variable right portion, and (d)–(h) are contexts with variable left portion and fixed right portion.
Example 5.2. An application of Algorithm 5.1.
The graph grammar depicted in Fig. 2.2 is not locally confluent, as it involves four incompatible pairs of productions: (p8, p9), (p7, p8), (p7, p9) and (p5, p6). We choose the first pair (p8, p9) to run Algorithm 5.1. As for p9, the left graph includes two components: one comprises a sole node “Stat” and the other is a single node “receive”. The total number of level-1 contexts pertaining to p9 is 15, which is the product of the numbers of contexts of its two constituents. Some selected level-1 contexts of p9 are shown in Fig. 5.2, where (a)–(c) are those in which the portion of contexts of “Stat” is fixed and that of “receive” is variable, and (d)–(h) are the ones in which the portion of contexts of “Stat” is variable and that of “receive” is fixed. In each graph, the dashed subgraph that is composed of the two nodes “Stat” and “receive” is a redex of the left graph, and the rest is the context. The level-1 contexts of p8 are similar to those of p9. Then, we exhibit the process of context tailoring with (p8, p9). First, it reduces the contexts to depth-1 ones, that is, only those nodes that are directly connected to the left graph (via incident bold red edges) are preserved. No changes can be made in the second step, since the two right graphs themselves are the common subgraph. Third, the relation between the two right graphs is p8.R ≶· p9.R, and then no changes are made, either, according to the strategy for this case. In the fourth step, it deletes the portion of contexts that belongs to the component of p8.L, i.e., the node “Stat”, since it is the common component of p8.L and p9.L. The output contexts are shown in (a)-(c) of Fig. 5.3, where in each graph, the subgraph enclosed by the dashed green ellipse is a redex of p9.L. The fifth step deletes the two contexts depicted in (a) and (b), as they are included in the context in (c). The left graph of p9 equipped with the output of the context tailoring process is illustrated in (d). Evidently, this process produces similar output for p8. Subsequently, a pair of context-equipped productions for (p8, p9) is obtained, as shown in (e)-(f) of Fig. 5.3, according to the algorithm. The two newly created productions, denoted as p8′ and p9′, are compatible in that all the three mergers of their right graphs, depicted in (g)-(i), are reducible with respect to them, i.e., these mergers can be reduced with the productions in either order and the results are the same. In each merger, the red subgraph that consists of the red nodes and the red edges between them is the overlap of the right graphs of p8′ and p9′, and the union of this subgraph and R8′ or R9′ together with the incident edges between them is a redex of p8′.R or p9′.R, respectively. In each redex, the red edge matches the bold red edge of the corresponding right graph in (e) or (f). By substituting p8′ and p9′ for p8 and p9, respectively, we obtain an updated NP, including three pairs of productions (p7, p8′), (p7, p9′) and (p5, p6). It is easy to verify that the first two pairs are compatible. Hence, the first three pairs of the original NP composed of p7, p8 and p9 have so far been transformed into compatible ones. As a matter of fact, the process of context tailoring with the three pairs as input, in whichever order, is bound to produce the same output. Next, the while-loop of the algorithm with (p5, p6) as the selected pair creates a new compatible pair of productions (p5, p6′). The context-equipped p6, i.e., the intermediate version of p6′ prior to the generation of derivative contexts, is depicted in Fig. 5.4(a), and the final version of p6′ is given in (b), with 36 of the total 60 elements of the wildcard listed in (c). Apparently, the subgraphs in the dashed green rectangles in (a) or (b) are the left and right graphs of p6. The wildcard is a set that enumerates the context and all the tailored derivatives that contain one or two nodes connected to the two incident edges. Some representative mergers of their right graphs are illustrated in (d)-(f). Clearly, they are reducible with respect to p5 and p6′, because none of them contains both a redex of p5.R and a redex of p6′.R simultaneously. For example, the whole graph in (d) is a redex of p6′.R, whereas the subgraph enclosed in the dashed red rectangle is not a redex of p5.R; the subgraphs in the dashed red ellipse and rectangle in (e) are not redexes of p6′.R and p5.R, respectively; and the subgraph in the dashed green rectangle is a redex of p6′.R, while the one in the dashed red rectangle is not a redex of p5.R. The last step checks whether the new production set is locally confluent or not. To this end, it suffices to examine only those pairs that consist of at least one newly created production, minus those that have already been verified. Each new production that involves wildcards is viewed as an ordinary production. In this example, P′, the result of updating P with the three newly created productions p6′, p8′ and p9′, is confluent. The new production set is dedicated to parsing.
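The final confluence check in Example 5.2 only needs to revisit pairs that contain at least one newly created production and have not been verified yet. A minimal sketch of that selection is given below; it assumes productions are identified by string names and that a pair is unordered and consists of two distinct productions. The function name `pairs_to_recheck` and its parameters are hypothetical.

```python
from itertools import combinations


def pairs_to_recheck(productions, new_productions, already_verified=()):
    """List unordered pairs with at least one new production, minus verified ones."""
    new = set(new_productions)
    verified = {frozenset(pair) for pair in already_verified}
    result = []
    for a, b in combinations(sorted(productions), 2):
        involves_new = a in new or b in new
        if involves_new and frozenset((a, b)) not in verified:
            result.append((a, b))
    return result
```

For a toy set `["p5", "p6'", "p7"]` in which only `"p6'"` is new and the pair `("p5", "p6'")` has already been verified, the only pair left to examine is `("p6'", "p7")`.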
Fig. 5.3. The intermediate outputs of the while-loop in Algorithm 5.1 with (p8, p9) as input. (a)–(c) are the output contexts of the fourth step of the context tailoring process, (d) is p9.L equipped with the output context of the process, (e)–(f) are a pair of context-equipped productions (p8′, p9′), and (g)–(i) are all the mergers of the right graphs of p8′ and p9′.
Fig. 5.4. Some intermediate outputs of the while-loop in Algorithm 5.1 with the pair (p5, p6) as the input. (a) is the intermediate result of p6′ prior to the generation of derivative contexts, (b) is the production p6′, (c) lists some elements of the wildcard in p6′, and (d)–(f) are some selected mergers of the right graphs of the pair (p5, p6′).
5.3. Strategy for improving parsing efficiency
A general parsing algorithm is always a necessity for the reserved graph grammar formalism, since grammars that cannot be transformed into locally confluent ones by Algorithm 5.1 may exist in practice, even though none has been encountered yet. In general, the practicability of general parsing algorithms, such as those in [6,15], is seriously impaired by their high time complexities. Backtracking is the main cause of the high time complexity of a general parsing algorithm, due to its blind trial of reductions. During the parsing of a host graph, some unexpected (false positive) redexes are found and the corresponding reductions are conducted accordingly; consequently, a final graph is obtained that is not the initial graph of the graph grammar and to which no more reductions can be applied. This gives rise to backtracking in the process of parsing. Therefore, identifying false positive redexes so as to avoid unexpected reductions is a direct and effective way to improve parsing performance.
As for the RGG formalism, a subgraph is identified as a redex of a production's right graph in a host graph whenever it is isomorphic to the right graph and the associated constraint on unmarked vertices is satisfied, regardless of the circumstance in which the subgraph lies. The circumstance, also called the situation, refers to the host graph minus the redex. A redex is called a true positive redex if the situation is a context instance of the production to which it corresponds, or a false positive one otherwise. That is, it is the situation that determines whether a redex is false positive or not. Naturally, an approach to identifying a false positive redex can be implemented by verifying that the situation cannot match any one of the context instances of the corresponding production.
The context instances that correspond to a certain context can be exactly simulated (unnecessary outermost surrounding subgraphs are omitted) by this context and its derivative contexts. Nevertheless, it seems inefficient to enumerate all the contexts with their derivatives and match the situation against each of them individually, because a production may possess a large or even infinite number of contexts and the situation, i.e., the main part of a host graph, may be quite complex. Consequently, a tradeoff between accuracy and efficiency is necessary for the approach to be feasible. We therefore stipulate that the situation involved is comparatively simple and that the number of contexts is sufficiently small. A simple choice of the situation that meets the former constraint is the subgraph of the host graph that merely contains the nodes directly linked to the redex and the edges between them. This graph can be effortlessly located and extracted from the host graph, since it immediately surrounds the redex. Accordingly, the number of contexts that need to be examined will decrease considerably. Moreover, in order to satisfy the latter constraint, a few more measures can be taken to further reduce the number or complexity of the contexts involved:
• Context matching of a production whose right graph consists of at least two components can be decomposed into a few subtasks in terms of the components, such that each subtask fulfills the matching of a context portion (note that contexts of a production are the combinations of context portions of its components). The subtasks are conducted sequentially, that is, if a subtask fails, the matching process immediately terminates with failure and the subsequent subtasks are not executed. This completely avoids repeated matching of context portions of a production.
• Contexts of a production can be combined into fewer abstract context patterns. Informally, a context pattern is a context augmented with a set of variables or wildcards and a set of constraints on the numbers or labels of some designated nodes. The variables and wildcards act as parameters in the expressions of the constraints. Thus, a context pattern is a representative of a number of contexts. Consequently, the process of matching a set of contexts can be implemented by merely matching the representative.
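The depth-1 situation and the sequential, component-wise matching described above can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the matching procedure of the RGG formalism: the host graph is a plain undirected edge list over node identifiers, "the edges between them" is read as the edges among the neighbouring nodes, and a context portion is reduced to the sorted labels of the nodes attached to one redex component. All function names and the sample data are hypothetical.

```python
def depth1_situation(host_edges, redex_nodes):
    """Collect the nodes directly linked to the redex and the edges among them."""
    redex = set(redex_nodes)
    neighbours = set()
    for u, v in host_edges:
        if u in redex and v not in redex:
            neighbours.add(v)
        elif v in redex and u not in redex:
            neighbours.add(u)
    inner = {(u, v) for u, v in host_edges if u in neighbours and v in neighbours}
    return neighbours, inner


def observed_portion(host_edges, labels, component, redex_nodes):
    """Sorted labels of the non-redex nodes attached to one redex component."""
    redex = set(redex_nodes)
    found = []
    for u, v in host_edges:
        if u in component and v not in redex:
            found.append(labels[v])
        elif v in component and u not in redex:
            found.append(labels[u])
    return tuple(sorted(found))


def match_sequentially(observed_portions, allowed_portions):
    """Run one matching subtask per component and stop at the first failure."""
    for observed_one, allowed_one in zip(observed_portions, allowed_portions):
        if observed_one not in allowed_one:
            return False  # the remaining subtasks are skipped
    return True


# Hypothetical host graph: node id -> label, plus undirected edges.
labels = {1: "send", 2: "Stat", 3: "receive", 4: "Stat", 5: "assign"}
edges = [(1, 2), (2, 4), (1, 3), (4, 5), (1, 4)]
redex = [2, 3]              # candidate redex with two components
components = [{2}, {3}]

neighbours, inner = depth1_situation(edges, redex)
observed = [observed_portion(edges, labels, comp, redex) for comp in components]

# Hypothetical admissible context portions, one set per component.
allowed = [{("Stat", "send"), ("send",)}, {("send",)}]
print(match_sequentially(observed, allowed))
```

A redex whose situation fails this check would be treated as a false positive and skipped. Because node 5 is not adjacent to the redex, it never enters the situation, which is what keeps the comparison cheap.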
Prior to parsing host graphs in terms of a graph grammar, the above measures should be taken to preprocess its contexts. Once this is done, the output, i.e., a set of refined context patterns, can be used in parsing at any time afterwards. This approach is practicable, since the parameters that characterize a graph grammar are usually small constants.
6. Related work
To the best of our knowledge, few articles to date have specialized in the research of contexts for context-sensitive graph grammars. Some work related to the notion of contexts and their applications is discussed below.
The notion of context is established on a few elementary concepts, including the partial precedence relation between two productions. The partial precedence relation appears similar to the opposite of the consequence relation, one of the dependence relations introduced in the LGG formalism [15,30]. In fact, however, the two differ greatly in several aspects. First, the former is defined on a set of productions, while the latter is defined on some version of a host graph. Second, the former is a relation between two productions, whereas the latter is a relation between one production instance and another appearing in a host graph. Third, the former associates a production's right graph with another production's left graph on condition that a component of the right graph contains a redex of a component of the left graph, while the latter is a constraint that states that the intersection of the right graph or the common part of a production instance and the right graph of another is not empty, where a production instance is a redex of the whole production (including both the left and right graphs) in a completion of a host graph. Fourth, the former is static whereas the latter is dynamic. That is, the partial precedence relation on a set of productions is fixed, but the consequence relation with respect to a set of productions differs from one host graph to another. Fifth, the former is created as one of the underlying
notions for the formalization of the concept of “context”, while the latter is merely one kind of the dependence relations introduced in the construction of the top-down phase of the parsing algorithm of LGG.
The membership problem, an essential issue of graph grammars, is frequently tackled by way of graph parsing. Backtracking is commonly used in the design of parsing algorithms of graph grammars, such as AGG [31], CLGG [20] and RGG+ [29]. AGG, having a solid theoretical foundation [32,33], is both a rule-based visual language and a development environment that offer an algebraic approach to typed attributed graph transformation [34–36]. As an extended formalism of LGG, the Contextual Layered Graph Grammar (CLGG) provides three additional mechanisms: node embeddings, negative application conditions and complex predicates, which enable designers to specify a much wider range of visual languages in a compact and intuitive manner. As simple backtracking always has exponential time complexity, both AGG and CLGG introduce critical pair analysis to improve the performance of parsing algorithms. Critical pair analysis originates from the theory of term rewriting, where it is usually exploited as a means to check whether a term rewriting system possesses a functional behavior, i.e., whether it is confluent. This mechanism is naturally generalized to graph rewriting, such as hypergraph [37] and term graph [38] transformation systems. A critical pair is a pair of transformations from a common graph such that the application of one disables the other and vice versa, and the graph is minimal according to the rules applied. In graph transformation systems, critical pair analysis is frequently used to extract the graph elements that cause conflicts, whereas it is exploited in graph parsing for graph grammars in a slightly different way. Critical pair analysis is conducted prior to parsing, i.e., before runtime, to find the productions in conflict and the associated conflicting situations.
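The mutual-disabling condition in the definition of a critical pair can be illustrated with a short Python sketch. It is a deliberately simplified model and not the analysis implemented in AGG or CLGG: each candidate rule application is reduced to the set of host-graph elements it needs and the set it removes, only this delete-use style of conflict is considered, and the minimality requirement on the common graph is ignored. The class `Application`, the production names and the element identifiers are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Application:
    """One candidate rule application on a host graph (hypothetical model)."""
    production: str
    uses: frozenset      # ids of host-graph elements the match needs
    deletes: frozenset   # ids of host-graph elements the step removes


def disables(first, second):
    """True if applying `first` removes an element that `second` needs."""
    return bool(first.deletes & second.uses)


def conflicting_pairs(applications):
    """Return the pairs of applications that disable each other."""
    return [(x, y) for x, y in combinations(applications, 2)
            if disables(x, y) and disables(y, x)]


# Two applications overlap on element 2; the third is independent of both.
app_a = Application("pA", frozenset({1, 2}), frozenset({1, 2}))
app_b = Application("pB", frozenset({2, 3}), frozenset({2, 3}))
app_c = Application("pC", frozenset({7, 8}), frozenset({7, 8}))

for x, y in conflicting_pairs([app_a, app_b, app_c]):
    print(x.production, "conflicts with", y.production)
```

Pairs reported by such a check mark the situations in which the order of application matters; independent applications, like `app_c` here, can be applied in any order.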
In the parsing process, non-conflicting productions are applied first to reduce the host graph as much as possible, and then come the conflicting ones. In each conflicting situation, a decision point is created for subsequent backtracking. Although this optimization strategy does improve parsing efficiency in practice, it does not reduce the worst-case time complexity of parsing algorithms [20,31]. Besides, a CS-NCE (context-sensitive with neighborhood controlled embedding) graph grammar [39], which is an extension of the context-free NCE graph grammar [40], establishes a sufficient condition by performing a static analysis of productions to ensure its parsability [41]. Different from the above-mentioned techniques, our approach directly employs the contexts of productions to exclude as many unnecessary reductions as possible so as to improve parsing efficiency for implicit context-sensitive graph grammars. This approach is an application of contexts to parsing algorithms.
Moreover, critical pair analysis also underlies another common algorithm, which decides whether a graph grammar is locally confluent for context-sensitive formalisms [16,21]. In RGG, the efficient parsing algorithm SFPA with polynomial time complexity requires that graph grammars be locally confluent. Based on the RGG formalism, Kong et al. proposed a spatial graph grammar (SGG) framework that integrates both the spatial and structural specification mechanisms [16]. This formalism is equipped with a parser that, with the aid of spatial information, is more efficient than that of its predecessor, the SFPA. As the parser is built on the basis of the SFPA, it inherits the precondition of the latter, i.e., any applicable graph grammar must be locally confluent. However, this requirement is not frequently satisfied in practical applications.
To address this challenge, another extended version of RGG was proposed in [42], which assigns attributes to graph elements and allows constraints with attributes as parameters to be imposed on productions. Accordingly, an algorithm was presented to transform a non-confluent set of productions into a locally confluent graph grammar. However, the algorithm is semi-automatic in that interactions with graph grammar users are required in its execution. As another application of contexts, an alternative approach to tackling local confluence is given in this paper. It transforms a non-confluent set of productions into a locally confluent one by substituting refactored productions for those that cause non-confluence, on condition that the language is not changed by the replacement. An inappropriate production can be refactored by embedding the necessary elements of contexts into it and renumbering the marked vertices. This approach demands no human intervention and is thus automatic.
7. Conclusion
Contexts of productions are essential information for context-sensitive graph grammars, especially for implicit ones. On the basis of RGG, a representative of implicit context-sensitive graph grammar formalisms, this paper has introduced the notions of partial and total precedence relations, and then developed a theory of contexts based on these two notions, which comprises the formalization of contexts, their characterizations, and the connection between contexts and their instances. The theory of contexts can be adapted to other implicit context-sensitive graph grammar formalisms with little effort by replacing the embedding mechanism of RGG with that of the target formalism.
Moreover, this paper has illustrated three applications of contexts. Contexts can be used to facilitate the comprehension of graph grammars, which compensates for the deficiency in intuitiveness of implicit graph grammars caused by incompletely or implicitly represented context elements in the productions. Contexts can also be employed to construct a locally confluent set of productions from an originally non-confluent one. This broadens the application scope of the efficient parsing algorithm of the RGG formalism. Nevertheless, the conditions under which this transformation can be achieved remain an open question. Furthermore, similar to the mechanism of critical pair analysis, contexts can be utilized to reduce backtracking so as to improve the parsing performance of general parsing algorithms.
In our future work, further investigation will be made to explore the application scope of contexts, and a support system for context computation and visualization will be developed as well.
Acknowledgments
The authors are very grateful to the anonymous reviewers for their valuable advice and suggestions, which helped improve the manuscript. This work is supported in part by the National 973 Program of China under grant 2015CB352202, and the National Science Foundation of China under grants 61170089, 91318301, and 61321491.
References
[1] S.K. Chang, Visual languages: a tutorial and survey, IEEE Softw. 4 (1) (1987) 29–39.
[2] G. Rozenberg (Ed.), Handbook of Graph Grammars and Computing by Graph Transformation, 1, Foundations, World Scientific, 1997.
[3] T.R. Dean, J.R. Cordy, A syntactic theory of software architecture, IEEE Trans. Softw. Eng. 21 (4) (1995) 302–313.
[4] D.L. Métayer, Describing software architecture styles using graph grammars, IEEE Trans. Softw. Eng. 24 (7) (1998) 521–533.
[5] C. Ghezzi, A. Mocci, M. Monga, Synthesizing intensional behavior models by graph transformation, Proc. IEEE International Conference on Software Engineering, 2009, pp. 430–440.
[6] L. Chen, L. Huang, C. Li, L. Wu, W. Luo, Design and safety analysis for system architecture: a breeze/ADL-based approach, Proc. IEEE Annual International Conference on Computers, Software and Applications, 2014, pp. 261–266.
[7] S.S. Chok, K. Marriott, Automatic generation of intelligent diagram editors, ACM Trans. Comput.-Hum. Interact. 10 (3) (2003) 244–276.
[8] D. Blostein, A. Schürr, Computing with graphs and graph transformation, Software 29 (3) (1999) 197–217.
[9] L. Lin, T. Wu, J. Porway, Z. Xu, A Stochastic graph grammar for compositional object representation and recognition, Pattern Recognit. 42 (7) (2009) 1297–1307.
[10] L. Lin, H. Gong, L. Li, L. Wang, Semantic event representation and recognition using syntactic attribute graph grammar, Pattern Recognit. Lett. 30 (2) (2009) 180–186.
[11] H. Ehrig, G. Engels, H.J. Kreowski, G. Rozenberg (Eds.), Handbook on graph grammars and computing by graph transformation, 2, Applications, Languages and Tools, World Scientific, 1999.
[12] H. Ehrig, H.J. Kreowski, U. Montanari, G. Rozenberg (Eds.), Handbook of graph grammars and computing by graph transformation, 3, Concurrency, Parallelism, and Distribution, World Scientific, 1999.
[13] J. Kong, C. Zhao, Visual language techniques for software development, J. Softw. 19 (8) (2008) 1902–1919.
[14] D. Blostein, H. Fahmy, A. Grbavec, Practical Use of Graph Rewriting, (1995), pp. 95–373. Technical Report.
[15] J. Rekers, A. Schürr, Defining and parsing visual languages with layered graph grammars, J. Vis. Lang. Comput. 8 (1) (1997) 27–55.
[16] D. Zhang, K. Zhang, J. Cao, A context-sensitive graph grammar formalism for the specification of visual languages, Comput. J. 44 (3) (2001) 187–200.
[17] X. Zeng, X. Han, Y. Zou, An edge-based context-sensitive graph grammar formalism, J. Softw. 19 (8) (2008) 1893–1901 (in Chinese).
[18] Y. Zou, X. Zeng, X. Han, K. Zhang, Context-attributed graph grammar framework for specifying visual languages, J. Southeast Univ. 24 (4) (2008) 455–461.
[19] P. Bottoni, G. Taentzer, A. Schürr, Efficient parsing of visual languages based on critical pair analysis and contextual layered graph transformation, Proc. IEEE Symposium on Visual Languages, 2000, pp. 59–60.
[20] M. Qiu, G. Song, J. Kong, K. Zhang, Spatial graph grammars for web information transformation, Proc. IEEE Symposium on Visual/Multimedia Languages, IEEE CS Press, Auckland, New Zealand, 2003, pp. 84–91.
[21] J. Kong, K. Zhang, X. Zeng, Spatial graph grammars for graphical user interfaces, ACM Trans. Comput.-Hum. Interact. 13 (2) (2006) 268–307.
[22] L. Chen, L. Huang, L. Chen, Breeze graph grammar: a graph grammar approach for modeling the software architecture of big data-oriented software systems, Softw. 45 (8) (2015) 1023–1050.
[23] G. Song, K. Zhang, J. Kong, Model management through graph transformations, Proc. IEEE International Symposium on Visual Languages and Human-Centric Computing, 2004, pp. 75–82.
[24] K. Zhang, J. Kong, M. Qiu, G. Song, Multimedia layout adaptation through grammatical specifications, ACM/Springer Multimed. Syst. 10 (3) (2005) 245–260.
[25] M. Qiu, G. Song, J. Kong, K. Zhang, Spatial graph grammars for WEB information transformation, Proc. IEEE Symposium on Visual/Multimedia Languages, 2003, pp. 84–91.
[26] C. Zhao, J. Kong, J. Dong, K. Zhang, Pattern based design evolution using graph transformation, J. Vis. Lang. Comput. 18 (4) (2007) 378–398.
[27] C. Zhao, J. Kong, K. Zhang, Program behavior discovery and verification: a graph grammar approach, IEEE Trans. Softw. Eng. 36 (3) (2010) 431–448.
[28] J. Kong, O. Barkol, R. Bergman, S. Schein, C. Zhao, K. Zhang, Web interface adaptation using graph grammars, IEEE Trans. Syst. Man Cybern. 42 (4) (2012) 590–602.
[29] X. Zeng, K. Zhang, J. Kong, G. Song, RGG+: an enhancement to the reserved graph grammar formalism, Proc. IEEE Symposium on Visual Languages and Human-Centric Computing, 2005, pp. 272–274.
[30] J. Rekers, A. Schürr, A graph grammar approach to graphical parsing, Proc. IEEE Symp. Vis. Lang. (1995) 195–202.
[31] G. Taentzer, AGG: a graph transformation environment for modeling and validation of software, in: J.L. Pfaltz, M. Nagl, B. Bohlen (Eds.), Applications of Graph Transformations With Industrial Relevance, 3062 LNCS, 2004, pp. 446–453.
[32] A. Corradini, U. Montanari, F. Rossi, H. Ehrig, R. Heckel, M. Lowe, Algebraic approaches to graph transformation, part I: basic concepts and double pushout approach, in: G. Rozenberg (Ed.), Handbook of Graph Grammars and Computing by Graph Transformation, 1 Foundations, World Scientific, 1997, pp. 163–246.
[33] H. Ehrig, R. Heckel, M. Korff, M. Lowe, L. Ribeiro, A. Wagner, A. Corradini, Algebraic approaches to graph transformation, part II: single pushout approach and comparison with double pushout approach, in: G. Rozenberg (Ed.), Handbook of Graph Grammars and Computing by Graph Transformations, 1 Foundations, World Scientific, 1997, pp. 247–312.
[34] A. Corradini, U. Montanari, F. Rossi, Graph processes, Special Issue Fundam. Inf. 26 (3–4) (1996) 241–266.
[35] M. Lowe, M. Korff, A. Wagner, An algebraic framework for the transformation of attributed graphs, in: M.R. Sleep, M.J. Plasmeijer, M.C. van Eekelen (Eds.), Term Graph Rewriting: Theory and Practice, John Wiley & Sons Ltd, 1993, pp. 185–199.
[36] R. Heckel, J.M. Küster, G. Taentzer, Confluence of typed attributed graph transformation systems, Proc. the International Conference on Graph Transformation, 2002, pp. 161–176.
[37] D. Plump, Hypergraph Rewriting: Critical Pairs and Undecidability of Confluence, in: M.R. Sleep, M.J. Plasmeijer, M.C.J.D. van Eekelen (Eds.), Term Graph Rewriting, 1993, pp. 201–214.
[38] D. Plump, Term graph rewriting, in: G. Engels, H.J. Kreowski, G. Rozenberg (Eds.), Handbook of Graph Grammars and Computing by Graph Transformation, 2, Applications, Languages, and Tools, World Scientific, 1999, pp. 3–62.
[39] Y. Adachi, S. Kobayashi, K. Tsuchida, T. Yaku, An NCE context-sensitive graph grammar for visual design languages, Proc. IEEE Symposium on Visual Languages, 1999, pp. 228–235.
[40] J. Engelfriet, G. Rozenberg, Node replacement graph grammar, in: G. Rozenberg (Ed.), Handbook of Graph Grammars and Computing by Graph Transformation, 1 Foundations, World Scientific, 1997, pp. 1–94.
[41] Y. Adachi, Y. Nakajima, A context-sensitive NCE graph grammar and its parsability, Proc. IEEE Symposium on Visual Languages, 2000, pp. 111–118.
[42] Y. Zou, J. Lü, X. Zeng, X. Ma, Q. Yang, Constructing confluent context-sensitive graph grammars from non-confluent productions for parsing efficiency, in: M. Huang, Q.V. Nguyen, K. Zhang (Eds.), Visual Information Communication, Springer, New York, 2009, pp. 135–147.