ARTIFICIAL INTELLIGENCE
Processing of Semantic Nets on Dataflow Architectures
Lubomir Bic
Department of Information and Computer Science, University of California, Irvine, CA 92717, U.S.A.
Recommended by Pat Hayes
ABSTRACT
Extracting knowledge from a semantic network may be viewed as a process of finding given patterns in the network. On a von Neumann computer architecture the semantic net is a passive data structure stored in memory and manipulated by a program. This paper demonstrates that by adopting a data-driven model of computation the necessary pattern-matching process may be carried out on a highly parallel dataflow architecture. The model is based on the idea of representing the semantic network as a dataflow graph in which each node is an active element capable of accepting, processing, and emitting data tokens traveling asynchronously along the network arcs. These tokens are used to perform a parallel search for the given patterns. Since no centralized control is required to guide and supervise the token flow, the model is capable of exploiting a computer architecture consisting of large numbers of independent processing elements.
1. Introduction
Most AI researchers would probably argue that there are no 'architectural' solutions to 'AI problems'. While we are sympathetic to this point of view, finding architectures that would significantly speed up the execution of a given application seems an effort worth pursuing. The importance of parallelism has been recognized, and a number of AI applications have been implemented on multiprocessor architectures, e.g. ZMOB [5]. Unfortunately, such architectures are based on the von Neumann model of computation; the inherent difficulty of dividing a computation into independent subtasks, and of supervising and synchronizing their execution, is therefore the main limitation to massive parallelism. In this paper we focus on a particular domain, the processing of semantic networks, and show that by adopting a data-driven view of processing, many problems faced by conventional (von Neumann model based) systems are eliminated. As a result, computer architectures consisting of large numbers of processing elements, in particular those designed to execute dataflow programs, may usefully be exploited.
Artificial Intelligence 27 (1985) 219-227
0004-3702/85/$3.30 © 1985, Elsevier Science Publishers B.V. (North-Holland)
2. Viewing Semantic Nets as Dataflow Graphs
At the architectural level, a semantic net is a collection of nodes interconnected via directed labeled arcs. When implemented on a conventional von Neumann architecture, this graph is a passive data structure maintained in primary memory (real or virtual) and manipulated by an outside agent, a program. In our model we adopt a different point of view: the semantic network, rather than being a passive representation of knowledge, is a dataflow graph [2, 6]. That is, each node is an active element capable of accepting, processing, and emitting value tokens (messages) traveling asynchronously along the network arcs. Thus the computational power is distributed throughout the network. As will be discussed below, this principle has important ramifications for the processing of such 'active' networks.
It has been suggested by several semantic-net based systems, in particular SNIFFER [4], that extracting information from the net may be viewed as a special process of pattern matching: each 'query' is interpreted as a graph template for which the system tries to find matching structures in the semantic net. In a dataflow graph, a given pattern may be found by propagating tokens: the pattern is placed on one or more tokens and injected into selected nodes of the graph. Each token is then replicated concurrently from its injection point in many directions along the network arcs in search of possible matches for the given pattern. (This principle of extracting information from the net shows some similarity to the marker-propagation scheme proposed in NETL [3]. The main distinction, however, is the need for centralized control in the latter, which imposes significantly different constraints and requirements on the underlying computer architecture.)
Before specifying the procedures for injecting and guiding tokens through the network, we describe the underlying pattern-matching problem more formally:
- The semantic net is a dataflow graph consisting of edges
      t1 --p--> t2
where t1 and t2 are constants, representing nodes, and p is the label of the arc connecting these two nodes. Each node ti is capable of receiving, processing, and emitting tokens traveling along arcs. Fig. 1 shows a small portion of a semantic network representing the relationship 'employment' between elements of the two sets 'department' and 'employee'. All nodes in this network represent constant values, as opposed to variables.
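[Figure: a portion of a semantic network containing the nodes 'dept', 'empl'mnt', 'employee', d1, d2, et1, et2, ep1, and ep2, connected by arcs labeled 'e', 'agent', and 'object'.]
FIG. 1.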
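As a minimal sketch of the edge-based definition above, the fragment of Fig. 1 can be written down as a list of (t1, p, t2) triples and indexed by node and arc label. The edge list is an assumption: it contains only the arcs that the walk-through in Section 4 mentions, and the orientation of each triple is a guess, so the hypothetical helper `build_net` indexes every arc in both directions.

```python
from collections import defaultdict

# Edges (t1, p, t2) of the Fig. 1 fragment. ASSUMPTION: inferred from the
# walk-through in Section 4; the orientation of each arc is not known.
EDGES = [
    ("d1", "e", "dept"), ("d2", "e", "dept"),
    ("et1", "e", "empl'mnt"), ("et2", "e", "empl'mnt"),
    ("ep1", "e", "employee"), ("ep2", "e", "employee"),
    ("et1", "agent", "d1"), ("et2", "agent", "d1"),
    ("et1", "object", "ep1"), ("et2", "object", "ep2"),
]

def build_net(edges):
    """Index the net as node -> arc label -> set of neighbouring nodes."""
    net = defaultdict(lambda: defaultdict(set))
    for t1, p, t2 in edges:
        net[t1][p].add(t2)
        net[t2][p].add(t1)  # ASSUMPTION: arcs can be followed either way
    return net

net = build_net(EDGES)
print(sorted(net["dept"]["e"]))    # ['d1', 'd2']
print(sorted(net["d1"]["agent"]))  # ['et1', 'et2']
```

In the paper's model each entry of this index would belong to an active node rather than to a passive table; the dictionary is used here only to make the structure concrete.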
- A query is a graph consisting of edges of the form
      T1 --p--> T2
where T1 and T2 may be constants or variables, and p is the corresponding arc label. To answer a given query, the system tries to bind a constant to each variable of the query such that the query matches some portion of the semantic net. In this sense, the query graph is interpreted as a graph template to be 'fitted' into the semantic net. For example, Fig. 2 shows a graph template where the nodes X and Y represent variables.¹ It can be interpreted as finding the employees (together with the corresponding instances of the 'employment' relationship) of the department d1: by instantiating X and Y to the pairs et1, ep1 or et2, ep2, respectively, the query matches different portions of the semantic network of Fig. 1.
We make the following assumptions about the query template:
(1) It forms a connected graph. While it is possible to envision queries consisting of several disconnected components, each of these may be treated as a separate graph template and fitted into the semantic net independently. Hence this assumption can be made without any loss of generality.
(2) At least one of the nodes of the template corresponds to a constant value. This assumption is justifiable on pragmatic grounds, since queries containing only variables are too general to be of any practical use; they correspond to finding sets or set elements related via unspecified relationships to other unspecified sets.²
¹ Throughout this paper, lower case letters are used to denote constants while capitals are used to denote variables.
² The model can easily be extended to cope with queries containing only variables; since such an extension does not increase the usefulness of the model in any significant way, it will not be discussed further.
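[Figure: graph template containing the constant node d1 and the variable nodes X and Y, together with the labels 'empl'mnt', 'employee', 'agent', and 'object'.]
FIG. 2.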
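To make the notion of 'fitting' a template concrete, the following sketch states the matching problem as a plain sequential search. It is not the token-based method of this paper; it is only a reference definition of what counts as a match. The edge lists are assumptions reconstructed from the text (the net from the walk-through in Section 4, the template from the linear form derived in Section 3), arcs are matched in either direction because their orientation is not recoverable, and the names `is_variable`, `unify`, and `match` are hypothetical.

```python
# ASSUMPTION: net edges inferred from the Section 4 walk-through.
EDGES = [
    ("d1", "e", "dept"), ("d2", "e", "dept"),
    ("et1", "e", "empl'mnt"), ("et2", "e", "empl'mnt"),
    ("ep1", "e", "employee"), ("ep2", "e", "employee"),
    ("et1", "agent", "d1"), ("et2", "agent", "d1"),
    ("et1", "object", "ep1"), ("et2", "object", "ep2"),
]

# ASSUMPTION: template edges of Fig. 2, reconstructed from Figs. 4 and 5.
TEMPLATE = [
    ("dept", "e", "d1"), ("d1", "agent", "X"), ("X", "e", "empl'mnt"),
    ("X", "object", "Y"), ("Y", "e", "employee"),
]

def is_variable(term):
    # Footnote 1: capitals denote variables, lower case letters constants.
    return term[:1].isupper()

def unify(term, constant, binding):
    """Return the binding extended so that term equals constant, else None."""
    if is_variable(term):
        if term in binding:
            return binding if binding[term] == constant else None
        return {**binding, term: constant}
    return binding if term == constant else None

def match(template, edges, binding=None):
    """Yield every variable binding under which all template edges occur."""
    binding = binding or {}
    if not template:
        yield dict(binding)
        return
    (a, p, b), rest = template[0], template[1:]
    for t1, q, t2 in edges:
        if q != p:
            continue
        for x, y in ((t1, t2), (t2, t1)):  # try both orientations
            new = unify(a, x, binding)
            if new is not None:
                new = unify(b, y, new)
            if new is not None:
                yield from match(rest, edges, new)

solutions = sorted(match(TEMPLATE, EDGES), key=lambda s: s["X"])
print(solutions)
# [{'X': 'et1', 'Y': 'ep1'}, {'X': 'et2', 'Y': 'ep2'}]
```

The two bindings correspond to the two instantiations of X and Y named in the text.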
3. Transformation of Templates to Simplify Fitting Procedures
As mentioned earlier, the fitting of a graph template into the semantic net is to be performed by tokens propagating asynchronously along the edges of the dataflow graph representing the semantic net. Due to the asynchronous nature of dataflow systems, the detection of cycles is difficult. The problem is greatly simplified if the given template is first transformed into a non-cyclic form. Two approaches may be taken to accomplish this: the template may be transformed into a tree structure or into a linear chain. The latter is defined as
- a connected chain of edges without cycles or branches, where
- the leftmost node T1 is a constant while the remaining nodes T2 through Tn may be constants or variables.
Graphically, we can depict a linear chain as follows:
      T1 --p1--> T2 --p2--> ... --p(n-1)--> Tn
While the approach of transforming a given template into a general tree structure yields more potential parallelism, we chose to describe only the simpler case of linear chains to illustrate the basic principles. The transformation in this case is based on the idea of finding an Euler path through the given graph template, i.e. a path which traverses all arcs exactly once. The following steps illustrate how this is accomplished:
(1) Transform the graph template into an Euler graph, that is:
    (i) Examine each node and determine the number of arcs connected to that node. This number is termed the local degree of a node.
    (ii) If there is a node with an odd local degree, replicate one or more of the edges connected to that node to make the degree even. (Nodes with an odd local degree always occur in pairs, and hence it is possible to eliminate them by replicating the paths connecting each pair.) If there are several choices of paths that could be replicated, favor those with the smallest number of free variables in order to minimize the number of possible bindings. Repeat this step until at most two nodes (one of which must be a constant) have odd local degree.
To illustrate this step, assume that the graph template of Fig. 2 is to be fitted into the semantic net, a portion of which is shown in Fig. 1. By replicating the edge connecting the nodes 'employment' and 'X', as shown by the dotted line in Fig. 3, the number of odd-degree nodes is reduced to two; these are the nodes 'dept' and 'employee'.
(2) Starting with one of the odd-degree nodes representing a constant, find an Euler path through the graph template. Simple algorithms for finding such a path are discussed in most books on graph theory. (In the case where all nodes have an even local degree, any node representing a constant may be chosen as the starting point.) In the above example, selecting the node 'dept' as the starting point results in the transformation of the graph template of Fig. 3 into the linear form shown in Fig. 4.
(3) Due to the possibility of multiple occurrences of variables in a linear graph template, a final modification is performed: the template is scanned from left to right and all repeated occurrences of a variable are linked together via pointer chains. These will be used during execution to bind all occurrences of the same variable to the same constant. In Fig. 4 the variable X occurs twice; the two instances are linked via pointers, resulting in the final form of the linear template shown in Fig. 5.
It can be shown that the above algorithm is guaranteed to produce a linear graph template comprising all edges and nodes of the original nonlinear template, some of which may be duplicated. We now turn to describing how patterns corresponding to such a linear template are found in the underlying semantic network using tokens.
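Steps (1) and (2) can be sketched as follows for the template of Fig. 2. This is one possible reading under stated assumptions, not the author's implementation: the template edges are reconstructed from Figs. 4 and 5, arcs are treated as undirected when counting local degrees, the cost of a path is taken to be the pair (variables visited, number of edges), and Hierholzer's algorithm is swapped in for the unspecified textbook Euler-path algorithm. All function names are hypothetical.

```python
import heapq
from collections import Counter, defaultdict
from itertools import combinations

def is_variable(term):
    return term[:1].isupper()  # footnote 1: capitals denote variables

def odd_nodes(edges):
    """Step (1)(i): nodes whose local degree is odd."""
    degree = Counter()
    for a, _, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(n for n, d in degree.items() if d % 2)

def cheapest_path(edges, src, dst):
    """Path from src to dst minimising (variables visited, edge count)."""
    adj = defaultdict(list)
    for a, p, b in edges:
        adj[a].append((b, (a, p, b)))
        adj[b].append((a, (a, p, b)))
    heap, seen = [((int(is_variable(src)), 0), src, [])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, edge in adj[node]:
            if nxt not in seen:
                step = (cost[0] + int(is_variable(nxt)), cost[1] + 1)
                heapq.heappush(heap, (step, nxt, path + [edge]))

def make_eulerian(edges):
    """Step (1)(ii): replicate paths until at most two odd nodes remain."""
    edges = list(edges)
    while True:
        odd = odd_nodes(edges)
        if len(odd) <= 2 and (not odd or any(not is_variable(n) for n in odd)):
            return edges
        _, path = min(cheapest_path(edges, a, b) for a, b in combinations(odd, 2))
        edges += path  # duplicate every edge on the chosen path

def euler_chain(edges):
    """Step (2): Euler path as [T1, p1, T2, ...], found with Hierholzer's algorithm."""
    odd_constants = [n for n in odd_nodes(edges) if not is_variable(n)]
    start = odd_constants[0] if odd_constants else next(
        n for a, _, b in edges for n in (a, b) if not is_variable(n))
    adj = defaultdict(list)
    for i, (a, p, b) in enumerate(edges):
        adj[a].append((i, p, b))
        adj[b].append((i, p, a))
    used, stack, out = set(), [(start, None)], []
    while stack:
        node = stack[-1][0]
        while adj[node] and adj[node][-1][0] in used:
            adj[node].pop()  # drop edges already traversed from the other end
        if adj[node]:
            i, p, nxt = adj[node].pop()
            used.add(i)
            stack.append((nxt, p))
        else:
            out.append(stack.pop())
    out.reverse()
    return [start] + [x for node, label in out[1:] for x in (label, node)]

# ASSUMPTION: edges of the Fig. 2 template, reconstructed from Figs. 4 and 5.
TEMPLATE = [("dept", "e", "d1"), ("d1", "agent", "X"), ("X", "e", "empl'mnt"),
            ("X", "object", "Y"), ("Y", "e", "employee")]
doubled = make_eulerian(TEMPLATE)
print(odd_nodes(doubled))  # ['dept', 'employee']
print(euler_chain(doubled))
# ['dept', 'e', 'd1', 'agent', 'X', 'e', "empl'mnt", 'e', 'X', 'object', 'Y', 'e', 'employee']
```

With this cost measure the sketch replicates the edge between 'empl'mnt' and X, as in the example in the text, and starting from 'dept' it arrives at the same linear form.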
[Figure: the graph template of Fig. 2 with the edge between 'empl'mnt' and X replicated (dotted line).]
FIG. 3.
      dept --e-- d1 --agent-- X --e-- empl'mnt --e-- X --object-- Y --e-- employee
FIG. 4.
      dept --e-- d1 --agent-- X --e-- empl'mnt --e-- X --object-- Y --e-- employee
      (the two occurrences of X are linked by a pointer)
FIG. 5.
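The pointer chains of step (3) can be illustrated with a short sketch. The representation is an assumption: the node sequence of the linear template is held in a list, and a 'pointer' is simply the list index of the next occurrence of the same variable. The names `link_variables` and `bind` are hypothetical.

```python
def is_variable(term):
    return term[:1].isupper()  # footnote 1: capitals denote variables

def link_variables(nodes):
    """Scan left to right; for each position return the index of the next
    occurrence of the same variable, or None if there is none."""
    last_seen, next_ptr = {}, [None] * len(nodes)
    for pos, term in enumerate(nodes):
        if is_variable(term):
            if term in last_seen:
                next_ptr[last_seen[term]] = pos
            last_seen[term] = pos
    return next_ptr

def bind(nodes, next_ptr, pos, constant):
    """Follow the pointer chain from pos and replace every linked
    occurrence of the variable by the given constant."""
    nodes = list(nodes)
    while pos is not None:
        nodes[pos] = constant
        pos = next_ptr[pos]
    return nodes

# Node sequence of the linear template of Figs. 4 and 5 (arc labels omitted).
nodes = ["dept", "d1", "X", "empl'mnt", "X", "Y", "employee"]
pointers = link_variables(nodes)
print(pointers)  # [None, None, 4, None, None, None, None]
print(bind(nodes, pointers, 2, "et1"))
# ['dept', 'd1', 'et1', "empl'mnt", 'et1', 'Y', 'employee']
```

Binding the first X to et1 also fixes the second occurrence, which is what forces a token to return to the same 'employment' instance later in the chain.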
4. Graph Fitting Using Token Propagation
The processing of a linear graph template
      T1 --p1--> T2 --p2--> ... --p(n-1)--> Tn
as defined in Section 3, takes place as follows: The template is placed on a token, a copy of which is injected into all nodes of the semantic net that match the leftmost node T1 of the template. From each of these nodes the token is replicated along all edges that match the first template edge p1. Nodes receiving this token compare their own contents to the second template node T2 and, if the match is successful, perform the same operation; that is, they replicate the token along all edges matching the next template edge p2, and so on. By this stepwise expansion of the template in many directions of the network, all possible matches are found without the need for any centralized control.
The above actions are formalized as the following procedure, which specifies the individual steps performed by a node ti upon receiving a token carrying the following linear graph template:
      Ti --pi--> T(i+1) --p(i+1)--> ... --p(n-1)--> Tn
(1) The node ti examines the first node Ti of the template carried by the received token. If Ti is a constant different from ti, then the node returns a token to the sender, indicating a failure. If, on the other hand, Ti is a constant that matches ti, or if Ti is a variable, then the node detaches Ti (including the first arc pi) from the graph template, and replicates the modified token
      T(i+1) --p(i+1)--> ... --p(n-1)--> Tn
along all edges labeled pi. In the case where Ti is a variable, the node must also bind its own value ti to all other occurrences of Ti within the template (using the pointers created during step (3) of the token construction procedure of Section 3) prior to replicating the token.
(2) The node ti then awaits response tokens from all directions in which a token has been sent. Each response token carries the bindings of constants to variables made during the forward propagation of the corresponding token; it represents one successful match, or indicates failure. All response tokens received by the node ti are returned to the sender of the token it originally received. Thus answers propagate backward along the same paths taken by tokens during their forward propagation. Computation in each node terminates when responses to all tokens emitted previously have been received.
To illustrate the above procedure, consider again the linear template of Fig. 5. It is placed on a token and injected into the node 'dept' of the semantic network of Fig. 1. From there copies of the token are replicated along all edges labeled 'e', thus arriving at the nodes 'd1' and 'd2'. Since the node 'd2' of the network does not match the corresponding node 'd1' of the template, 'd2' discards the token and instead returns a failure token to its sender, the node 'dept'. The match in the other network node, 'd1', is successful and hence copies of the token are propagated along all edges labeled 'agent'. These are received by the two nodes 'et1' and 'et2'. The corresponding node in the template is the variable 'X', which implies a successful match in both cases. However, since there is another occurrence of the same variable 'X' in the template (indicated by the pointer), it must be bound to the same constant as the first occurrence. Hence the node 'et1' binds (replaces) the variable 'X' with the constant 'et1' before forwarding the token along the next edge 'e' to the node 'employment'. Similarly, the node 'et2' binds the variable to 'et2'. Using the same basic procedure, both tokens continue their journey, traversing once more the nodes 'et1' and 'et2', respectively, until the nodes 'ep1' and 'ep2' are reached. Since the corresponding template node is a variable, 'Y', the match succeeds in both cases. This indicates that two independent solutions for fitting the original template into the network have been found.
5. Dataflow Architectures
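As a concrete reference point for the architectural discussion in this section, the node procedure of Section 4 can be sketched as a single function that every node executes on receiving a token. The sketch rests on several assumptions: sequential recursion stands in for the concurrent, asynchronous replication of tokens; the returned list stands in for the response tokens sent back to the sender (an empty list meaning failure); repeated variables are bound by substitution rather than by pointer chains; the net edges are inferred from the walk-through in Section 4 and are followed in either direction. The names `receive` and `inject` are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: net edges inferred from the Section 4 walk-through.
EDGES = [
    ("d1", "e", "dept"), ("d2", "e", "dept"),
    ("et1", "e", "empl'mnt"), ("et2", "e", "empl'mnt"),
    ("ep1", "e", "employee"), ("ep2", "e", "employee"),
    ("et1", "agent", "d1"), ("et2", "agent", "d1"),
    ("et1", "object", "ep1"), ("et2", "object", "ep2"),
]

def build_net(edges):
    net = defaultdict(lambda: defaultdict(set))
    for t1, p, t2 in edges:
        net[t1][p].add(t2)
        net[t2][p].add(t1)  # ASSUMPTION: arcs can be followed either way
    return net

def is_variable(term):
    return term[:1].isupper()  # footnote 1: capitals denote variables

def receive(net, node, chain, bindings):
    """Steps (1) and (2) as run by `node` for a token carrying
    chain = [Ti, pi, T(i+1), ..., Tn]. Returns the response tokens."""
    head, rest = chain[0], chain[1:]
    if is_variable(head):
        bindings = {**bindings, head: node}
        # Bind the other occurrences of Ti; node terms sit at odd indices.
        rest = [node if i % 2 and t == head else t for i, t in enumerate(rest)]
    elif head != node:
        return []              # failure token returned to the sender
    if not rest:
        return [bindings]      # template exhausted: one successful match
    label, rest = rest[0], rest[1:]
    responses = []
    for neighbour in net[node][label]:  # replicate along all edges labeled pi
        responses += receive(net, neighbour, rest, bindings)
    return responses           # responses are passed back to the sender

def inject(net, chain):
    # T1 is a constant, so the token enters at the node named T1.
    return receive(net, chain[0], chain, {})

CHAIN = ["dept", "e", "d1", "agent", "X", "e", "empl'mnt",
         "e", "X", "object", "Y", "e", "employee"]
net = build_net(EDGES)
answers = sorted(inject(net, CHAIN), key=lambda b: b["X"])
print(answers)
# [{'X': 'et1', 'Y': 'ep1'}, {'X': 'et2', 'Y': 'ep2'}]
```

Note that `receive` consults only the node's own arcs and the contents of the token, which is the property that allows the same procedure to run in every node without a central controller.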
The main objective of the approach described in this paper was to develop a model which would be suitable for processing on a highly parallel computer architecture. We have therefore rejected the von Neumann model of computation in which a semantic network is a stored data structure manipulated by a program. Instead, the network is viewed as a dataflow graph, where procedures for token manipulation and communication are part of each node. It should be emphasized that each node of the network contains the same set of procedures, which are triggered solely by the arrival of tokens at that node. Hence computation is strictly data-driven: there is no need for any centralized control to synchronize the operation of individual nodes. Each token, once injected, contains sufficient information to be guided through the network independently of other tokens. This is in contrast to NETL, where sets of marker bits are pushed through the network in lock step under the supervision of a central controller; the technological difficulties resulting from such an approach have been pointed out in [3]. The architectural requirements of the data-driven approach, on the other hand, may be satisfied by a variety of different architectures; these requirements are as follows:
(1) A mapping function f must be provided which assigns each node of the semantic network to one of the processing elements (PEs). This PE then receives and processes tokens for all nodes mapped onto that PE. In the simplest case, a hashing function may be used, as in the data-driven database machine described in [1].
(2) A communication network must be provided which permits nodes residing in different PEs to communicate with one another along the (logical) arcs. This implies that, in general, each PE must be able to communicate, directly or via other PEs, with any other PE. A number of possible interconnection schemes satisfying this requirement exist; the choice is largely a cost/performance tradeoff.
(3) It must be possible to inject tokens into any node of the semantic net. At the architecture level, this implies finding the address of the PE holding the receiving node. This is very similar to the second requirement above, which is the ability to provide communication channels (logical arcs) between nodes mapped onto different PEs. Hence the same mechanisms may be used: the sender, be it an internal node or an external input source, can apply the same mapping function f (used initially to map the net onto the architecture) to find the receiver's PE. The PE number is carried by the token as its destination address when it is being routed through the architecture. In a similar manner, results extracted from the semantic net may be routed to PEs connected to an output device.
From the above discussion it follows that a number of existing architectures, in particular those designed to execute dataflow programs, could easily be adapted to effectively support the processing of semantic nets according to the proposed model.
In closing, let us contrast the two sources of parallelism available through the proposed approach:
Intra-request parallelism. As described in Section 4, a token injected into a node of the dataflow graph is replicated in many directions in search of a match.
Each node receiving a copy of the token may process and forward it concurrently with other nodes. The theoretical time complexity for finding all matches for a given graph template is proportional to the number of edges constituting that template.
Inter-request parallelism. Assuming a significant number of PEs constitute the underlying computer architecture, only a small subset of these will, in general, be busy processing a given query. This implies that the unused computational power of idle PEs may be exploited by executing more than one query at a time. A simple token coloring scheme is sufficient to distinguish tokens belonging to different graph templates and thus to permit their coexistence in the system.
REFERENCES
1. Bic, L. and Hartmann, R., Hither hundreds of processors in a database machine, in: Proceedings Fourth International Workshop on Database Machines (Springer, Berlin, 1985).
2. Computer 15 (2) (1982) Special Issue on Dataflow Systems.
3. Fahlman, S.E., NETL: A System for Representing and Using Real-World Knowledge (MIT Press, Cambridge, MA, 1979).
4. Fikes, R.E. and Hendrix, G.G., A network-based knowledge representation and its natural deduction system, in: Proceedings Fifth International Joint Conference on Artificial Intelligence, Cambridge, MA, 1977.
5. Rieger, C., Trigg, R. and Bane, B., ZMOB: A new computing engine for AI, in: Proceedings Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, 1981.
6. Treleaven, P.C., Brownbridge, T.R. and Hopkins, R.C., Data-driven and demand-driven computer architecture, ACM Computing Surveys 14 (1) (1982).
Received March 1984; revised version received February 1985