Expert Systems with Applications 14 (1998) 149-157
Graphical and linguistic dialogue for intelligent multimodal systems

Luis A. Pineda

Intelligent Graphical Systems, Information Systems, Institute for Electrical Research (IIE), Cuernavaca, México
Abstract
In this paper a brief outline of the elements of a theory of multimodal representation and interpretation is presented. The theory is illustrated with the help of an interactive system called GRAFLOG, which is able to support a multimodal graphical and linguistic dialogue. In this dialogue, graphical symbols with their interpretation are introduced incrementally, and the system is able to assist the user in the solution of simple design problems. The theory has been used as a framework for the development of a number of master's theses and experimental systems addressing different aspects of representation and problem solving in the context of multimodal dialogues. This line of work is also summarized. At the end of the paper the kind of dialogue that is currently being considered for the next version of the system is illustrated. © 1998 Elsevier Science Ltd. All rights reserved
1. INTRODUCTION

In this paper some elements of a theory for the definition and interpretation of graphical representations, and its application to graphical reasoning and problem solving in the context of multimodal interactive systems, are outlined. The discussion is focused on two specific modalities: graphics and natural language. One basic assumption is that graphical symbols and geometrical relations represent objects and conceptual relations, and that in interactive multimodal systems this representation relation is stated by convention. This conventional meaning can be stated either beforehand or through natural language expressions supported by pointing or spatial deictic acts. An important consequence of these assumptions is that the denotation of a graphical representation can be defined dynamically during interactive sessions, and can be altered during the interactive dialogue. An additional assumption is that graphical representations are always used through their meanings, the geometrical interpretation being one particular meaning that is always present in the interpretation process. It is also assumed that the interpretation of graphical symbols and geometrical relations is performed by human users by direct inspection, and that the ability of human users to differentiate graphical individuals and geometrical relations has a system counterpart in terms of geometrical algorithms. If a graphical relation can be 'seen at all by the system' there must be a geometrical algorithm that
computes that relation. On the basis of these assumptions, a theory and an experimental system called GRAFLOG (Pineda, 1989, 1993; Klein & Pineda, 1990; Pineda & Lee, 1992) have been developed, and a number of systems based on similar considerations have been developed since, as shown below in this paper. The aim of this research effort is to develop a theory of graphical interpretation in which a mixed graphical and natural language dialogue can be applied to reasoning and problem solving through constraint satisfaction in graphical domains.

2. THE RELATION BETWEEN LINGUISTIC AND GEOMETRICAL REPRESENTATIONS

The assumptions mentioned above have suggested a theory of graphical representation and interpretation which has two independent layers: a geometrical one and a conceptual or linguistic one. Both of these layers are represented in a declarative language of a logical kind (in particular, multi-sorted logic has been used for the formalization (Goguen et al., 1978; Pineda & Lee, 1992)). In more recent work a formalization along the lines of Montague semantics has also been developed (Pineda & Garza, 1997). The geometrical representational layer is constituted by a set of terms that refer to graphical symbols such as dots, lines and polygons, the position of a dot, etc., and Boolean terms that refer to geometrical relations between graphical objects, such as
the relation between two parallel lines or whether a dot is inside a polygon. The conceptual representational layer is designed to state the semantic interpretation of the graphical symbols. In an architectural setting, for instance, a line might represent a wall of a house, and in a computer interface a line might represent the edge of an interface window. In the same way that graphical symbols represent objects, graphical relations represent conceptual relations: if two lines in a drawing represent water pipes, a joint relation between those lines might be taken to represent that water flows through the pipes.

To capture the relation between conceptual and geometrical meaning, an explicit representation relation mapping graphics to its interpretation is stated. Objects (abstract or concrete) are represented through graphical symbols (simple or composite), and conceptual relations are represented by geometrical relations. The representation relation implies that there is a morphism between graphical and conceptual terms, but this is not a one-to-one relation. Although a simple graphical symbol can represent an object, and a geometrical relation can be interpreted as a simple conceptual relation, as in the examples mentioned above, a complex description denoting an individual can be represented by a basic graphical symbol and vice versa. In an architectural setting, for instance, the linguistic description 'the wall next to the main room' can be represented by a single line on the drawing, and the graphical description 'the extreme intersection between line-1 and line-2' (which denotes a dot on a drawing) can represent a corner of a room in the conceptual domain. In general, the representation relation has as its arguments a term of the graphical (or geometrical) representational language and a term of the linguistic representational language; the former denotes the representing object and the latter the represented one.

Summarizing, to interpret a graphical symbol we need to evaluate a representation relation of the form

    represents(graphical_term, linguistic_term)

and to know which graphical symbol is interpreted as a given individual, we need to employ an inverse relation of the form

    represents⁻¹(linguistic_term, graphical_term).
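For concreteness, a minimal sketch of how instances of this relation and its inverse might be stored and queried is given below in Prolog, the notation used for the rules typed later in the paper. The predicate and constant names are illustrative assumptions, not the actual GRAFLOG implementation.

    % Hypothetical instances of the representation relation: the first
    % argument is a graphical term, the second a linguistic term.
    represents(rectangle1, sink1).
    represents(rectangle2, kitchen1).

    % The inverse relation is obtained simply by swapping the arguments.
    inverse_represents(LinguisticTerm, GraphicalTerm) :-
        represents(GraphicalTerm, LinguisticTerm).

A query such as inverse_represents(sink1, G) then recovers the graphical symbol that depicts the sink at the current state of the interaction.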
In this scheme, linguistic predicates, like spatial prepositions, are interpreted in terms of functions defined in the geometrical domain. These functions, in turn, are computed through geometric algorithms. The preposition 'in', for instance, is interpreted in terms of a geometrical function as follows:

    represents⁻¹(in, λxy.in_geometrical(x, y)).

When a linguistic expression with a spatial reference, like a prepositional phrase including a spatial preposition, is interpreted in the linguistic domain, both functor and arguments are interpreted in terms of the inverse representation function, and the interpretation of the functor is applied to the interpretation of its arguments in a compositional fashion. The resulting object is an expression of the graphical representational language. When the expression 'the sink is in the kitchen' is interpreted, and considering the instances of the representation relation

    represents⁻¹(sink1, rectangle1)
    represents⁻¹(kitchen1, rectangle2)

where sink1 and kitchen1 are proper names for the definite descriptions, the translation of functor and arguments results in

    λxy.in_geometrical(x, y)(rectangle1, rectangle2).

The application of this function to its arguments produces in turn the expression

    in_geometrical(rectangle1, rectangle2)

which is a well-formed expression of the geometrical representational language.
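Continuing the Prolog sketch above, the evaluation of this geometrical expression might look as follows; the representation of rectangles by corner coordinates and the containment test stand in for the attached algorithms of the geometrical constants, and all names and numbers are illustrative assumptions.

    % Restating the representation instances from the earlier sketch.
    represents(rectangle1, sink1).
    represents(rectangle2, kitchen1).

    % Hypothetical geometry: each rectangle is given by two corners.
    rectangle_extent(rectangle1, 2, 2, 4, 3).    % rectangle_extent(Id, X1, Y1, X2, Y2)
    rectangle_extent(rectangle2, 0, 0, 10, 8).

    % Attached algorithm for the geometrical constant in_geometrical:
    % one rectangle is inside another if its extent is contained in it.
    in_geometrical(Inner, Outer) :-
        rectangle_extent(Inner, IX1, IY1, IX2, IY2),
        rectangle_extent(Outer, OX1, OY1, OX2, OY2),
        IX1 >= OX1, IY1 >= OY1, IX2 =< OX2, IY2 =< OY2.

    % 'the sink is in the kitchen': translate both arguments through the
    % representation relation and apply the geometrical predicate.
    sink_in_kitchen :-
        represents(G1, sink1),
        represents(G2, kitchen1),
        in_geometrical(G1, G2).

The query sink_in_kitchen then succeeds or fails purely by running the attached geometrical test over the current drawing state, which is the sense in which the semantics of the constant is given by an algorithm.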
The symbol in_geometrical in the expression is a logical constant of the geometrical representational language and its semantics is given by an algorithm. An important characteristic of the geometrical language is that there is a set of geometrical constants, and each constant in this set has an attached algorithm. These algorithms provide the 'graphical knowledge' that the system has. In the interpretation of composite geometrical terms the standard symbolic manipulation is interwoven with the evaluation of these algorithms. The interpretation scheme sketched here has been reported elsewhere (Pineda, 1989; Klein & Pineda, 1990; Pineda & Garza, 1997).

The representation relation can also be used to obtain the interpretation of a graphical representation in a compositional fashion. One can interpret a geometrical relation such as 'the extreme intersection between line-1 and line-2' by applying the interpretation of the graphical functor to the interpretation of its graphical arguments, yielding, for instance, a linguistic description such as 'a corner of the room'. Not all linguistic constants and predicates have a representation in the graphical domain, and not all graphical constants are directly interpreted. Then, for a given application and interaction state, the reference of a linguistic or a graphical expression has to be worked out in terms of the actual instances of the representation relation and its inverse. An additional consideration for working out this scheme is that the same linguistic constant of a particular sort can have several geometrical interpretations (consider, for instance, the different geometrical arrangements of objects that can be referred to by the preposition 'in'), and also a graphical functor might represent several linguistic relations. The evaluation of the representation relation is then a non-deterministic process, and several ambiguities can arise during the interpretation. One aim of the scheme is to solve these ambiguities by taking into account the constraints that the two interpretation layers impose upon each other, and also other aspects of the dialogue, as will be seen below.

An important motivation for this approach has to do with reasoning with heterogeneous representations: questions can be put in the linguistic modality, but not enough information might be available for producing an answer unless the information present in the graphics can be taken into account. Suppose we have the graphical representation of a kitchen with the sink and pipes, and that we state that the sink is working properly if the pipes are connected. Now, to answer whether the sink is working at a particular time of the interactive session, we have to see whether the lines representing the pipes stand in a joint relation. In this kind of example, we have a theory about the kitchen, the sink, the pipes, the connections, water, fluids, etc. (we call it T), and a geometrical theory about lines, extreme joints between lines, inclusion relations between polygons, geometrical axioms, etc. (we call it G). The representation relation (we call it R) establishes a partial morphism between T and G which can be used to make inferences in terms of one another. The notion of theory, here, is taken in the logical sense: a theory is just a collection of terms of the representational language. We impose very few restrictions on the kind of knowledge that a theory can have, and we allow the inclusion of both factual and definitional information within a theory. General concepts about the application domain are expressed as terms of these theories.

It is worth pointing out that there are alternative approaches to capture the relation between graphics and its interpretation. In the APT system, for instance, Mackinlay defined an encodes relation that maps graphical objects into their corresponding semantic interpretations (Mackinlay, 1987). Similar ideas have been developed in the context of the WIP system, in which a coref and an encodes relation are defined (André & Rist, 1994). However, whereas the arguments of the representation relation in GRAFLOG are well-formed expressions of the corresponding languages, explicitly defined by their syntax and semantics on the lines of standard model theory (Dowty et al., 1981), the arguments of Mackinlay's encodes relation are unconstrained set-theoretical expressions. The arguments of WIP's coref and encodes relations, in turn, are variables referring to objects whose properties are implicitly defined, given that no explicit distinction is made between graphical and conceptual levels of representation.
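Returning to the sink-and-pipes example above, the way in which R lets an inference in T appeal to G can be sketched in the same Prolog style; the particular facts and the predicate joint/2, standing for the geometrical joint test, are assumptions made for illustration.

    % T: conceptual facts and a definitional rule about the kitchen domain.
    sink(sink1).
    pipe(pipe1).
    working(S) :- sink(S), pipe(P), connected(S, P).

    % R: instances of the representation relation for the current state.
    represents(rectangle1, sink1).
    represents(line1, pipe1).

    % The conceptual relation connected/2 is evaluated by crossing into G:
    % two individuals are connected if the symbols that represent them
    % stand in a joint relation, decided by a geometrical algorithm
    % (abbreviated here to a stub fact).
    connected(A, B) :- represents(GA, A), represents(GB, B), joint(GA, GB).
    joint(rectangle1, line1).

The query working(sink1) then succeeds only while the joint relation actually holds between the corresponding symbols in the drawing.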
3. REFERENCE AND PROBLEM-SOLVING

There are other motivations for establishing this morphism between interpretations. One has to do with questions of reference: we sometimes point at a graphical object asking for its conceptual interpretation, or sometimes we give a linguistic description that can only be interpreted in terms of a graphical context. Furthermore, in multimodal dialogues both kinds of references can be given in a mixed fashion along the interactive session. Only if we are able to infer the interpretation in which the pointing act is given can we identify the intended referent of the expression. This consideration is important for identifying the referent of deictic expressions: pointing is an inherently ambiguous act. When we point to a graphical representation it is not possible to identify whether we intend to select a whole object, or the part of the object at which we pick, the whole drawing on the screen, or even the computer screen on which the drawing is displayed! Graphical and conceptual constraints can filter out a large number of potential geometrical referents if the interpretation is taken into account. These issues have been dealt with to a certain extent elsewhere (Massé, 1994; Pineda et al., 1994).

Human users perform interactive actions, either linguistic or graphical, with the aim of achieving specific goals. Goals can be stated implicitly through the interactive tasks available at the interface, or explicitly with the help of a natural language dialogue. Goals can be represented implicitly in the system's data structures or can be explicitly represented through an expressive enough representational language. Even a simple graphical editing operation has an underlying goal. When the user selects a rectangle from a graphical menu with the intention of representing a kitchen and places it on the screen, for instance, he has the intention that this particular rectangle, with certain geometrical properties, stands for the kitchen. However, a human user can have more specific goals that must be represented explicitly by the system, as when he states that a certain line, representing a pipe, should be horizontal in all drawing states. If the orientation of the line is changed, either by a direct reference to the object, or through a change made upon an object related to it (e.g. moving a sink to which the pipe is connected), the horizontal condition should be preserved. If the system is intelligent enough to interpret the user's intentions, it has to initiate a problem-solving process to satisfy the condition. In this kind of situation numerical constraint satisfaction methods have been widely employed (Borning, 1981, 1986; Leler, 1988; Alpert, 1993). However, it is also possible to solve such problems in terms of symbolic inference methods. In particular, numerical problem-solving methods that are applied to satisfy geometrical constraints can be avoided
in many situations with the help of proper reference machinery (Pineda, 1992a), and other complex problems involving the synthesis of complex structures can be solved through drafting rules acting on graphical symbols (Pineda, 1992b, 1993). Furthermore, the higher the level of graphical structure of the objects involved in a problem-solving task, the simpler the symbolic inference required to solve the problem (Massé, 1994; Morales, 1994; Morales & Pineda, 1995). Further examples of the effective use of graphical reference and drafting tasks in geometrical problem solving have been explored in analogical reasoning tasks. In this line of work, a 3-D graphical language for the representation of polyhedra has been defined. The language has been applied to the problem of extracting a solid model of a convex polyhedron from its orthogonal views by means of drafting rules of the kind shown in the dialogue of Figs 1 and 2 (Garza, 1995; Garza & Pineda, 1995). It is worth pointing out that, in our framework, referring to 3-D objects is just a matter of adding a third argument in the arity of the constructors of geometrical objects such as dots, lines and polygons. Consider that the abstract representational structures in which graphics are represented and reasoned about are completely independent of the rendering mechanisms of the system.

It is important to consider that the user's goals can be stated by making reference to the conceptual or to the geometrical interpretation of drawings, and the changes made by the system in one domain can affect the consistency of the theory in the other domain. For these reasons, the satisfaction of constraints needs to be thought of as a process acting on both the graphical and the linguistic representation in an interwoven way. A problem can be solved only when the theories are consistent internally, and with each other. The relation between theories is evaluated through the representation relation.
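A minimal sketch of what such symbolic geometric terms and constraints might look like is given below, again in Prolog; the term structure (dot/2, dot/3, line/2) and the constraint predicates are illustrative assumptions rather than the actual GRAFLOG graphical language.

    % 2-D graphical symbols as terms: a line is built from two dots.
    % Referring to 3-D objects only requires a third argument in the
    % constructor, e.g. dot(X, Y, Z) instead of dot(X, Y).
    % Orientation constraints can be checked, or imposed, symbolically by
    % unification over the term structure, with no numerical search.
    horizontal(line(dot(_, Y), dot(_, Y))).
    vertical(line(dot(X, _), dot(X, _))).

    % Example: making a line horizontal amounts to solving for the
    % unknown ordinate of its second dot.
    % ?- L = line(dot(0, 3), dot(5, Y2)), horizontal(L).
    %    Y2 = 3.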
[Figure 1 consists of twenty numbered panels showing a user (U>) and system (G>) dialogue over the kitchen drawing. The legible turns include statements such as 'this is a kitchen', 'this is a sink' and 'this is a pipe' answered with 'ok', the question 'what is this?' answered with 'a sink', Prolog rules defining when a sink is working and declaring the connected relation transitive, queries working(X) answered with NO or working(this), and system messages reporting that the vertical or horizontal constraint is not satisfied, that construction lines passing through the points P1, P2 and P3 are drawn, and that the dot P1 is moved to the intersection between the construction lines. The stages are described one by one in Section 6.]
FIGURE 1. Multimodal dialogue supported by current version of GRAFLOG.
[Figure 2 consists of fifteen numbered panels showing a user (U>) and system (G>) dialogue. The legible turns include 'this is a kitchen / ok', 'this is a sink / ok', 'what is this? / a sink', 'what is this? / a corner of a room', 'this is the waste-pipe / ok', 'this is a pipe / ok', 'if a sink is connected it is working / ok', 'is this working? / no', 'this must be horizontal / ok', 'is the sink working? / yes', 'it is not working!', 'make it work / ok', 'it is not horizontal!' and 'make it! / ok'. The stages are described one by one in Section 6.]
FIGURE 2. Multimodal dialogue to be supported by the next version of GRAFLOG.
In our framework, the use of graphical languages for maintaining multimodal consistency in interactive contexts within a graphical theory has been investigated in an intelligent tutoring system for basic calculus (Morales, 1994; Morales & Pineda, 1995).

4. REFERRING EXPRESSIONS
In natural language dialogues, reference is focused on salient individuals in the discourse context. The focus is the object that is being talked about. Graphical dialogues have similar salient objects, and graphical reference is often relative to a given focus. Graphical focus, as in natural language, changes along the dialogue and the discourse transactions. Focus is important not only to help in fixing and resolving reference, as in anaphoric relations, but also in inducing the user's intentions, which consist in satisfying the explicit constraints of the design task in a manner which is consistent with the overt multimodal input. Although closely related, the structure holding the focus is independent of the structure maintaining the goals of the discourse; following Grosz, we call these the attentional and the intentional structure, respectively (Grosz & Sidner, 1986). Changes made by users in interactive transactions which are satisfied through constraint satisfaction are very sensitive to the current focus. Geometrical constraint satisfaction problems can have many solutions, but only a few of them can be considered as intended by the user. Establishing the focus of attention of the user, and inferring solutions relative to such an object, can help to establish solutions that are not only sound, but also relevant (Pineda, 1992b, 1993).

One aim of the research is to develop a theory for the generation of referring expressions in both the linguistic and the graphical domain. One assumption in this regard is that graphical, linguistic and heterogeneous descriptions depend on the user's or the system's goals at the interaction state, and on the content of the triplet (T,G,R) at a given state of the interaction. The synthesis of descriptions is useful in an interactive task both on input and on output. If the user defines a graphical object through the interactive session, it is often convenient to represent that object through an expression which denotes the object at any arbitrary state of the interaction and not only in the definition state. Although graphical objects have extensional representations in most systems, intentional representations for graphics can be defined with large benefits for graphical problem solving and reasoning. The definition of a graphical symbol through an interactive input event (simple or composite) can be thought of as a definition that fixes a referent: a name for a symbol which will refer rigidly to a graphical object is associated with a number of geometrical properties that fix the reference of the symbol. If one geometrical property of the symbol is later modified (e.g. as a consequence of a user's or a system's event), such as the position of a dot or the length of a line, the identity of the symbol remains the same. Intentionality in graphical representations has been discussed elsewhere (Pineda, 1989, 1992a, 1993;
Pineda & Lee, 1992), and the current prototype GRAFLOG has an algorithm for inducing intentional expressions to refer to graphical objects that are placed in a graphical context during the interactive session. For output tasks, natural language referring expressions with graphical deictic components can also be produced in terms of the content of (T,G,R), a focus of attention and a user's goal. Expressions of this kind are the key to the definition of coordinated graphical and linguistic explanation systems. Some initial results in this line of work have been reported elsewhere (Pineda, 1992b; Santana & Pineda, 1995; Santana et al., 1995). The theory has also been used for implementing in our framework a version of the APT system (Mackinlay, 1987) for the automatic generation of graphical presentations of relational information (Bolán, 1995).

5. INTERACTIVE DIALOGUE AND DISCOURSE THEORY

The scheme that we are developing for the generation and interpretation of linguistic and graphical terms, and also of graphical events, depends on the current states of the linguistic theory T, the graphical theory G, and the representation relation R. We assume that algorithms for generating referring expressions take these three structures as their input. However, in the course of a graphical interactive session these three objects can vary. A natural language text string taken together with a sequence of graphical acts, a multimodal event, can be considered a kind of complex sentence; similarly, a flow of multimodal events occurring in the course of an interactive session can be thought of as a kind of discourse. Although these complex events are processed in terms of the current interpretation, multimodal events can change the triplet (T,G,R) constituting the representation. It is possible, for instance, to change the linguistic theory while preserving the current graphical representation, and also the other way around. Consider, for instance, a situation in which there is a switch of context from the content of an application window to the interface objects of the screen. All graphical objects visible at the interface would remain the same, but the natural language discourse would switch from a theory about kitchens and sinks, if we are talking about an architectural domain, to a theory about windows, menus, dialogue boxes and scroll bars. This would be a switch in the interpretation theory. A similar switch of theory could occur within the content window: in the architectural setting, we could switch along the interaction from a theory about the form and materials of design objects to a theory about their function. In this situation the underlying graphical representation could be unaltered, but its interpretation would change radically. The mechanisms to implement such interpretation switches are, however, a matter for further research.
Changes in interpretation and graphical theories during the interactive session depend on the human user's goals, on changes in topic and focus of attention, and on the structure of the natural language and graphical events input through the interface. The aim of the theory is to keep an explicit representation of these discourse objects. A theory of this kind would allow the interpretation and generation of referring expressions with anaphoric and deictic relations (Pineda & Garza, 1997). Currently, we are exploring the possibility of extending Kamp's discourse theory (Kamp, 1981; Kamp & Reyle, 1993) with explicit rules to handle spatial deictic references. The aim is to combine in a systematic manner discourse referents introduced linguistically with referents provided by a pointing mechanism. In the same way that Kamp's discourse rules, which are related to the grammatical categories of the natural language grammar, incorporate discourse referents and conditions into the discourse representation structure (DRS), we are investigating whether it is possible to define discourse rules acting on expressions produced by a grammar characterizing meaningful graphical objects and events. These rules would incorporate representations of graphical objects, relations and constraints into the DRSs as discourse referents and graphical conditions. The end result would be a theory of intelligent graphical and linguistic interaction that could be applied to diverse application domains. Some initial results in this line have been presented by Pineda and Garza (1997). Additionally, we intend to identify an ordered list of discourse referents in the DRS with the attentional structure in an explicit way. This structure would be used in the generation process of graphical and linguistic referring expressions.

6. A GRAPHICAL AND LINGUISTIC DIALOGUE

Up to this point, the elements of a theory of linguistic and graphical representation and interpretation in the context of interactive graphics have been sketched. In the course of our research programme some of these concepts have been clarified to a certain extent whereas others are still very vaguely defined. In Fig. 1 we exemplify the kind of multimodal dialogue that the current version of GRAFLOG is able to handle. We include a comment attached to every interactive event with a reference to the theoretical notion involved.

(1) A graphical object (a rectangle) is introduced through a natural language statement and a graphical command. A representation of both the graphical object and its interpretation is added in the corresponding representational layer.
(2) A second graphical object with its interpretation is defined.
(3) A natural language question with a deictic gesture is asked. The graphical reference is ambiguous in the geometrical and also in the linguistic domain. The
ambiguity is solved in terms of the geometrical information: the sink is the most specific object.
(4) A new object is defined.
(5) The user imposes a functional interpretation upon the graphical configuration: if a sink is working it must be connected to a pipe. The user also establishes a transitive condition upon the connected relation. The basic geometrical interpretation of connected is given in advance, as was shown for the preposition 'in'. This dialogue is typed in Prolog, as no natural language parser for this kind of expression has been implemented (the rules are reproduced after this list).
(6) The user defines a construction line.
(7) The user defines a pipe. This object is defined from the intersection between the other pipe and the construction line, and the intersection between the sink and the construction line. This is an intentional definition: the definition of the pipe is not independent of the context, and if the context is changed its geometrical properties will change too.
(8) The user asks for the functionality of the sink. In this case the working constraint, which in turn depends on a geometrical constraint, is satisfied. The system highlights the sink.
(9) The user imposes a geometrical constraint upon the construction line. No problem-solving action is initiated, as the system has no information about how the constraint should be satisfied.
(10) The user imposes a geometrical constraint upon the external pipe.
(11) The user performs a graphical act: he drags up the right extreme dot of the construction line. The position and length of the pipe connecting the sink and the external pipe are updated automatically. The values of the new line coordinates are computed by reference. The system enters a problem-solving mode, as the vertical and horizontal constraints are not satisfied. How the constraints will be satisfied depends on which graphical objects were moved (the dot) and in which direction.
(12) The system finds references for making the pipe vertical, using a predefined drafting rule. Construction lines are drawn by the rules, simulating standard technical drafting practices. The natural language text below the graphics is produced with the help of natural language templates attached to the drafting rules. (The actual texts in Stages 12-15 of Fig. 1 are produced by a Spanish version of GRAFLOG.)
(13) The system makes the pipe vertical, moving the top of the pipe to the intersection between the construction lines.
(14) The system makes the pipe connected to the sink horizontal with the help of a second drafting rule. In this step, the system draws two construction lines.
(15) The system moves the intersection between the two pipes to the intersection between the two reference construction lines.
(16) Once the problem-solving mode is terminated, the auxiliary construction lines are erased, reaching Stage 16.
(17) The user moves up the horizontal construction line. The vertical pipe is updated by reference. The pipe connecting the sink, which has an intentional definition, cannot be drawn, as there is no intersection between the reference construction line and the sink. In Stage 17 of Fig. 1 this pipe has no denotation. This condition is notified to the user.
(18) The user asks whether the sink is working. This conceptual condition cannot be satisfied, as the underlying graphical context is not consistent.
(19) The user moves the sink up. In the resulting state the connecting pipe has a proper denotation and can be drawn on the screen.
(20) The user asks whether the sink is working. Here, both the conceptual and the geometrical conditions involved can be satisfied.
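The rules typed at Stage 5 are legible in Fig. 1 and have essentially the following Prolog form (spelling normalized):

    % A sink is working if it is connected to a pipe.
    working(X) :- sink(X), pipe(Y), connected(X, Y).

    % The connected relation is declared to be transitive.
    connected(X, Z) :- connected(X, Y), connected(Y, Z).

The base cases of sink/1, pipe/1 and connected/2 are presumably supplied by the graphical state through the representation relation; note that, taken literally, the recursive clause would need tabling or a depth bound to terminate in a standard Prolog interpreter.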
As can be seen, the most developed facilities of the current version of GRAFLOG allow us to impose a meaning upon graphical symbols and relations through the interactive session, to ask queries whose answers require the system to perform heterogeneous inferences, and to satisfy geometrical constraints through symbolic inference. The system also has a graphical and natural language explanation facility for helping the user to understand its problem-solving process. However, GRAFLOG is limited to handling multimodal events that are local to interactive transactions, as there is no explicit theory of discourse. It is also limited in its natural language capabilities, and sentences with relative clauses and modal modifiers have to be typed in directly in Prolog notation. Although the current version can realize that a conceptual constraint is not satisfied, the facility for calling up a problem-solving method for satisfying such a constraint at this level of abstraction has yet to be implemented. To address these limitations we are developing a new version with an explicit model of discourse and a more robust natural language facility. In Fig. 2 we show the kind of dialogue we are currently studying, which we aim to implement in our next demonstrator system. As before, we include a comment related to each stage of the dialogue.
(1) As in Stage 1 of Fig. 1.
(2) As in Stage 2 of Fig. 1.
(3) A natural language question with a deictic gesture is asked. The graphical reference is ambiguous in the geometrical and also in the linguistic domain. The ambiguity is solved in terms of the interactive discourse: the sink was the individual last mentioned. (Contrast this solution with that shown in Stage 3 of Fig. 1.)
(4) A simple graphical object (the top-right corner of the square) has a composite natural language description.
(5) A new graphical object with its interpretation is defined.
(6) A natural language concept that depends on the graphical context is defined.
(7) A question with a deictic component is asked. The answer is based on facts stated in both the linguistic and the graphical domains: a graphical and linguistic inference is made. The notion of 'connected' has an interpretation in the graphical domain defined in advance, as in the current version.
(8) A construction line is defined in the geometrical domain. It has no interpretation in the linguistic domain, and a geometrical constraint is associated with it.
(9) A graphical object that depends on a geometrical context is defined. As above, an intentional definition is produced by the system to refer to the line: it is the line whose origin stands at the intersection of the line representing the waste-pipe with the construction line, and whose end stands at the intersection of the left edge of the square representing the sink with the same construction line. This line has, in addition, a linguistic interpretation as a pipe. (A sketch of such a definition is given after this list.)
(10) A question about a conceptual constraint is asked. The system answers in terms of a graphical and linguistic inference.
(11) A graphical object is dragged, with consequences in both the geometrical and the linguistic interpretations: the line with an intentional definition has no denotation in the new graphical state and cannot be displayed on the screen, and a conceptual (functional) constraint is violated (the sink is not connected).
(12) The user types a linguistic command in which an anaphoric reference is made. The system infers and executes a plan for satisfying the conceptual constraint, which in turn involves graphical transformations. In the resulting state, graphical objects with an intentional definition have a proper denotation.
(13) A graphical action upon an object with an intentional definition is produced by the user. Translation of the line with an intentional definition affects all objects upon which it is defined. No problem-solving process for satisfying the involved constraints is required, as the new properties of the objects involved are computed by reference, taking into account some default assumptions about this kind of change.
(14) A graphical object that has a geometrical constraint imposed upon it is modified: the constraint is no longer satisfied and the system warns the user.
(15) The user enters a natural language expression with a complex anaphoric reference. The system solves the problem by applying a geometrical reasoning process, until a state as in Stage 12 of Fig. 2 is reached. The solution involves reaching an intermediate state in which a conceptual constraint is violated (the sink is not working) and in which the line with an intentional definition has no reference. In the solution, the direction of the change in the previous transaction (Stage 14) is considered.
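A hypothetical rendering of the intentional definition of stage (9) in Prolog follows; the geometric helper predicates, the term structure and the coordinates are assumptions made for illustration only, not the output of the system.

    % Hypothetical current graphical state: the waste-pipe is a vertical
    % line, the construction line is horizontal, and the sink is a square.
    represents(line(dot(0, 0), dot(0, 8)), waste_pipe1).
    represents(rect(dot(4, 2), dot(7, 5)), sink1).
    construction_line(line(dot(-1, 3), dot(9, 3))).

    % Assumed geometric helpers, just enough for this example: the
    % intersection of a vertical segment with a horizontal one, and the
    % left edge of a rectangle.
    intersection(line(dot(X, _), dot(X, _)), line(dot(_, Y), dot(_, Y)), dot(X, Y)).
    intersection(line(dot(_, Y), dot(_, Y)), line(dot(X, _), dot(X, _)), dot(X, Y)).
    left_edge(rect(dot(X, Y1), dot(_, Y2)), line(dot(X, Y1), dot(X, Y2))).

    % The pipe of stage (9) is not stored as fixed coordinates; it is
    % defined by the objects it depends on and is re-evaluated in every
    % graphical state.
    pipe_line(line(P1, P2)) :-
        represents(WasteLine, waste_pipe1),
        construction_line(CL),
        intersection(WasteLine, CL, P1),
        represents(SinkRect, sink1),
        left_edge(SinkRect, LeftEdge),
        intersection(LeftEdge, CL, P2).

A query pipe_line(L) yields the current geometry of the pipe; if either intersection does not exist in a given state, as after the drag of stage (11), the definition simply fails and the line has no denotation.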
It is worth mentioning that additional work in relation to the generation of graphical and linguistic explanations, as in Stages 12-15 of Fig. 1, is also being considered (Santana et al., 1995).

7. CONCLUSION

In this paper we have illustrated the kind of dialogue that can be handled by the current version of GRAFLOG, and we have presented how we plan to enhance such functionality in the next version. We have also discussed some of the motivations and the theoretical mechanisms underlying the architecture of the system. As a final remark, we would like to emphasize that the kind of dialogue supported by GRAFLOG is important for multimodal presentation systems. The main task is to induce a design intention from the user and to produce a presentation satisfying such an intention. It should be noted that in the example dialogues shown above, all graphical scenarios with their interpretation are introduced incrementally during the interactive session. Although we could have a predefined graphical scenario upon which the design task could be contextualized, the aim in our research is to study the geometrical layout design task itself. We believe a proper understanding of such a task is very important for creating interesting intelligent multimodal systems.

REFERENCES

Alpert, S. R. (1993). Graceful interaction with graphical constraints. IEEE Computer Graphics and Applications, 13(2), 82-91.
André, E., & Rist, T. (1994). Referring to world objects with text and pictures. In Proceedings of the International Conference on Computational Linguistics, COLING, Kyoto, Japan, Association for Computational Linguistics, pp. 530-534.
Bolán, J. C. (1995). Generación dinámica de presentaciones gráficas a partir de información relacional. M.Sc. thesis, Centro Nacional de Investigación y Desarrollo Tecnológico, CENIDET/SEP, Cuernavaca, Mexico.
Borning, A. (1981). The programming language aspects of ThingLab, a constraint-oriented simulation laboratory. ACM Transactions on Programming Languages and Systems, 3(4), 353-387.
Borning, A. (1986). Graphically defining new building blocks in ThingLab. Human-Computer Interaction, 2, 269-295.
Dowty, D. R., Wall, R. E., & Peters, S. (1981). Studies in Linguistics and Philosophy, Vol. 11. Introduction to Montague Semantics. Dordrecht: D. Reidel.
Garza, E. G. (1995). Síntesis de poliedros a partir de sus vistas ortogonales: un caso de estudio acerca del razonamiento gráfico. M.Sc. thesis, Instituto Tecnológico y de Estudios Superiores de Monterrey, Campus Morelos, Cuernavaca, Mexico.
Garza, G., & Pineda, L. A. (1995). A qualitative procedure for the production of solid models of polyhedra from their orthogonal views: a case of study in reasoning with graphical representations. In L. E. Sucar & E. Morales (Eds), XII Reunión Nacional de Inteligencia Artificial, Cuernavaca, Mexico, Sociedad Mexicana de Inteligencia Artificial, Noriega Editores, pp. 9-18.
Goguen, J. A., Thatcher, J. W., & Wagner, E. G. (1978). An initial algebra approach to the specification, correctness and implementation of abstract data types. In R. T. Yeh (Ed.), Current Trends in Programming Methodology (pp. 80-149). Englewood Cliffs, NJ: Prentice-Hall.
Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3), 175-204.
Kamp, H. (1981). A theory of truth and semantic representation. In J. Groenendijk, T. Janssen & M. Stokhof (Eds), Mathematical Centre Tracts, Number 135, Formal Methods in the Study of Language (pp. 277-322). Amsterdam.
Kamp, H., & Reyle, U. (1993). Studies in Linguistics and Philosophy, Vol. 42. From Discourse to Logic: Introduction to Model Theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Dordrecht: Kluwer Academic.
Klein, E., & Pineda, L. A. (1990). Semantics and graphical information. In D. Diaper, D. Gilmore, G. Cockton & B. Shackel (Eds), Human-Computer Interaction: INTERACT '90, Cambridge, UK. Amsterdam: IFIP, North-Holland.
Leler, W. (1988). Constraint Programming Languages: their Specification and Generation. Reading, MA: Addison-Wesley.
Mackinlay, J. D. (1987). Automatic design of graphical presentations. Ph.D. thesis, Stanford University, CA.
Massé, J. A. (1994). Satisfacción de restricciones por referencia simbólica en dibujos geométricos. B.Sc. thesis, Universidad Nacional Autónoma de México, Mexico City.
Morales, R. (1994). Pizarrones interactivos multimodales para la enseñanza de conceptos matemáticos. M.Sc. thesis, Instituto Tecnológico y de Estudios Superiores de Monterrey, Campus Morelos, Cuernavaca, Mexico.
Morales, R., & Pineda, L. A. (1995). Pizarrones interactivos multimodales para la enseñanza de conceptos matemáticos. In L. E. Sucar & E. Morales (Eds), XII Reunión Nacional de Inteligencia Artificial, Cuernavaca, Mexico. Sociedad Mexicana de Inteligencia Artificial, Noriega Editores, pp. 19-26.
Pineda, L. A. (1989). GRAFLOG: a theory of semantics for graphics with applications to human-computer interaction and CAD systems. Ph.D. thesis, University of Edinburgh, UK.
Pineda, L. A. (1992a). Reference, synthesis and constraint satisfaction. Computer Graphics Forum, 11(3), 333-344.
Pineda, L. A. (1992b). Sketching, graphical anaphora and constraint satisfaction. In Proceedings of the Samos Workshop on Task Based Explanation, Samos, Greece, 28 June-1 July. Research Laboratory of Samos, University of the Aegean.
Pineda, L. A. (1993). On computational models of drafting and design. Design Studies, 14(2), 124-156.
Pineda, L. A., & Garza, E. G. (1997). A model for multimodal reference resolution. In Proceedings of the Workshop on Referring Phenomena in a Multimedia Context and their Computational Treatment, ACL-SIGMEDIA, Somerset, NJ, July, Association for Computational Linguistics, pp. 99-117.
Pineda, L. A., & Lee, J. R. (1992). Logical representations in drafting and CAD systems. Technical report, EdCAAD and Centre for Cognitive Sciences, University of Edinburgh, UK.
Pineda, L. A., Santana, J. S., & Massé, A. (1994). Satisfacción de restricciones geométricas: ¿problema numérico o simbólico? In V. Estivill-Castro (Ed.), XI Reunión Nacional de Inteligencia Artificial, Universidad de Guadalajara, Guadalajara, Sociedad Mexicana de Inteligencia Artificial, pp. 105-123.
Santana, S., & Pineda, L. A. (1995). Producing coordinated natural language and graphical explanations in the context of a geometric problem-solving task. In J. Lee (Ed.), Proceedings of the First International Workshop on Intelligence and Multimodality in Multimedia Interfaces: Research and Applications, IMMI-1, Edinburgh, UK, July, Human Communication Research Centre.
Santana, S., Pineda, L. A., & Vadera, S. (1995). Textual and graphical explanations for geometric problem-solving systems. In L. E. Sucar & E. Morales (Eds), XII Reunión Nacional de Inteligencia Artificial, Cuernavaca, Mexico. Sociedad Mexicana de Inteligencia Artificial, Noriega Editores, pp. 106-112.
Sucar, L. E., & Morales, E. (Eds) (1995). XII Reunión Nacional de Inteligencia Artificial, Cuernavaca, Mexico. Sociedad Mexicana de Inteligencia Artificial, Noriega Editores.